Dynamic Bayesian networks for symbolic polyphonic pitch modeling

Stanislaw Raczynski 1 Emmanuel Vincent 1, 2 Shigeki Sagayama 3
1 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Symbolic pitch modelling is a way of incorporating knowledge about relations between pitches into the process of analysing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the "horizontal" and the "vertical" pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is responsible for modelling a different type of relation. The ability of the models to predict symbolic pitch data is evaluated in terms of their cross-entropy, and of a newly proposed "contextual cross-entropy" measure. Their performance is then measured on synthesised polyphonic audio signals in terms of the accuracy of multiple pitch estimation in combination with a Nonnegative Matrix Factorisation-based acoustic model. In both experiments, the log-linear combination of at least one "vertical" (e.g., harmony) and one "horizontal" (e.g., note duration) sub-model outperformed a pitch-dependent Bernoulli prior by more than 60% in relative cross-entropy and 3% in absolute multiple pitch estimation accuracy. This work provides a proof of concept of the usefulness of model interpolation, which may be used for improved symbolic modelling of other aspects of music in the future.
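As a rough illustration of the two interpolation schemes and the cross-entropy measure named in the abstract, the following Python sketch combines sub-model predictions for a single binary note-activity variable. It is not the authors' implementation: the function names, the binary (active/inactive) formulation, and the example weights are assumptions made here for illustration only.

```python
import math

def linear_interpolation(probs, weights):
    """Weighted sum of sub-model probabilities that a note is active.

    Hypothetical helper: weights are rescaled so that they sum to one.
    """
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

def log_linear_interpolation(probs, weights):
    """Weighted product of sub-model probabilities, renormalised over
    the two outcomes (active, inactive) of the binary variable.

    Assumes every probability lies strictly between 0 and 1.
    """
    log_on = sum(w * math.log(p) for w, p in zip(weights, probs))
    log_off = sum(w * math.log(1.0 - p) for w, p in zip(weights, probs))
    # Subtract the larger exponent before exponentiating for stability.
    m = max(log_on, log_off)
    on = math.exp(log_on - m)
    off = math.exp(log_off - m)
    return on / (on + off)

def cross_entropy(predicted, observed):
    """Average negative log2-likelihood, in bits per binary variable."""
    bits = 0.0
    for p, n in zip(predicted, observed):
        bits -= math.log2(p if n else 1.0 - p)
    return bits / len(observed)

# Two hypothetical sub-models (say, one "vertical" and one "horizontal")
# that disagree about whether a given pitch is active in a given frame.
sub_model_probs = [0.2, 0.8]
weights = [0.5, 0.5]
p_lin = linear_interpolation(sub_model_probs, weights)
p_log = log_linear_interpolation(sub_model_probs, weights)
```

A lower cross-entropy on held-out symbolic data indicates a better predictive model. In a sketch like this the interpolation weights would have to be fitted on training data; how the paper does so is not stated in the abstract.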

Contributor: Emmanuel Vincent
Submitted on: Saturday, March 23, 2013 - 5:09:29 PM
Last modification on: Thursday, March 21, 2019 - 2:20:12 PM
Document(s) archived on: Monday, June 24, 2013 - 5:05:12 AM




  • HAL Id: hal-00803886, version 1


Stanislaw Raczynski, Emmanuel Vincent, Shigeki Sagayama. Dynamic Bayesian networks for symbolic polyphonic pitch modeling. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2013, 21 (9), pp.1830-1840. ⟨hal-00803886⟩


