Dynamic Bayesian networks for symbolic polyphonic pitch modeling

Stanislaw Raczynski (1), Emmanuel Vincent (1), Shigeki Sagayama (2)
(1) METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the analysis of musical information or signals, and it is typically done in a statistical framework. It has proven to be an effective way of improving the performance of various Music Information Retrieval (MIR) algorithms. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models for multiple pitch estimation, which account for both the "horizontal" and the "vertical" pitch structure. These models are formulated as linear or log-linear interpolations of up to five submodels, each of which is responsible for modeling a different aspect of music. The ability of the models to predict symbolic pitch data is evaluated in terms of their cross-entropy and of a newly proposed "contextual cross-entropy" measure. Their performance is then measured on synthesized polyphonic audio signals in terms of the accuracy of multiple pitch estimation in combination with a Nonnegative Matrix Factorization-based acoustic model. In both experiments, the log-linear combinations of at least one "vertical" (e.g. harmony) and one "horizontal" (e.g. note duration) model outperformed the baseline methods, reducing cross-entropy by almost 60% and improving multiple pitch estimation accuracy by almost 4%. This work provides a proof of concept of the usefulness of model interpolation in the area of pitch modeling, which may be used for improved symbolic modeling in the future.
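To make the two interpolation schemes named in the abstract concrete, the following is a minimal Python sketch, not the report's actual formulation. It assumes each submodel outputs a discrete distribution over the same set of outcomes; the function names, the toy distributions `harmony` and `duration`, and the equal weights are all hypothetical and chosen only for illustration. Linear interpolation is a weighted arithmetic mean of the submodel probabilities, log-linear interpolation is a weighted geometric mean that is then renormalized, and cross-entropy is computed here as the average negative log2-probability of the observed outcomes.

```python
import math


def linear_interpolation(submodels, weights):
    """Weighted arithmetic mean of distributions (weights assumed to sum to 1)."""
    n = len(submodels[0])
    return [sum(w * p[i] for p, w in zip(submodels, weights)) for i in range(n)]


def log_linear_interpolation(submodels, weights):
    """Weighted geometric mean of distributions, renormalized to sum to 1."""
    n = len(submodels[0])
    unnorm = [
        math.prod(p[i] ** w for p, w in zip(submodels, weights)) for i in range(n)
    ]
    z = sum(unnorm)  # normalization constant
    return [u / z for u in unnorm]


def cross_entropy(predicted, observed):
    """Average negative log2-probability (in bits) assigned to observed outcomes.

    predicted: one distribution per time step; observed: one outcome index per step.
    """
    total = sum(math.log2(p[o]) for p, o in zip(predicted, observed))
    return -total / len(observed)


# Hypothetical toy submodels over three outcomes (illustrative values only).
harmony = [0.7, 0.2, 0.1]
duration = [0.5, 0.4, 0.1]
weights = [0.5, 0.5]

lin = linear_interpolation([harmony, duration], weights)
loglin = log_linear_interpolation([harmony, duration], weights)
```

A lower cross-entropy means the combined model assigns higher probability to the data it is asked to predict, which is the sense in which the abstract compares the interpolated models with the baselines.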
Document type:
[Technical Report] RT-0430, 2012

Contributor: Stanislaw Raczynski
Submitted on: Monday, March 25, 2013 - 17:37:02
Last modified on: Friday, January 13, 2017 - 14:18:11




  • HAL Id : hal-00728771, version 3



Stanislaw Raczynski, Emmanuel Vincent, Shigeki Sagayama. Dynamic Bayesian networks for symbolic polyphonic pitch modeling. [Technical Report] RT-0430, 2012. <hal-00728771v3>


