Dynamic Bayesian networks for symbolic polyphonic pitch modeling - Inria - National Institute for Research in Digital Science and Technology
Journal article, IEEE Transactions on Audio, Speech and Language Processing, Year: 2013

Dynamic Bayesian networks for symbolic polyphonic pitch modeling

Abstract

Symbolic pitch modelling is a way of incorporating knowledge about relations between pitches into the process of analysing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the "horizontal" and the "vertical" pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is responsible for modelling a different type of relation. The ability of the models to predict symbolic pitch data is evaluated in terms of their cross-entropy, and of a newly proposed "contextual cross-entropy" measure. Their performance is then measured on synthesised polyphonic audio signals in terms of the accuracy of multiple pitch estimation in combination with a Nonnegative Matrix Factorisation-based acoustic model. In both experiments, the log-linear combination of at least one "vertical" (e.g., harmony) and one "horizontal" (e.g., note duration) sub-model outperformed a pitch-dependent Bernoulli prior by more than 60% in relative cross-entropy and 3% in absolute multiple pitch estimation accuracy. This work provides a proof of concept of the usefulness of model interpolation, which may be used for improved symbolic modelling of other aspects of music in the future.
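The two combination schemes and the cross-entropy metric named in the abstract can be illustrated with a minimal sketch. It assumes that each sub-model outputs the probability that a single pitch is active in a given time frame, so that the combined prediction is a Bernoulli probability. The function names, the binary setup, and the fixed weights are illustrative assumptions; the paper's actual sub-models and weight estimation are not reproduced here.

```python
import math


def linear_interpolation(probs, weights):
    """Weighted sum of sub-model probabilities (weights assumed to sum to 1)."""
    return sum(w * p for w, p in zip(weights, probs))


def log_linear_interpolation(probs, weights):
    """Weighted geometric combination, renormalised over {active, inactive}.

    Assumes every probability lies strictly between 0 and 1.
    """
    on = math.prod(p ** w for p, w in zip(probs, weights))
    off = math.prod((1.0 - p) ** w for p, w in zip(probs, weights))
    return on / (on + off)


def cross_entropy(predictions, targets):
    """Average negative log2-likelihood, in bits per binary pitch variable.

    `predictions` holds P(active); `targets` holds the observed 0/1 values.
    """
    total = 0.0
    for p, x in zip(predictions, targets):
        total -= math.log2(p if x == 1 else 1.0 - p)
    return total / len(targets)


# Hypothetical outputs of one "vertical" and one "horizontal" sub-model
# for the same pitch in the same frame.
sub_model_probs = [0.9, 0.6]
weights = [0.5, 0.5]

p_linear = linear_interpolation(sub_model_probs, weights)
p_log_linear = log_linear_interpolation(sub_model_probs, weights)
```

A lower cross-entropy on held-out symbolic data means the combined model assigns higher probability to the pitches that actually occur, which is the sense in which the abstract compares the interpolated models with the Bernoulli prior.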
Main file
raczynski_TASLP13.pdf (278.73 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00803886, version 1 (23-03-2013)

Cite

Stanislaw Raczynski, Emmanuel Vincent, Shigeki Sagayama. Dynamic Bayesian networks for symbolic polyphonic pitch modeling. IEEE Transactions on Audio, Speech and Language Processing, 2013, 21 (9), pp.1830-1840. ⟨10.1109/TASL.2013.2258012⟩. ⟨hal-00803886⟩