Dynamic Bayesian networks for symbolic polyphonic pitch modeling

Stanislaw Raczynski (1), Emmanuel Vincent (1, 2), Shigeki Sagayama (3)
(1) METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
(2) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Symbolic pitch modelling is a way of incorporating knowledge about relations between pitches into the process of analysing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the "horizontal" and the "vertical" pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is responsible for modelling a different type of relation. The ability of the models to predict symbolic pitch data is evaluated in terms of their cross-entropy and of a newly proposed "contextual cross-entropy" measure. Their performance is then measured on synthesised polyphonic audio signals in terms of the accuracy of multiple pitch estimation in combination with a Nonnegative Matrix Factorisation-based acoustic model. In both experiments, the log-linear combination of at least one "vertical" (e.g., harmony) and one "horizontal" (e.g., note duration) sub-model outperformed a pitch-dependent Bernoulli prior by more than 60% in relative cross-entropy and 3% in absolute multiple pitch estimation accuracy. This work provides a proof of concept of the usefulness of model interpolation, which may be used for improved symbolic modelling of other aspects of music in the future.
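The abstract does not give the interpolation formulas, so the following is only a minimal sketch of what linear and log-linear interpolation of sub-models can look like, together with a plain cross-entropy score. It assumes each sub-model outputs the probability that a single pitch is active (a Bernoulli variable); the function names, the weights and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def linear_interpolation(probs, weights):
    """Weighted arithmetic mean of sub-model probabilities.

    Assumes the weights are non-negative and sum to 1.
    """
    return sum(w * p for w, p in zip(weights, probs))


def log_linear_interpolation(probs, weights):
    """Weighted geometric mean of Bernoulli sub-models.

    The product is renormalised over the two outcomes (active, inactive)
    so that the result is again a valid probability.
    """
    on = math.prod(p ** w for w, p in zip(weights, probs))
    off = math.prod((1.0 - p) ** w for w, p in zip(weights, probs))
    return on / (on + off)


def cross_entropy(predicted, observed):
    """Average negative log2-likelihood, in bits per binary pitch variable."""
    total = 0.0
    for p, x in zip(predicted, observed):
        total -= math.log2(p if x else 1.0 - p)
    return total / len(observed)


if __name__ == "__main__":
    # Hypothetical outputs of two sub-models for one pitch at one time frame.
    sub_model_probs = [0.9, 0.6]
    weights = [0.5, 0.5]  # illustrative interpolation weights
    print(linear_interpolation(sub_model_probs, weights))
    print(log_linear_interpolation(sub_model_probs, weights))
    # Score a short made-up sequence of predictions against observations.
    print(cross_entropy([0.8, 0.3, 0.6], [1, 0, 1]))
```

In this sketch, a lower cross-entropy means the interpolated model assigns higher probability to the observed pitch activity. How the paper's sub-models, weights and "contextual cross-entropy" are actually defined is not stated in the abstract.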
Document type:
Journal article
IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2013, 21 (9), pp.1830-1840

Cited literature [40 references]

https://hal.inria.fr/hal-00803886
Contributor: Emmanuel Vincent
Submitted on: Saturday, 23 March 2013 - 17:09:29
Last modified on: Wednesday, 16 May 2018 - 11:23:03
Document(s) archived on: Monday, 24 June 2013 - 05:05:12

File

raczynski_TASLP13.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00803886, version 1

Citation

Stanislaw Raczynski, Emmanuel Vincent, Shigeki Sagayama. Dynamic Bayesian networks for symbolic polyphonic pitch modeling. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2013, 21 (9), pp.1830-1840. 〈hal-00803886〉

Metrics

Record views: 584

File downloads: 435