Non-linear speech representation based on local predictability exponents

Vahid Khanagha; Khalid Daoudi; Oriol Pont; Hussein Yahia; Antonio Turiel

Journal article, Neurocomputing, Year: 2013

Vahid Khanagha

Role: Author
PersonId: 865238

Geometry and Statistics in acquisition data

Khalid Daoudi

Role: Author
PersonId: 1329075
ORCID: 0000-0003-3536-1060
IdRef: 115483500

Geometry and Statistics in acquisition data

Oriol Pont

Role: Author
PersonId: 1986
IdHAL: oriolpont
IdRef: 253134374

Geometry and Statistics in acquisition data

Hussein Yahia

Role: Author
PersonId: 16847
IdHAL: hussein-yahia
ORCID: 0000-0002-4284-096X
IdRef: 031827543

Geometry and Statistics in acquisition data

Antonio Turiel

Role: Author
PersonId: 915120

Institute of Marine Sciences / Institut de Ciències del Mar [Barcelona]

Abstract

Seeking new perspectives on the non-linear dynamics of speech, this paper presents a novel approach based on a microcanonical multiscale formulation that allows a geometric and statistical description of the multiscale properties of complex dynamics. Speech is a complex system whose dynamics can be accessed, to some extent, geometrically and statistically by computing Local Predictability Exponents (LPEs). These exponents make it possible to determine the most informative subset of the signal (the Most Singular Manifold, or MSM), which in turn yields a compact representation and reconstruction. However, the complex intertwining of different dynamics in speech (added to purely turbulent descriptions) suggests defining appropriate multiscale functionals that might influence the evaluation of LPEs and hence lead to a more compact MSM. Consequently, by using the classical and generic Sauer/Allebach algorithm for signal reconstruction from irregularly spaced samples, we show that high-quality speech reconstruction can be achieved using an MSM of low cardinality. Moreover, to further show the potential of the new methodology, we develop a simple and efficient waveform coder that achieves almost the same level of perceptual quality as a standard coder at a lower bit-rate.
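
To make the idea of reconstructing a signal from a small, irregularly spaced subset of its samples more concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify how LPEs are computed, so `select_irregular_subset` uses a hypothetical stand-in score (the magnitude of the second difference), and `reconstruct_from_samples` is a generic alternating-projection scheme in the Papoulis-Gerchberg style rather than the Sauer/Allebach algorithm itself. All function names, the test signal, and the parameters (`keep_bins`, `fraction`, `iterations`) are assumptions made for illustration.

```python
import numpy as np


def lowpass_project(x, keep_bins):
    """Project x onto signals whose real-FFT bins at index >= keep_bins are zero."""
    spectrum = np.fft.rfft(x)
    spectrum[keep_bins:] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def select_irregular_subset(x, fraction):
    """Hypothetical stand-in for MSM selection (NOT the paper's LPE criterion):
    keep the samples with the largest absolute second difference."""
    score = np.abs(np.gradient(np.gradient(x)))
    count = max(1, int(fraction * len(x)))
    return np.sort(np.argsort(score)[-count:])


def reconstruct_from_samples(values, indices, n, keep_bins, iterations=200):
    """Alternate between re-imposing the retained samples and band-limiting.
    Generic alternating projections, assumed here for illustration only."""
    estimate = np.zeros(n)
    for _ in range(iterations):
        estimate[indices] = values                       # enforce known samples
        estimate = lowpass_project(estimate, keep_bins)  # enforce band limit
    return estimate


# Synthetic band-limited test signal (an assumption; not speech data).
rng = np.random.default_rng(0)
n, keep_bins = 256, 8
spectrum = np.zeros(n // 2 + 1, dtype=complex)
spectrum[1:keep_bins] = (rng.normal(size=keep_bins - 1)
                         + 1j * rng.normal(size=keep_bins - 1))
x = np.fft.irfft(spectrum, n=n)

idx = select_irregular_subset(x, fraction=0.4)
recon = reconstruct_from_samples(x[idx], idx, n, keep_bins)

relative_error = np.linalg.norm(recon - x) / np.linalg.norm(x)
print(f"kept {len(idx)} of {n} samples, relative error {relative_error:.3e}")
```

Because both steps of the loop are projections onto sets that contain the true signal, the reconstruction error cannot grow from one iteration to the next. How quickly it shrinks depends on how the retained samples are distributed, which is why the choice of subset matters.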

Domains

Signal and Image Processing [eess.SP]

Main file

Neuro2012.pdf (170.62 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-00685019

Submitted on: Tuesday, April 16, 2013, 21:31:26

Last modified on: Friday, April 12, 2024, 18:32:05

Long-term archiving on: Wednesday, July 17, 2013, 02:25:09

Dates and versions

hal-00685019 , version 1 (16-04-2013)

Identifiers

HAL Id : hal-00685019 , version 1

Cite

Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein Yahia, Antonio Turiel. Non-linear speech representation based on local predictability exponents. Neurocomputing, 2013, Special issue on Non-Linear Speech Signal processing. ⟨hal-00685019⟩

Collections

INRIA INRIA2
