Conference paper, Year: 2022

Investigating the usefulness of i-vectors for automatic language characterization

Abstract

Work in recent years has shown the usefulness of automatic methods for the study of linguistic typology. However, the majority of proposed approaches come from natural language processing and require expert knowledge to predict typological information for new languages. An alternative would be to use speech-based methods that do not need extensive linguistic annotations, but considerably less work has been done in this direction. The current study aims to reduce this gap by investigating a promising speech representation, i-vectors, which, by capturing suprasegmental features of language, can be used for the automatic characterization of languages. Employing data from 24 languages covering several linguistic families, we computed the i-vector corresponding to each sentence and represented each language by its centroid i-vector. Comparing the distances between the language centroids with the phonological, inventory and syntactic distances between the same languages, we observed a significant correlation between the i-vector distance and the syntactic distance. We then explored a number of syntactic features in more detail and proposed a method for predicting the value of the most promising feature from the i-vector information. The results obtained, an 87% classification accuracy, are encouraging, and we plan to extend this method further.
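
The centroid and distance-comparison steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which distance or correlation measure was used, so cosine distance and Pearson correlation are assumptions here, and the function names, language labels and random data are hypothetical placeholders.

```python
import numpy as np


def language_centroids(ivectors, labels):
    """Average the sentence-level i-vectors of each language."""
    labels = np.asarray(labels)
    languages = sorted(set(labels.tolist()))
    centroids = np.stack(
        [ivectors[labels == lang].mean(axis=0) for lang in languages]
    )
    return languages, centroids


def cosine_distance_matrix(x):
    """Pairwise cosine distances between the rows of x (assumed metric)."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def upper_triangle_correlation(d1, d2):
    """Pearson correlation over language pairs (upper triangle, no diagonal)."""
    idx = np.triu_indices_from(d1, k=1)
    return float(np.corrcoef(d1[idx], d2[idx])[0, 1])


# Toy data: 3 hypothetical languages, 50 sentences each, 10-dimensional vectors.
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 10))
labels = np.repeat(["lang_a", "lang_b", "lang_c"], 50)
ivectors = np.concatenate(
    [m + 0.1 * rng.normal(size=(50, 10)) for m in means]
)

languages, centroids = language_centroids(ivectors, labels)
ivector_dist = cosine_distance_matrix(centroids)

# Placeholder for a typological (e.g. syntactic) distance matrix; random here.
typological_dist = rng.random((3, 3))
typological_dist = (typological_dist + typological_dist.T) / 2
np.fill_diagonal(typological_dist, 0.0)

r = upper_triangle_correlation(ivector_dist, typological_dist)
```

Because the entries of a distance matrix are not independent, the significance of such a correlation is usually assessed with a permutation procedure (for example a Mantel test) rather than a standard p-value; the sketch above only computes the correlation itself.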
Main file
seyssel22_speechprosody.pdf (917.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03823002, version 1 (20-10-2022)

Cite

Maureen De Seyssel, Guillaume Wisniewski, Emmanuel Dupoux, Bogdan Ludusan. Investigating the usefulness of i-vectors for automatic language characterization. Speech Prosody 2022 - 11th International Conference on Speech Prosody, May 2022, Lisbon, Portugal. ⟨10.21437/speechprosody.2022-94⟩. ⟨hal-03823002⟩