Journal articles

Automatic discovery of topics and acoustic morphemes from speech

Christophe Cerisara 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This work addresses automatic lexical acquisition and topic discovery from a speech stream. The proposed algorithm builds a lexicon enriched with topic information in three steps: (1) transcription of the audio stream into phone sequences with a speaker- and task-independent phone recogniser, (2) automatic lexical acquisition based on approximate string matching, and (3) hierarchical topic clustering of the lexical entries based on a knowledge-poor co-occurrence approach. The resulting semantic lexicon is then used to automatically cluster the incoming speech stream into topics. The main advantages of this algorithm are its very low computational requirements and its independence from predefined linguistic resources, which makes it easy to port to new languages and to adapt to new tasks. It is evaluated both qualitatively and quantitatively on two corpora and on two tasks related to topic clustering. The results are encouraging and point to future research directions for the proposed algorithm, such as automatically building orthographic labels for the lexical items.
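To make the second step more concrete, the following is a minimal sketch of approximate string matching over phone sequences. It is an illustration only, not the method of the article: the choice of Levenshtein distance, the fixed window length, the threshold, and the names `edit_distance` and `find_recurring_units` are all assumptions introduced here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (sequences of labels)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_recurring_units(phones, length=4, max_dist=1):
    """Group fixed-length phone windows that approximately repeat.

    Hypothetical helper: a window joins an existing lexicon entry when it is
    within max_dist edits of the entry's representative and does not overlap
    the entry's last recorded occurrence.
    """
    lexicon = []  # list of (representative window, [start positions])
    for start in range(len(phones) - length + 1):
        window = tuple(phones[start:start + length])
        for rep, positions in lexicon:
            if start < positions[-1] + length:
                continue  # overlaps the previous occurrence of this entry
            if edit_distance(rep, window) <= max_dist:
                positions.append(start)
                break
        else:
            lexicon.append((window, [start]))
    # Keep only units observed more than once.
    return [(rep, pos) for rep, pos in lexicon if len(pos) > 1]
```

Calling `find_recurring_units` on the output of a phone recogniser would return candidate lexical units together with the positions where they recur. A realistic system would presumably need variable-length units and a phone-confusion-aware cost, which this sketch does not attempt.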
Document type : Journal articles

https://hal.inria.fr/inria-00330698
Contributor : Christophe Cerisara
Submitted on : Wednesday, October 15, 2008 - 11:40:46 AM
Last modification on : Thursday, January 20, 2022 - 5:28:47 PM

Citation

Christophe Cerisara. Automatic discovery of topics and acoustic morphemes from speech. Computer Speech and Language, Elsevier, 2009, 23 (2), pp.220-239. ⟨10.1016/j.csl.2008.06.004⟩. ⟨inria-00330698⟩
