Dynamic Topic Identification: Towards Combination of Methods

Brigitte Bigi 1, Armelle Brun 1, Jean-Paul Haton 1, Kamel Smaïli 1, Imed Zitouni 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This paper presents several statistical methods for topic identification (TID): topic unigrams, a cache model, a TFIDF classifier, topic perplexity, and a weighted model. Our work aims to improve these methods by testing them on very different data and by measuring their potential complementarity and their TID performance under simple combinations. The performance of statistical topic identification methods depends not only on the corpus but also on its type. This study highlights the cache model, which achieves a TID performance of 82%. Our best linear combination raises this performance to 82.3%.
Document type:
Conference paper
Recent Advances in Natural Language Processing - RANLP'2001, 2001, Tzigov Chark, Bulgaria, pp.255-257, 2001

https://hal.inria.fr/inria-00100481
Contributor: Publications Loria
Submitted on: Tuesday, September 26, 2006 - 14:46:08
Last modified on: Thursday, January 11, 2018 - 06:19:55

Identifiers

  • HAL Id : inria-00100481, version 1

Citation

Brigitte Bigi, Armelle Brun, Jean-Paul Haton, Kamel Smaïli, Imed Zitouni. Dynamic Topic Identification: Towards Combination of Methods. Recent Advances in Natural Language Processing - RANLP'2001, 2001, Tzigov Chark, Bulgaria, pp.255-257, 2001. 〈inria-00100481〉
