Experiment Analysis in Newspaper Topic Detection

Armelle Brun (1), Kamel Smaïli (1), Jean-Paul Haton (1)
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : This paper presents several methods for topic detection in newspaper articles, using either a general vocabulary or topic-specific vocabularies. The topic-specific vocabularies are built either manually or statistically; in both cases, the aim is to find the words most representative of a topic. Several methods were tested. The first is based on perplexity and achieves a 100% topic identification rate on large test corpora when the first two proposed topics are taken into account. The other methods are based on statistical counts and achieve a 94% identification rate on smaller test corpora. The main challenge of this work is to identify a topic from only a few words, so that the most suitable language model can be selected during speech recognition.
Document type :
Conference papers

https://hal.inria.fr/inria-00099394
Contributor : Publications Loria
Submitted on : Tuesday, November 21, 2017 - 10:44:44 AM
Last modification on : Thursday, January 11, 2018 - 6:19:57 AM

File

SPIRE00.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00099394, version 1

Citation

Armelle Brun, Kamel Smaïli, Jean-Paul Haton. Experiment Analysis in Newspaper Topic Detection. SPIRE 2000 - String Processing & Information Retrieval, 2000, A Coruna, Spain. pp. 55-64. ⟨inria-00099394⟩
