Automatic Extraction of Document Topics

Abstract: A keyword or topic of a document is a word or multi-word (a sequence of two or more words) that by itself summarizes part of that document's content. In this paper we compare several statistics-based, language-independent methodologies for automatically extracting keywords. We rank words, multi-words, and word prefixes (of fixed length: 5 characters) using several similarity measures (some widely known, some newly coined), and we evaluate both the results obtained and the agreement between evaluators. The experiments covered Portuguese, English and Czech.
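As an illustration of the kind of pipeline the abstract describes, the following minimal sketch builds the three candidate types (words, multi-words, and 5-character prefixes) and ranks them with TF-IDF. TF-IDF is used here only as a stand-in for a widely known measure; the abstract does not name the measures the authors compared, and the function names, the restriction of multi-words to bigrams, and the choice to skip words shorter than 5 characters when forming prefixes are all assumptions made for this example.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # fixed prefix length stated in the abstract


def candidates(text):
    """Split a text into the three candidate types (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    # Assumption: multi-words are approximated by adjacent word pairs.
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    # Assumption: words shorter than the prefix length yield no prefix.
    prefixes = [w[:PREFIX_LEN] for w in words if len(w) >= PREFIX_LEN]
    return {"words": words, "multi-words": bigrams, "prefixes": prefixes}


def rank_tfidf(docs, index, top=3):
    """Rank the terms of docs[index] by TF-IDF over the whole collection.

    docs is a list of term lists, one per document. TF-IDF is a stand-in
    measure, not necessarily one of those evaluated in the paper.
    """
    doc_sets = [set(d) for d in docs]
    counts = Counter(docs[index])
    total = len(docs[index])
    scores = {}
    for term, count in counts.items():
        # Document frequency is at least 1 because docs[index] contains term.
        df = sum(1 for s in doc_sets if term in s)
        scores[term] = (count / total) * math.log(len(docs) / df)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


if __name__ == "__main__":
    texts = [
        "keyword extraction ranks keyword candidates in the document",
        "the cat sat in the garden",
        "the dog ran in the park",
    ]
    for kind in ("words", "multi-words", "prefixes"):
        docs = [candidates(t)[kind] for t in texts]
        print(kind, rank_tfidf(docs, 0))
```

Because the candidates are built from character sequences and counts alone, the sketch uses no language-specific resources, which is the property the abstract emphasizes; any other scoring function could be substituted for `rank_tfidf`.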
Document type:
Conference paper
Luis M. Camarinha-Matos. 2nd Doctoral Conference on Computing, Electrical and Industrial Systems (DoCEIS), Feb 2011, Costa de Caparica, Portugal. Springer, IFIP Advances in Information and Communication Technology, AICT-349, pp.101-108, 2011, Technological Innovation for Sustainability. 〈10.1007/978-3-642-19170-1_11〉

Cited literature: 14 references

https://hal.inria.fr/hal-01566554
Contributor: Hal Ifip
Submitted on: Friday, July 21, 2017 - 11:25:18
Last modified on: Friday, July 21, 2017 - 11:34:17

File

978-3-642-19170-1_11_Chapter.p...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Luís Teixeira, Gabriel Lopes, Rita Ribeiro. Automatic Extraction of Document Topics. Luis M. Camarinha-Matos. 2nd Doctoral Conference on Computing, Electrical and Industrial Systems (DoCEIS), Feb 2011, Costa de Caparica, Portugal. Springer, IFIP Advances in Information and Communication Technology, AICT-349, pp.101-108, 2011, Technological Innovation for Sustainability. 〈10.1007/978-3-642-19170-1_11〉. 〈hal-01566554〉
