
Automatic Extraction of Document Topics

Abstract : A keyword, or topic, of a document is a word or multi-word (a sequence of two or more words) that by itself summarizes part of the document's content. In this paper we compare several statistics-based, language-independent methodologies for automatically extracting keywords. We rank words, multi-words, and word prefixes (of a fixed length of 5 characters) using several similarity measures (some widely known, some newly coined), and we evaluate the results obtained as well as the agreement between evaluators. Experiments were carried out on Portuguese, English, and Czech.
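
As a rough illustration of the kind of ranking the abstract describes, the sketch below scores words, multi-words, and 5-character word prefixes of one document against a small collection. It is not the authors' method: the abstract does not name the measures compared, so plain TF-IDF is assumed here as a stand-in for a "widely known" one, multi-words are limited to two-word sequences, and the names tokenize, candidates, and rank_terms are hypothetical.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # fixed prefix length stated in the abstract


def tokenize(text):
    """Lowercase the text and keep runs of Unicode letters (no language-specific rules)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidates(tokens):
    """Yield (kind, term) pairs: single words, 5-character prefixes, and two-word multi-words."""
    for tok in tokens:
        yield ("word", tok)
        if len(tok) >= PREFIX_LEN:
            yield ("prefix", tok[:PREFIX_LEN])
    # Assumption: multi-words are approximated by adjacent word pairs only.
    for first, second in zip(tokens, tokens[1:]):
        yield ("multiword", f"{first} {second}")


def rank_terms(docs, doc_index, top_k=5):
    """Rank the candidates of docs[doc_index] by TF-IDF (an assumed measure, not the paper's)."""
    counts = [Counter(candidates(tokenize(doc))) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each candidate occurs.
    doc_freq = Counter()
    for count in counts:
        doc_freq.update(count.keys())

    target = counts[doc_index]
    total = sum(target.values())
    scores = {
        term: (freq / total) * math.log(n_docs / doc_freq[term])
        for term, freq in target.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    corpus = [
        "keyword extraction extracts keywords",
        "the weather is nice",
        "keyword lists are nice",
    ]
    for (kind, term), score in rank_terms(corpus, 0):
        print(f"{kind:10s} {term:25s} {score:.4f}")
```

Prefixes let inflected forms such as "extraction" and "extracts" pool their counts without a stemmer or lexicon, which is one plausible reason to include them in a language-independent setting.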
Document type : Conference papers

Cited literature [13 references]

Contributor : Hal Ifip
Submitted on : Friday, July 21, 2017 - 11:25:18 AM
Last modification on : Wednesday, November 10, 2021 - 5:18:05 PM




Distributed under a Creative Commons Attribution 4.0 International License



Luís Teixeira, Gabriel Lopes, Rita Ribeiro. Automatic Extraction of Document Topics. 2nd Doctoral Conference on Computing, Electrical and Industrial Systems (DoCEIS), Feb 2011, Costa de Caparica, Portugal. pp.101-108, ⟨10.1007/978-3-642-19170-1_11⟩. ⟨hal-01566554⟩


