Abstract: Keyphrase extraction is a fundamental task in information management and is often used as a preliminary step in information retrieval and natural language processing tasks. The main contribution of this paper is a comparative assessment of prominent multilingual unsupervised keyphrase extraction methods based on statistical (RAKE, YAKE), graph-based (TextRank, SingleRank) and deep learning (KeyBERT) approaches. For the experiments reported in this paper, we employ well-known keyphrase extraction datasets in five natural languages (English, French, Spanish, Portuguese and Polish). We use the F1 score and a partial match evaluation framework to investigate whether the number of terms in the documents and the language of each dataset affect the accuracy of the selected methods. Our experimental results yield a set of insights about the suitability of the selected methods for texts of different sizes, as well as their performance on datasets in different languages.
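As an illustration of the evaluation described in the abstract, the following is a minimal sketch of an F1 score computed under partial matching. The abstract does not define the matching criterion, so the rule used here (two keyphrases match if they share at least one lowercased token) is an assumption, and the function names `tokens` and `partial_match_f1` are hypothetical.

```python
def tokens(phrase):
    """Split a keyphrase into a set of lowercased whitespace tokens."""
    return set(phrase.lower().split())


def partial_match_f1(predicted, gold):
    """F1 score under an ASSUMED partial-match rule.

    A predicted keyphrase counts as correct if it shares at least one
    token with any gold keyphrase; a gold keyphrase counts as recovered
    if it shares at least one token with any predicted keyphrase.
    The paper's actual matching criterion may differ.
    """
    if not predicted or not gold:
        return 0.0
    matched_pred = sum(
        1 for p in predicted if any(tokens(p) & tokens(g) for g in gold)
    )
    matched_gold = sum(
        1 for g in gold if any(tokens(g) & tokens(p) for p in predicted)
    )
    precision = matched_pred / len(predicted)
    recall = matched_gold / len(gold)
    if precision + recall == 0:
        return 0.0
    # Standard F1: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = ["keyphrase extraction", "neural network", "graph"]
    gold = ["keyphrase extraction methods", "graph-based ranking"]
    print(partial_match_f1(predicted, gold))
```

In this example only "keyphrase extraction" overlaps with a gold phrase, because whitespace tokenization treats "graph" and "graph-based" as different tokens. A stemming or substring-based rule would score the same pair differently, which is why the matching criterion has to be stated alongside any reported score.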
https://hal.inria.fr/hal-03287681
Contributor: Hal Ifip
Submitted on: Thursday, July 15, 2021 - 6:10:48 PM
Last modification on: Friday, August 13, 2021 - 4:29:53 PM
Long-term archiving on: Saturday, October 16, 2021 - 7:06:20 PM
File: Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until 2024-01-01.