HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

Abstract: SemEval Task 5 was an opportunity to experiment with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited to produce a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking step was applied, based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
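The abstract names bagged decision trees as the learner that produces the ranked candidate list. The following is a minimal sketch of that idea only, not the GROBID implementation: depth-one trees (decision stumps) stand in for full decision trees, and the three features, the toy training rows and the candidate terms are all hypothetical values chosen for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y):
    """Fit a depth-one tree: the (feature, threshold) split with lowest weighted Gini."""
    n = len(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                # Each leaf predicts the fraction of key terms it contains.
                best = (impurity, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # every row identical: fall back to a constant predictor
        p = sum(y) / n
        return (0, float("inf"), p, p)
    return best[1:]


def stump_predict(stump, row):
    f, t, p_left, p_right = stump
    return p_left if row[f] <= t else p_right


def fit_bagged(X, y, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def score(models, row):
    """Average the per-tree predictions to get a key-term score in [0, 1]."""
    return sum(stump_predict(m, row) for m in models) / len(models)


def rank_candidates(models, candidates):
    """Return candidate terms sorted by decreasing ensemble score."""
    return sorted(candidates, key=lambda term: score(models, candidates[term]), reverse=True)


# Hypothetical features: [appears_in_title, informativeness, found_in_knowledge_base]
train_X = [
    [1, 0.90, 1], [1, 0.80, 1], [0, 0.70, 1], [1, 0.85, 0],  # labelled key terms
    [0, 0.20, 0], [0, 0.30, 1], [0, 0.10, 0], [0, 0.25, 0],  # labelled non-key terms
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

candidates = {  # hypothetical candidate terms with made-up feature values
    "first set": [0, 0.05, 0],
    "bagged decision trees": [0, 0.75, 1],
    "key term extraction": [1, 0.95, 1],
}

models = fit_bagged(train_X, train_y)
ranking = rank_candidates(models, candidates)
print(ranking)
```

Averaging over bootstrap samples gives each candidate a graded score rather than a hard yes/no label, which is what makes the ensemble usable for ranking. A post-ranking step such as the one the abstract mentions would then reorder this list using further statistics.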
Document type:
Conference paper
SemEval 2010 Workshop, Jul 2010, Uppsala, Sweden. 4 p., 2010

https://hal.inria.fr/inria-00493437
Contributor: Laurent Romary
Submitted on: Friday, 18 June 2010 - 17:17:19
Last modified on: Friday, 3 November 2017 - 08:24:01
Document(s) archived on: Monday, 22 October 2012 - 12:00:39

Identifiers

  • HAL Id : inria-00493437, version 1

Citation

Patrice Lopez, Laurent Romary. HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID. SemEval 2010 Workshop, Jul 2010, Uppsala, Sweden. 4 p., 2010. 〈inria-00493437〉

Metrics

Record views

619

File downloads

769