HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

Abstract : Task 5 of SemEval 2010 was an opportunity to experiment with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, which yields a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited to produce a final set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a ranked list of key term candidates. Finally, a post-ranking was performed based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
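The ranking step mentioned in the abstract can be illustrated with a minimal sketch of bagged decision trees. This is not the authors' implementation: the two features (`in_title`, `tfidf`), the toy training data, and every function name below are hypothetical and chosen only for illustration. Each tree is trained on a bootstrap sample of labeled candidates, and a candidate's score is the average of the tree outputs.

```python
import random

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a small binary tree; a leaf is the fraction of positive labels."""
    if depth == max_depth or len(set(labels)) <= 1:
        return sum(labels) / len(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # no split separates the sample
        return sum(labels) / len(labels)
    _, f, t = best
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo], depth + 1, max_depth),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi], depth + 1, max_depth))

def tree_score(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def bagged_trees(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def rank_candidates(forest, candidates):
    """Sort candidate terms by their average tree score, highest first."""
    scored = [(sum(tree_score(t, feats) for t in forest) / len(forest), term)
              for term, feats in candidates.items()]
    return sorted(scored, reverse=True)

# Hypothetical toy data: [in_title (0/1), tfidf-like score]; 1 = key term.
train_rows = [[1, 0.9], [1, 0.8], [0, 0.7], [1, 0.6],
              [0, 0.2], [0, 0.1], [0, 0.3], [1, 0.2]]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

forest = bagged_trees(train_rows, train_labels)
ranked = rank_candidates(forest, {
    "key term extraction": [1, 0.85],
    "the results": [0, 0.05],
})
for score, term in ranked:
    print(f"{score:.2f}  {term}")
```

Averaging over bootstrap-trained trees gives a graded score rather than a hard yes/no decision, which is what makes the output usable as a ranked list.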
Document type :
Conference papers

https://hal.inria.fr/inria-00493437
Contributor : Laurent Romary
Submitted on : Friday, June 18, 2010 - 5:17:19 PM
Last modification on : Friday, March 22, 2019 - 2:22:12 PM
Long-term archiving on : Monday, October 22, 2012 - 12:00:39 PM

Identifiers

  • HAL Id : inria-00493437, version 1

Citation

Patrice Lopez, Laurent Romary. HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID. SemEval 2010 Workshop, ACL SigLex event, Jul 2010, Uppsala, Sweden. 4 p. ⟨inria-00493437⟩
