
HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

Abstract : The SemEval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
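As an illustration of the ranking step named in the abstract, the following is a minimal sketch of bagging applied to key term candidates. It is not the GROBID implementation: the trees are reduced to depth-1 stumps for brevity, and the feature layout (phraseness, informativeness, an in-title flag), the training values and the candidate terms are all hypothetical. Each candidate's score is the fraction of bootstrap-trained trees that vote for it as a key term.

```python
import random

def fit_stump(rows, labels):
    """Fit a depth-1 decision tree: the (feature, threshold, polarity) with fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for polarity in (1, 0):
                # Predict `polarity` when the feature value is >= t, else the other class.
                errors = sum(
                    (polarity if r[f] >= t else 1 - polarity) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    f, t, polarity = stump
    return polarity if row[f] >= t else 1 - polarity

def fit_bagged(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble

def score(ensemble, row):
    """Fraction of trees voting 'key term'; used as the ranking score."""
    return sum(predict_stump(s, row) for s in ensemble) / len(ensemble)

def rank_candidates(ensemble, candidates):
    scored = [(term, score(ensemble, feats)) for term, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical training data: (phraseness, informativeness, in_title) per candidate.
rows = [
    (0.95, 0.9, 1), (0.90, 0.7, 1), (0.80, 0.8, 0), (0.70, 0.6, 1),  # key terms
    (0.40, 0.3, 0), (0.30, 0.5, 0), (0.20, 0.1, 1), (0.10, 0.2, 0),  # non-key terms
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical unseen candidates to rank.
candidates = {
    "of the": (0.40, 0.05, 0),
    "decision tree": (0.75, 0.70, 0),
    "key term extraction": (0.96, 0.90, 1),
}

ensemble = fit_bagged(rows, labels)
ranked = rank_candidates(ensemble, candidates)
for term, s in ranked:
    print(f"{s:.2f}  {term}")
```

Averaging votes over bootstrap samples yields a graded score rather than a hard yes/no decision, which is what makes the output usable as a ranked list; a post-ranking step such as the one described above would then reorder this list using additional evidence.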
Document type :
Conference papers
Contributor : Laurent Romary
Submitted on : Friday, June 18, 2010 - 5:17:19 PM
Last modification on : Tuesday, November 22, 2022 - 11:10:08 AM
Long-term archiving on : Monday, October 22, 2012 - 12:00:39 PM



  • HAL Id : inria-00493437, version 1



Patrice Lopez, Laurent Romary. HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID. SemEval 2010 Workshop, ACL SigLex event, Jul 2010, Uppsala, Sweden. 4 p. ⟨inria-00493437⟩


