GROBID - Information Extraction from Scientific Publications - HAL open archive
Journal article, ERCIM News, 2015


Laurent Romary, Patrice Lopez

Abstract

Scientific papers potentially offer a wealth of information that can put the corresponding work in context and support a wide range of services for researchers. GROBID is a high-performing software environment for extracting such information, for example metadata, bibliographic references, or entities, from scientific texts.

Most modern digital library techniques rely on the availability of high-quality textual documents. In practice, however, most full-text collections are available only as raw PDF or as incomplete and inconsistent semi-structured XML. To address this fundamental issue, development of the Java library GROBID started in 2008 [1]. The tool uses Conditional Random Fields (CRF), a machine-learning technique for sequence labeling, to extract and restructure content automatically from raw and heterogeneous sources into uniform, standard TEI (Text Encoding Initiative) documents.
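The abstract mentions CRF only in passing. As a rough illustration of what a linear-chain CRF does at labeling time, the sketch below decodes the best label sequence for a few header tokens with the Viterbi algorithm. It is a toy, not GROBID's model: the labels (`title`, `author`, `other`), the one-feature-per-token `feature` function, and the `EMISSION` and `TRANSITION` weights are all hypothetical and hand-set, whereas a real CRF learns weights over many features from annotated data. Only decoding is shown; training and the normalizing partition function are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for a toy header-labeling task.
LABELS = ["title", "author", "other"]

# Hypothetical hand-set weights; a real CRF learns these from annotated data.
# EMISSION scores (token feature, label) pairs; missing pairs score 0.0.
EMISSION: Dict[Tuple[str, str], float] = {
    ("capitalized", "title"): 1.0,
    ("capitalized", "author"): 1.5,
    ("lowercase", "title"): 1.2,
    ("lowercase", "other"): 0.5,
    ("digit", "other"): 2.0,
    ("comma", "author"): 1.0,
}
# TRANSITION scores (previous label, current label) pairs; missing pairs score 0.0.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("title", "title"): 1.0,
    ("author", "author"): 1.0,
    ("title", "author"): 0.5,
    ("author", "other"): 0.5,
    ("other", "other"): 0.5,
}


def feature(token: str) -> str:
    """Map a token to a single coarse feature (illustrative only)."""
    if token == ",":
        return "comma"
    if token.isdigit():
        return "digit"
    if token[:1].isupper():
        return "capitalized"
    return "lowercase"


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF score:
    the sum of emission weights plus transition weights between adjacent labels."""
    if not tokens:
        return []
    feats = [feature(t) for t in tokens]
    # best[i][y] = best score of any labeling of tokens[:i+1] ending in label y
    best = [{y: EMISSION.get((feats[0], y), 0.0) for y in LABELS}]
    back: List[Dict[str, str]] = []  # back[i-1][y] = best previous label at step i
    for f in feats[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            # Choose the previous label that maximizes score + transition into y.
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION.get((p, y), 0.0))
            scores[y] = best[-1][prev] + TRANSITION.get((prev, y), 0.0) + EMISSION.get((f, y), 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow back-pointers to the first token.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]
```

With these made-up weights, `viterbi(["Information", "extraction", "Laurent", "Romary", "2015"])` returns `["title", "title", "author", "author", "other"]`. The transition weights favor keeping the same label across neighboring tokens, which is what lets a sequence model label a whole field consistently instead of deciding token by token.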
Main file
03-romary-final.pdf (247.91 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01673305, version 1 (29-12-2017)

Licence

Attribution - CC BY 4.0

Identifiers

  • HAL Id: hal-01673305, version 1

Cite

Laurent Romary, Patrice Lopez. GROBID - Information Extraction from Scientific Publications. ERCIM News, 2015, Scientific Data Sharing and Re-use, 100. ⟨hal-01673305⟩