Journal articles

GROBID - Information Extraction from Scientific Publications

Abstract: Scientific papers potentially offer a wealth of information that makes it possible to put the corresponding work in context and to provide a wide range of services to researchers. GROBID is a high-performing software environment for extracting such information as metadata, bibliographic references, and entities from scientific texts. Most modern digital library techniques rely on the availability of high-quality textual documents. In practice, however, most full-text collections consist of raw PDF files or of incomplete and inconsistent semi-structured XML. To address this fundamental issue, development of the Java library GROBID began in 2008 [1]. The tool uses Conditional Random Fields (CRF), a machine-learning technique, to extract and restructure content automatically from raw, heterogeneous sources into uniform documents that follow the TEI (Text Encoding Initiative) standard.
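The abstract names Conditional Random Fields without showing how they label text. The following is a toy sketch, not GROBID's code, model, or label set. A linear-chain CRF scores a whole label sequence as the sum of per-token feature weights (emissions) plus weights for adjacent label pairs (transitions). The probability of a labeling is exp(score) divided by a normalizer that is the same for every labeling of a given input, so the most likely labeling is the one with the highest score, which Viterbi decoding finds efficiently. The labels `AUTHOR`, `TITLE`, `YEAR`, the helpers `emission_score` and `sequence_score`, and all weights in `TRANSITIONS` are hypothetical and hand-set for illustration. A real CRF learns its weights from annotated training data, and GROBID itself is a Java library; Python is used here only for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for a toy reference-string parser (not GROBID's labels).
LABELS: List[str] = ["AUTHOR", "TITLE", "YEAR"]

# Hand-set, hypothetical transition weights: score for moving from label
# `prev` to label `cur` between adjacent tokens. A trained CRF learns these.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("AUTHOR", "AUTHOR"): 1.0, ("AUTHOR", "TITLE"): 0.5, ("AUTHOR", "YEAR"): 0.0,
    ("TITLE", "AUTHOR"): -2.0, ("TITLE", "TITLE"): 1.0, ("TITLE", "YEAR"): 0.5,
    ("YEAR", "AUTHOR"): -2.0, ("YEAR", "TITLE"): -2.0, ("YEAR", "YEAR"): 0.0,
}


def emission_score(token: str, label: str) -> float:
    """Hand-set, hypothetical feature weights linking one token to one label."""
    is_year = token.isdigit() and len(token) == 4
    is_name = token[:1].isupper() and token[-1:] in (",", ".")
    if label == "YEAR":
        return 3.0 if is_year else -2.0
    if label == "AUTHOR":
        return 2.0 if is_name else -1.0
    # TITLE is the default label for plain words.
    return 1.0 if not (is_year or is_name) else -1.0


def sequence_score(tokens: List[str], labels: List[str]) -> float:
    """Unnormalized linear-chain CRF score: emissions plus transitions."""
    score = sum(emission_score(t, y) for t, y in zip(tokens, labels))
    score += sum(TRANSITIONS[(a, b)] for a, b in zip(labels, labels[1:]))
    return score


def viterbi(tokens: List[str]) -> List[str]:
    """Return the label sequence with the highest sequence_score."""
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[:i + 1] that ends in y.
    best: List[Dict[str, float]] = [{y: emission_score(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]  # back[i][y]: best previous label for y
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[i - 1][p] + TRANSITIONS[(p, y)])
            scores[y] = best[i - 1][prev] + TRANSITIONS[(prev, y)] + emission_score(tokens[i], y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tokens = ["Lopez,", "P.", "GROBID", "extraction", "2015"]
    print(viterbi(tokens))  # ['AUTHOR', 'AUTHOR', 'TITLE', 'TITLE', 'YEAR']
```

Decoding the whole sequence at once, rather than classifying each token independently, lets the transition weights penalize implausible orders such as a title token followed by an author token.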

Cited literature [2 references]
Contributor: Laurent Romary
Submitted on: Friday, December 29, 2017 - 10:00:22 AM
Last modification on: Tuesday, January 11, 2022 - 11:16:24 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


HAL Id: hal-01673305, version 1


Laurent Romary, Patrice Lopez. GROBID - Information Extraction from Scientific Publications. ERCIM News, ERCIM, 2015, Scientific Data Sharing and Re-use, 100. ⟨hal-01673305⟩


