GROBID - Information Extraction from Scientific Publications

Abstract: Scientific papers offer a wealth of information that makes it possible to put the corresponding work in context and to offer a wide range of services to researchers. GROBID is a high-performing software environment for extracting such information, including metadata, bibliographic references, and entities, from scientific texts. Most modern digital library techniques rely on the availability of high-quality textual documents. In practice, however, the majority of full-text collections are in raw PDF or in incomplete and inconsistent semi-structured XML. To address this fundamental issue, development of the Java library GROBID started in 2008 [1]. The tool uses Conditional Random Fields (CRF), a machine-learning technique, to extract and restructure content automatically from raw and heterogeneous sources into uniform, standard TEI (Text Encoding Initiative) documents.
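
To illustrate what a uniform TEI target makes possible downstream, the following minimal sketch reads a title and author names from a TEI header using only the Python standard library. The sample fragment is hand-written for illustration and is not actual GROBID output; the function name `read_header`, the constant `SAMPLE_TEI`, and the element paths chosen are assumptions, and real documents may place this information elsewhere in the header.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written TEI header (assumption: NOT actual GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>GROBID - Information Extraction from Scientific Publications</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename>Laurent</forename><surname>Romary</surname></persName></author>
            <author><persName><forename>Patrice</forename><surname>Lopez</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def read_header(tei_xml):
    """Return the title and author names found in a TEI header string."""
    root = ET.fromstring(tei_xml)
    # Title from the title statement; empty string if absent.
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    # Each author is expected to carry a persName with forename and surname.
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        authors.append(f"{forename} {surname}".strip())
    return {"title": title.strip(), "authors": authors}


if __name__ == "__main__":
    print(read_header(SAMPLE_TEI))
```

Because every document ends up in the same schema, a single reader of this kind can serve a whole collection, whereas raw PDF or inconsistent XML would need per-source handling.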

Contributor: Laurent Romary
Submitted on: Friday, 29 December 2017 - 10:00:22
Last modified on: Friday, 4 January 2019 - 17:33:24




Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id: hal-01673305, version 1


Laurent Romary, Patrice Lopez. GROBID - Information Extraction from Scientific Publications. ERCIM News, ERCIM, 2015, Scientific Data Sharing and Re-use, 100. 〈hal-01673305〉


