GROBID - Information Extraction from Scientific Publications

Abstract: Scientific papers potentially offer a wealth of information that makes it possible to put the corresponding work in context and to offer a wide range of services to researchers. GROBID is a high-performing software environment for extracting such information, including metadata, bibliographic references, and entities, from scientific texts. Most modern digital library techniques rely on the availability of high-quality textual documents. In practice, however, the majority of full-text collections are in raw PDF or in incomplete and inconsistent semi-structured XML. To address this fundamental issue, the development of the Java library GROBID started in 2008 [1]. The tool exploits “Conditional Random Fields” (CRF), a machine-learning technique, to extract and restructure content automatically from raw and heterogeneous sources into uniform, standard TEI (Text Encoding Initiative) documents.
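
As an illustration of what uniform TEI output makes possible downstream, the following minimal Python sketch reads a title and author names from a TEI header. The embedded fragment is hand-written and hypothetical, not actual GROBID output; the function name `extract_header` and the element paths it queries are assumptions chosen for illustration, and real documents may be structured differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written TEI fragment (an assumption, not real GROBID output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>GROBID - Information Extraction from Scientific Publications</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename>Laurent</forename><surname>Romary</surname></persName></author>
            <author><persName><forename>Patrice</forename><surname>Lopez</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_header(tei_xml):
    """Return the title and author names found in a TEI header string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
        # Join the name parts, tolerating a missing forename or surname.
        authors.append(f"{forename} {surname}".strip())
    return {"title": title.strip(), "authors": authors}


if __name__ == "__main__":
    print(extract_header(SAMPLE_TEI))
```

Because every document shares the same element names and namespace, the same few lookups would apply across a whole collection, which is the practical benefit of converting heterogeneous sources to one standard format.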

https://hal.inria.fr/hal-01673305
Contributor: Laurent Romary
Submitted on: Friday, December 29, 2017 - 10:00:22
Last modified on: Thursday, November 15, 2018 - 20:27:26

Files

03-romary-final.pdf
Files produced by the author(s)

License


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id: hal-01673305, version 1

Citation

Laurent Romary, Patrice Lopez. GROBID - Information Extraction from Scientific Publications. ERCIM News, ERCIM, 2015, Scientific Data Sharing and Re-use, 100, 〈https://ercim-news.ercim.eu/en100/r-i/grobid-information-extraction-from-scientific-publications〉. 〈hal-01673305〉
