The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation In The Biomedical Domain

Abstract: A major question in linguistics is whether theoretical accounts of general language apply to specific domains. Similarly, in natural language processing, it is clear that general-domain solutions often fail when applied to specialized domains. One such specialized domain, increasingly seen as crucial to understanding human biology and disease, is the biomedical domain. For this reason, biomedical corpus construction has been an area of considerable activity in recent years; for example, just in the past five years: (ordered by year of publication and then by first author's last name). Historically, the great majority of work in biomedical natural language processing has used abstracts of journal articles. In contrast, the Colorado Richly Annotated Full Text (CRAFT) corpus consists entirely of full-text journal articles. The primary motivation for the annotation project was the accumulating body of evidence that the bodies of journal articles contain much information that is not present in the abstracts, and that the textual and structural characteristics of article bodies differ from those of abstracts [8, 26, 90, 84, 18, 2, 48, 51, 13]. When we began the project, there was no large resource of full-text journal articles for system building or evaluation, and apart from the CRAFT corpus, this remains the case. Earlier projects on full-text biomedical journal articles were typically not manually annotated, and none that we are aware of includes annotation of linguistic structure.
Document type:
Book chapter
Handbook of Linguistic Annotation, 2016

https://hal.inria.fr/hal-01159065
Contributor: Karën Fort
Submitted on: Tuesday, 2 June 2015, 15:11:39
Last modified on: Sunday, 1 October 2017, 01:06:52
Document(s) archived on: Tuesday, 25 April 2017, 01:07:13

File

CRAFT_Final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01159065, version 1

Citation

Kevin Bretonnel Cohen, Karin Verspoor, Karën Fort, Christopher Funk, Michael Bada, et al. The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation In The Biomedical Domain. Handbook of Linguistic Annotation, 2016. 〈hal-01159065〉
