Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs

Mohsen Hassan 1 Olfa Makkaoui 1 Adrien Coulet 1 Yannick Toussaint 1
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Disease-symptom relationships are of primary importance for biomedical informatics, but the databases that catalog them are incomplete compared with the state of the art available in the scientific literature. In this paper we propose a novel method for automatically extracting disease-symptom relationships from text, called SPARE (Syntactic PAttern for Relationship Extraction). The method consists of three successive steps: first, we learn patterns from dependency graphs; second, we select the best patterns based on their quality and specificity (their ability to identify only disease-symptom relationships); finally, we apply the selected patterns to new texts to extract disease-symptom relationships. We evaluated SPARE on a corpus of 121,796 PubMed abstracts related to 457 rare diseases. Extraction quality was evaluated as a function of pattern quality and specificity. The best F-measure obtained is 55.65% (for specificity ≥ 0.5 and quality ≥ 0.5). To give insight into the novelty of the extracted disease-symptom relationships, we compare our results with the content of phenotype databases (OrphaData and OMIM). Our results show the feasibility of automatically extracting disease-symptom relationships, including true relationships that were not already referenced in phenotype databases and that may involve complex symptom descriptions.
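The selection and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how quality and specificity are computed, so the sketch assumes each pattern already carries both scores, and it assumes the reported F-measure is the balanced harmonic mean of precision and recall. The `Pattern` class, the function names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    # Hypothetical container; the abstract does not say how patterns are stored.
    name: str
    quality: float       # assumed to be precomputed, in [0, 1]
    specificity: float   # assumed to be precomputed, in [0, 1]


def select_patterns(patterns, min_quality=0.5, min_specificity=0.5):
    """Keep the patterns whose two scores both reach the thresholds."""
    return [
        p for p in patterns
        if p.quality >= min_quality and p.specificity >= min_specificity
    ]


def f_measure(extracted, gold):
    """Balanced F-measure over sets of (disease, symptom) pairs (assumed definition)."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
patterns = [
    Pattern("pattern_a", quality=0.8, specificity=0.9),
    Pattern("pattern_b", quality=0.4, specificity=0.7),
    Pattern("pattern_c", quality=0.6, specificity=0.3),
]
selected = select_patterns(patterns)

extracted = {("disease_1", "symptom_1"), ("disease_1", "symptom_2")}
gold = {("disease_1", "symptom_1"), ("disease_2", "symptom_3")}
score = f_measure(extracted, gold)
```

With the default thresholds of 0.5, which match the setting reported for the best F-measure, only `pattern_a` passes both filters in this toy example. Raising either threshold would be expected to trade recall for precision.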
Document type:
Conference paper
BioNLP 15, Jul 2015, Beijing, China. pp.184, 2015, Proceedings of BioNLP 15. 〈http://www.aclweb.org/aclwiki/index.php?title=BioNLP_Workshop〉

Cited literature: 27 references

https://hal.inria.fr/hal-01184655
Contributor: Adrien Coulet
Submitted on: Monday, 17 August 2015 - 11:33:54
Last modified on: Thursday, 11 January 2018 - 06:25:24
Document(s) archived on: Wednesday, 18 November 2015 - 10:41:42

File

hassan_et_al_bionlp2015.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01184655, version 1

Citation

Mohsen Hassan, Olfa Makkaoui, Adrien Coulet, Yannick Toussaint. Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs. BioNLP 15, Jul 2015, Beijing, China. pp.184, 2015, Proceedings of BioNLP 15. 〈http://www.aclweb.org/aclwiki/index.php?title=BioNLP_Workshop〉. 〈hal-01184655〉
