Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs

Mohsen Hassan 1 Olfa Makkaoui 1 Adrien Coulet 1 Yannick Toussaint 1
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Disease-symptom relationships are of primary importance for biomedical informatics, but the databases that catalog them are incomplete compared with the state of the art available in the scientific literature. In this paper we propose a novel method for automatically extracting disease-symptom relationships from text, called SPARE (Syntactic PAttern for Relationship Extraction). The method consists of three successive steps: first, we learn patterns from dependency graphs; second, we select the best patterns based on their quality and specificity (their ability to identify only disease-symptom relationships); finally, the selected patterns are applied to new texts to extract disease-symptom relationships. We evaluated SPARE on a corpus of 121,796 PubMed abstracts related to 457 rare diseases. The quality of the extraction was evaluated as a function of pattern quality and specificity. The best F-measure obtained is 55.65% (for specificity ≥ 0.5 and quality ≥ 0.5). To provide insight into the novelty of the extracted disease-symptom relationships, we compare our results to the content of phenotype databases (OrphaData and OMIM). Our results show the feasibility of automatically extracting disease-symptom relationships, including true relationships that were not already referenced in phenotype databases and that may involve complex symptom descriptions.
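
The pattern-selection step and the F-measure reported in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how quality and specificity are computed, so the `Pattern` class, the function names, and the toy scores below are hypothetical, and both scores are simply assumed to lie in [0, 1]. Only the thresholds (≥ 0.5) and the standard balanced F-measure formula come from the abstract and from common usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical container for a learned syntactic pattern and its scores."""
    name: str
    quality: float      # assumed to lie in [0, 1]
    specificity: float  # assumed to lie in [0, 1]


def select_patterns(patterns, min_quality=0.5, min_specificity=0.5):
    """Keep patterns whose quality and specificity both reach the thresholds."""
    return [
        p for p in patterns
        if p.quality >= min_quality and p.specificity >= min_specificity
    ]


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure: harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy candidates, invented for illustration only (not taken from the paper).
candidates = [
    Pattern("pattern_a", quality=0.8, specificity=0.7),
    Pattern("pattern_b", quality=0.9, specificity=0.2),  # too unspecific
    Pattern("pattern_c", quality=0.5, specificity=0.5),  # exactly at threshold
]
selected = select_patterns(candidates)
```

With these toy values, `pattern_b` is dropped because its specificity falls below the threshold, while `pattern_c` is kept because the comparison is inclusive, matching the "≥ 0.5" wording of the abstract.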

https://hal.inria.fr/hal-01184655
Contributor : Adrien Coulet
Submitted on : Monday, August 17, 2015 - 11:33:54 AM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on : Wednesday, November 18, 2015 - 10:41:42 AM

File

hassan_et_al_bionlp2015.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01184655, version 1

Citation

Mohsen Hassan, Olfa Makkaoui, Adrien Coulet, Yannick Toussaint. Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs. BioNLP 15, Association for Computational Linguistics, Jul 2015, Beijing, China. pp.184. ⟨hal-01184655⟩
