Conference papers

Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs

Mohsen Hassan 1 Olfa Makkaoui 1 Adrien Coulet 1 Yannick Toussaint 1 
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Disease-symptom relationships are of primary importance for biomedical informatics, but databases that catalog them are incomplete in comparison with the state of the art available in the scientific literature. We propose in this paper a novel method for automatically extracting disease-symptom relationships from text, called SPARE (standing for Syntactic PAttern for Relationship Extraction). This method is composed of three successive steps: first, we learn patterns from the dependency graphs; second, we select the best patterns based on their respective quality and specificity (their ability to identify only disease-symptom relationships); finally, the patterns are applied to new texts to extract disease-symptom relationships. We evaluated SPARE on a corpus of 121,796 PubMed abstracts related to 457 rare diseases. The quality of the extraction was evaluated as a function of pattern quality and specificity. The best F-measure obtained is 55.65% (for specificity ≥ 0.5 and quality ≥ 0.5). To provide insight into the novelty of the extracted disease-symptom relationships, we compare our results to the content of phenotype databases (OrphaData and OMIM). Our results show the feasibility of automatically extracting disease-symptom relationships, including true relationships that were not already referenced in phenotype databases and may involve complex symptom descriptions.
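The selection step and the F-measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how quality and specificity are computed, so both are assumed here to be precomputed scores in [0, 1]. The `Pattern` class, the pattern names, the toy scores, and the counts are hypothetical. Only the thresholds (≥ 0.5) and the standard F-measure formula (harmonic mean of precision and recall) are taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical container for a learned syntactic pattern."""
    name: str           # illustrative identifier, not from the paper
    quality: float      # assumed precomputed score in [0, 1]
    specificity: float  # assumed precomputed score in [0, 1]


def select_patterns(patterns, min_quality=0.5, min_specificity=0.5):
    """Keep patterns whose scores both reach the given thresholds."""
    return [
        p for p in patterns
        if p.quality >= min_quality and p.specificity >= min_specificity
    ]


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure: harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
toy_patterns = [
    Pattern("pattern_a", quality=0.8, specificity=0.7),
    Pattern("pattern_b", quality=0.4, specificity=0.9),
    Pattern("pattern_c", quality=0.6, specificity=0.5),
]

selected = select_patterns(toy_patterns)
print([p.name for p in selected])
print(round(f_measure(6, 2, 4), 4))
```

Raising either threshold keeps fewer patterns, which would typically trade recall for precision; the abstract reports the best F-measure at the 0.5 / 0.5 setting.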

Cited literature [27 references]
Contributor : Adrien Coulet
Submitted on : Monday, August 17, 2015 - 11:33:54 AM
Last modification on : Thursday, August 4, 2022 - 5:18:45 PM
Long-term archiving on : Wednesday, November 18, 2015 - 10:41:42 AM




  • HAL Id : hal-01184655, version 1


Mohsen Hassan, Olfa Makkaoui, Adrien Coulet, Yannick Toussaint. Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs. BioNLP 15, Association for Computational Linguistics, Jul 2015, Beijing, China. pp.184. ⟨hal-01184655⟩


