Learning Subgraph Patterns from text for Extracting Disease–Symptom Relationships

Mohsen Hassan 1 Adrien Coulet 1 Yannick Toussaint 1
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: To some extent, texts can be represented as graphs, such as dependency graphs in which nodes represent words and edges represent grammatical dependencies between words. Graph representation of texts is an interesting alternative to string representation because it provides an additional level of abstraction over the syntax that is sometimes easier to compute. In this paper, we study the use of graph mining methods on texts represented as dependency graphs, for extracting relationships between pairs of annotated entities. We propose a three-step approach that includes (1) the transformation of texts into a collection of dependency graphs; (2) the selection of frequent subgraphs, hereafter named patterns, on the basis of positive sentences; and (3) the extraction of relationships by searching for occurrences of patterns in novel sentences. We evaluated our method by extracting disease–symptom relationships from a corpus of 51,292 PubMed abstracts (428,491 sentences) related to 50 rare diseases. The extraction of correct disease–symptom relationships was evaluated on 565 sentences, showing a precision of 0.91 and a recall of 0.49 (F-measure of 0.63). These preliminary experiments show the feasibility of extracting good-quality relationships using frequent subgraph mining.
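The three steps in the abstract can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation: the function names, the toy sentences, the dependency labels, and the `min_support` and `max_edges` parameters are all assumptions made for illustration. Dependency graphs are reduced to sets of labeled edges, entity mentions are assumed to be already replaced by the placeholders DISEASE and SYMPTOM, and brute-force enumeration of small connected edge subsets stands in for a real frequent subgraph mining algorithm.

```python
from collections import Counter
from itertools import combinations

# An edge is (head_label, dependency_relation, dependent_label).
# Simplification: nodes are identified by their label, so two tokens with
# the same word collapse into a single node.

def is_connected(edges):
    """Return True if the edges form one connected component (direction ignored)."""
    edges = list(edges)
    seen = {edges[0][0], edges[0][2]}
    remaining = edges[1:]
    changed = True
    while remaining and changed:
        changed = False
        for edge in list(remaining):
            if edge[0] in seen or edge[2] in seen:
                seen.update((edge[0], edge[2]))
                remaining.remove(edge)
                changed = True
    return not remaining

def candidate_patterns(graph, max_edges=3):
    """Yield connected edge subsets that link DISEASE and SYMPTOM."""
    for size in range(1, max_edges + 1):
        for subset in combinations(sorted(graph), size):
            nodes = {n for head, _, dep in subset for n in (head, dep)}
            if {"DISEASE", "SYMPTOM"} <= nodes and is_connected(subset):
                yield frozenset(subset)

def mine_patterns(positive_graphs, min_support=2, max_edges=3):
    """Step 2: keep candidates that occur in at least min_support positive graphs."""
    support = Counter()
    for graph in positive_graphs:
        support.update(set(candidate_patterns(graph, max_edges)))
    return {p for p, count in support.items() if count >= min_support}

def has_relationship(graph, patterns):
    """Step 3: a novel sentence matches if it contains every edge of some pattern."""
    return any(pattern <= graph for pattern in patterns)

# Step 1 (assumed done by a dependency parser): toy positive sentences as edge sets.
positives = [
    # "DISEASE causes SYMPTOM"
    {("causes", "nsubj", "DISEASE"), ("causes", "dobj", "SYMPTOM")},
    # "DISEASE often causes severe SYMPTOM"
    {("causes", "nsubj", "DISEASE"), ("causes", "dobj", "SYMPTOM"),
     ("causes", "advmod", "often"), ("SYMPTOM", "amod", "severe")},
    # "SYMPTOM is a sign of DISEASE"
    {("sign", "nsubj", "SYMPTOM"), ("sign", "cop", "is"),
     ("sign", "prep_of", "DISEASE")},
]

patterns = mine_patterns(positives, min_support=2)

# "DISEASE rarely causes SYMPTOM"
novel = {("causes", "nsubj", "DISEASE"), ("causes", "dobj", "SYMPTOM"),
         ("causes", "advmod", "rarely")}
# "DISEASE and SYMPTOM were studied"
unrelated = {("studied", "nsubjpass", "DISEASE"), ("DISEASE", "conj_and", "SYMPTOM")}

print(has_relationship(novel, patterns))
print(has_relationship(unrelated, patterns))
```

On this toy data, only the "DISEASE causes SYMPTOM" subgraph reaches the support threshold, so a novel sentence is accepted only if it contains both of those edges. A realistic setting would need proper subgraph isomorphism and a scalable mining algorithm rather than exhaustive enumeration.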
Document type:
Conference paper
Peggy Cellier; Thierry Charnois; Andreas Hotho; Stan Matwin; Marie-Francine Moens; Yannick Toussaint. 1st International Workshop on Interactions between Data Mining and Natural Language Processing, Sep 2014, Nancy, France. ceur-ws, 1202, 2014, 〈http://ceur-ws.org/Vol-1202/〉

https://hal.inria.fr/hal-01095595
Contributor: Adrien Coulet
Submitted on: Monday, December 15, 2014 - 20:09:26
Last modified on: Thursday, January 11, 2018 - 06:25:24
Document(s) archived on: Monday, March 16, 2015 - 12:46:19

File

paper6.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01095595, version 1

Citation

Mohsen Hassan, Adrien Coulet, Yannick Toussaint. Learning Subgraph Patterns from text for Extracting Disease–Symptom Relationships. Peggy Cellier; Thierry Charnois; Andreas Hotho; Stan Matwin; Marie-Francine Moens; Yannick Toussaint. 1st International Workshop on Interactions between Data Mining and Natural Language Processing, Sep 2014, Nancy, France. ceur-ws, 1202, 2014, 〈http://ceur-ws.org/Vol-1202/〉. 〈hal-01095595〉

Metrics

Record views

293

File downloads

396