Clinical Text Mining for Context Sequences Identification

Svetla Boytcheva

doi:10.1007/978-3-319-99740-7_15

Conference paper, year: 2018

Institute of Information and Communication Technologies

Abstract

This paper presents an approach based on sequence mining for identifying context models of diseases as described by different medical specialists in clinical text. Clinical narratives contain rich medical terminology, specific abbreviations, and various numerical values, and raw clinical texts usually contain many typos. Because of the telegraphic style of the text and its incomplete sentences, general-purpose part-of-speech taggers and syntactic parsers are not effective for processing non-English clinical text. The proposed approach is language independent and is therefore suitable for processing clinical texts in low-resource languages. The experiments are performed on pseudonymized outpatient records in Bulgarian, produced by four different specialists for the same cohort of patients suffering from similar disorders. The results show that the specialty of the physician can be identified from the clinical documents. Even though similar vocabulary is used in the patient status description, there are slight differences in the language used by different physicians. The depth and detail of the description make it possible to determine different aspects and to identify the focus of the text. The proposed data-driven approach will support automatic classification of clinical texts according to the specialty of the physician who wrote the document. The experimental results show high precision and recall in the classification task for all classes of specialists represented in the dataset. A comparison of the proposed method with the bag-of-words method shows some improvement in the document classification task.

Keywords

Data mining, Text mining, Health informatics

Domains

Computer Science [cs], Information and Communication Sciences

Main file

472936_1_En_15_Chapter.pdf (626.51 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02060045

Submitted on: Thursday, March 7, 2019, 10:36:43

Last modified on: Friday, September 6, 2019, 11:14:03

Long-term archiving on: Saturday, June 8, 2019, 13:30:52

Dates and versions

hal-02060045, version 1 (07-03-2019)

License

Attribution

Identifiers

HAL Id: hal-02060045, version 1
DOI: 10.1007/978-3-319-99740-7_15

Cite

Svetla Boytcheva. Clinical Text Mining for Context Sequences Identification. 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Aug 2018, Hamburg, Germany. pp.223-236, ⟨10.1007/978-3-319-99740-7_15⟩. ⟨hal-02060045⟩

Collections

IFIP-LNCS IFIP IFIP-TC IFIP-TC5 IFIP-WG IFIP-TC12 IFIP-TC8 IFIP-WG8-4 IFIP-WG8-9 IFIP-CD-MAKE IFIP-WG12-9 IFIP-LNCS-11015

42 Views

76 Downloads
