Mining Heterogeneous Multidimensional Sequential Data: An Application to the Analysis of Patient Healthcare Trajectories

Elias Egho 1
1 ORPAILLEUR - Knowledge representation, reasoning
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : All domains of science and technology produce large and heterogeneous data. Although much work has been done in this area, mining such data remains a challenge. No previous research targets the mining of heterogeneous multidimensional sequential data. This thesis proposes a contribution to knowledge discovery in heterogeneous sequential data. We study three research directions: (i) extraction of sequential patterns, (ii) classification and (iii) clustering of sequential data. First, we generalize the notion of a multidimensional sequence by considering complex and heterogeneous sequential structures. We present a new approach called MMISP to extract sequential patterns from heterogeneous sequential data. MMISP generates a large number of sequential patterns, as is usually the case for pattern enumeration algorithms. To overcome this problem, we propose a novel way of considering heterogeneous multidimensional sequences by mapping them into pattern structures. We develop a framework for enumerating only the patterns that satisfy given constraints. The second research direction concerns the classification of heterogeneous multidimensional sequences. We use Formal Concept Analysis (FCA) as a classification method. We show interesting properties of concept lattices and of the stability index that allow us to classify sequences into a concept lattice and to select interesting groups of sequences. The third research direction concerns the clustering of heterogeneous multidimensional sequential data. We focus on the notion of common subsequences to define the similarity between a pair of sequences, each composed of a list of itemsets. We use this similarity measure to build a similarity matrix between sequences and to separate them into different groups.
In this work, we present theoretical results and an efficient dynamic programming algorithm that counts the number of common subsequences between two sequences without enumerating all subsequences. The system resulting from this research was applied to analyze and mine patient healthcare trajectories in oncology. The data are taken from a medico-administrative database that includes all information about the hospitalizations of patients in the Lorraine region (France). The system makes it possible to identify and characterize episodes of care for specific sets of patients. The results were discussed and validated with domain experts.
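To illustrate the idea of counting common subsequences without enumerating them, here is a minimal sketch in Python. It is not the algorithm of the thesis: it assumes the simpler case of sequences of single items rather than sequences of itemsets, and the function name `count_common_subsequences` is hypothetical. It counts distinct common subsequences (the empty one included) by always jumping to the next occurrence of each item in both sequences and memoizing the result for each pair of positions.

```python
from functools import lru_cache


def count_common_subsequences(s, t):
    """Count distinct common subsequences of s and t (empty one included).

    Simplifying assumption: s and t are sequences of single hashable items,
    not sequences of itemsets as in the thesis.
    """
    alphabet = set(s) & set(t)

    def next_positions(seq):
        # nxt[i][c] = smallest index k >= i such that seq[k] == c
        nxt = [dict() for _ in range(len(seq) + 1)]
        for i in range(len(seq) - 1, -1, -1):
            nxt[i] = dict(nxt[i + 1])
            nxt[i][seq[i]] = i
        return nxt

    ns, nt = next_positions(s), next_positions(t)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct common subsequences of s[i:] and t[j:].
        total = 1  # the empty subsequence
        for c in alphabet:
            if c in ns[i] and c in nt[j]:
                # Use the leftmost occurrence of c in both sequences, so each
                # distinct subsequence is counted exactly once.
                total += count(ns[i][c] + 1, nt[j][c] + 1)
        return total

    return count(0, 0)


print(count_common_subsequences("abc", "acb"))
```

For "abc" and "acb" the common subsequences are the empty sequence, a, b, c, ab and ac. The sketch is recursive, so very long sequences would need an iterative table instead.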
Document type :
Theses

Cited literature : 175 references

https://hal.inria.fr/tel-01094400
Contributor : Elias Egho
Submitted on : Friday, December 12, 2014 - 11:36:28 AM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on : Friday, March 13, 2015 - 10:45:51 AM

Identifiers

  • HAL Id : tel-01094400, version 1

Citation

Elias Egho. Mining Heterogeneous Multidimensional Sequential Data: An Application to the Analysis of Patient Healthcare Trajectories. Other [cs.OH]. Université de Lorraine, 2014. English. ⟨tel-01094400⟩
