C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data
Journal article, Expert Systems with Applications, 2019


Y Abboud, Armelle Brun, Anne Boyer

Abstract

Sequential pattern mining has been the focus of many works, but it still faces a tough challenge when mining large databases: both the efficiency of the mining and the apprehensibility of the resulting pattern set. To overcome these issues, the most promising direction taken in the literature relies on the use of constraints, including the well-known closedness constraint. However, such mining is not resistant to noise, which is a characteristic of most real-world data. The main research question raised in this paper is thus: how can an apprehensible set of sequential patterns be mined efficiently from noisy data? To address this question, we introduce 1) two original constraints designed for mining noisy data, the robustness and the extended-closedness constraints, and 2) a generic pattern mining algorithm, C3Ro, designed to mine a wide range of sequential patterns, ranging from closed or maximal contiguous sequential patterns to closed or maximal regular sequential patterns. C3Ro is intended for practitioners and can handle their multiple constraints. C3Ro is also the first sequential pattern mining algorithm to be this generic and parameterizable. Extensive experiments show that C3Ro is highly efficient compared with well-known algorithms from the literature, especially on large datasets. Additional experiments were conducted on a real-world noisy dataset of job offers, with the goal of mining activities. This experiment offers deeper insight into the C3Ro algorithm: job market experts confirm that the constraints we introduced have a significant positive impact on the apprehensibility of the set of mined activities.
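The abstract builds on the standard closedness and maximality constraints applied to contiguous sequential patterns. The sketch below illustrates only those baseline notions, not C3Ro itself. The robustness and extended-closedness constraints are not defined on this page, so they are not modeled here. It assumes a contiguous pattern occurs in a sequence when it appears as an unbroken block, that support counts the sequences containing the pattern, that a frequent pattern is closed when no frequent contiguous super-pattern has the same support, and that it is maximal when no frequent contiguous super-pattern exists at all. The function names are hypothetical, and the brute-force enumeration is only suitable for toy data.

```python
from collections import Counter


def contiguous_subpatterns(seq):
    """All distinct non-empty contiguous sub-patterns (blocks) of a sequence."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


def frequent_contiguous_patterns(db, minsup):
    """Map each contiguous pattern with support >= minsup to its support.

    Support is the number of sequences containing the pattern at least once.
    """
    support = Counter()
    for seq in db:
        # Using a set means each sequence counts at most once per pattern.
        support.update(contiguous_subpatterns(seq))
    return {p: s for p, s in support.items() if s >= minsup}


def is_contiguous_subpattern(p, q):
    """True if pattern p occurs as an unbroken block inside pattern q."""
    n = len(p)
    return any(q[i:i + n] == p for i in range(len(q) - n + 1))


def closed_and_maximal(db, minsup):
    """Return (frequent patterns with support, closed patterns, maximal patterns).

    Hypothetical brute-force illustration of the classic closedness and
    maximality constraints; it does NOT implement C3Ro, robustness, or
    extended-closedness.
    """
    freq = frequent_contiguous_patterns(db, minsup)
    closed, maximal = set(), set()
    for p, sup_p in freq.items():
        # Strict frequent contiguous super-patterns of p.
        supers = [q for q in freq if len(q) > len(p) and is_contiguous_subpattern(p, q)]
        # Closed: every strict super-pattern has strictly lower support.
        if all(freq[q] < sup_p for q in supers):
            closed.add(p)
        # Maximal: no frequent strict super-pattern at all.
        if not supers:
            maximal.add(p)
    return freq, closed, maximal


if __name__ == "__main__":
    db = [tuple("abcd"), tuple("abce"), tuple("xbcd")]
    freq, closed, maximal = closed_and_maximal(db, minsup=2)
    print(sorted(closed))
    print(sorted(maximal))
```

On the toy database above with a minimum support of 2, `bc` (support 3) is closed but not maximal, while `abc` and `bcd` (support 2) are both closed and maximal. A single noisy item, such as the `x` replacing `a` in the third sequence, is enough to split what an analyst might see as one activity into several patterns. That is the sensitivity to noise the abstract points out in the plain closedness constraint.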
Main file: Article_VersionAuteur.pdf (240.16 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02977461, version 1 (25-10-2020)


Cite

Y Abboud, Armelle Brun, Anne Boyer. C3Ro: An efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data. Expert Systems with Applications, 2019, 131, pp. 172-189. ⟨10.1016/j.eswa.2019.04.058⟩. ⟨hal-02977461⟩