Towards Bounding Sequential Patterns

Chedy Raïssi 1 Jian Pei 2
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Given a sequence database, can we have a non-trivial upper bound on the number of sequential patterns? The problem of bounding sequential patterns is very challenging in theory due to the combinatorial complexity of sequences, even given some inspiring results on bounding itemsets in frequent itemset mining. Moreover, the problem is highly meaningful in practice, since the upper bound can be used in many applications such as space allocation in building sequence data warehouses. In this paper, we tackle the problem of bounding sequential patterns by presenting, for the first time in the field of sequential pattern mining, strong combinatorial results on computing the number of possible sequential patterns that can be generated at a given length k. We introduce, as a case study, two novel techniques to estimate the number of candidate sequences. An extensive empirical study on both real data and synthetic data verifies the effectiveness of our methods.
Document type:
Conference paper
Chid Apté and Joydeep Ghosh and Padhraic Smyth. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD-2011, Aug 2011, San Diego, United States. ACM, 2011, 〈10.1145/2020408.2020612〉

https://hal.inria.fr/inria-00623550
Contributor: Chedy Raïssi
Submitted on: Wednesday, September 14, 2011 - 15:47:30
Last modified on: Thursday, January 11, 2018 - 06:19:54
Document(s) archived on: Thursday, March 30, 2017 - 14:58:32

File

p1379.pdf
Files produced by the author(s)

Citation

Chedy Raïssi, Jian Pei. Towards Bounding Sequential Patterns. Chid Apté and Joydeep Ghosh and Padhraic Smyth. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD-2011, Aug 2011, San Diego, United States. ACM, 2011, 〈10.1145/2020408.2020612〉. 〈inria-00623550〉
