Number of Frequent Patterns in Random Databases

Loïck Lhote 1
1 Equipe AMACC - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : In a tabular database, patterns which occur over a frequency threshold are called frequent patterns. They are central in numerous data processes and var-ious efficient algorithms were recently designed for mining them. Unfortunately, very few is known about the real difficulty of this mining, which is closely related to the number of frequent patterns. The worst case analysis always leads to an ex-ponential number of frequent patterns, but experimentations show that algorithms become efficient for reasonable frequency thresholds. We perform here a proba-bilistic analysis of the number of frequent patterns. We first introduce a general model of random databases that encompasses all the previous classical models. In this model, the rows of the database are seen as independent words generated by the same probabilistic source [i.e. a random process that emits symbols]. Under natural conditions on the source, the average number of frequent patterns is studied for various frequency thresholds. Then, we exhibit a large class of sources, the class of dynamical sources, which is proven to satisfy our general conditions. This finally shows that our results hold in a quite general context of random databases.
Liste complète des métadonnées


https://hal.archives-ouvertes.fr/hal-01082026
Contributeur : Greyc Référent <>
Soumis le : mercredi 12 novembre 2014 - 14:30:34
Dernière modification le : jeudi 28 mai 2015 - 01:10:01
Document(s) archivé(s) le : vendredi 13 février 2015 - 11:05:33

Fichier

CHS-LHOTE-2010-2.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

Citation

Loïck Lhote. Number of Frequent Patterns in Random Databases. Advances in Data Analysis, pp.33 - 45, 2009, <10.1007/978-0-8176-4799-5_4>. <hal-01082026>

Partager

Métriques

Consultations de
la notice

160

Téléchargements du document

50