Number of Frequent Patterns in Random Databases

Loïck Lhote 1
1 Equipe AMACC - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : In a tabular database, patterns which occur over a frequency threshold are called frequent patterns. They are central in numerous data processes and var-ious efficient algorithms were recently designed for mining them. Unfortunately, very few is known about the real difficulty of this mining, which is closely related to the number of frequent patterns. The worst case analysis always leads to an ex-ponential number of frequent patterns, but experimentations show that algorithms become efficient for reasonable frequency thresholds. We perform here a proba-bilistic analysis of the number of frequent patterns. We first introduce a general model of random databases that encompasses all the previous classical models. In this model, the rows of the database are seen as independent words generated by the same probabilistic source [i.e. a random process that emits symbols]. Under natural conditions on the source, the average number of frequent patterns is studied for various frequency thresholds. Then, we exhibit a large class of sources, the class of dynamical sources, which is proven to satisfy our general conditions. This finally shows that our results hold in a quite general context of random databases.
Liste complète des métadonnées

Littérature citée [8 références]  Voir  Masquer  Télécharger
Contributeur : Greyc Référent <>
Soumis le : mercredi 12 novembre 2014 - 14:30:34
Dernière modification le : mardi 24 avril 2018 - 01:44:07
Document(s) archivé(s) le : vendredi 13 février 2015 - 11:05:33


Fichiers produits par l'(les) auteur(s)



Loïck Lhote. Number of Frequent Patterns in Random Databases. Advances in Data Analysis, pp.33 - 45, 2009, 〈10.1007/978-0-8176-4799-5_4〉. 〈hal-01082026〉



Consultations de la notice


Téléchargements de fichiers