Skip to Main content Skip to Navigation
Book sections

Number of Frequent Patterns in Random Databases

Loïck Lhote 1
1 Equipe AMACC - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
Abstract : In a tabular database, patterns which occur over a frequency threshold are called frequent patterns. They are central in numerous data processes and var-ious efficient algorithms were recently designed for mining them. Unfortunately, very few is known about the real difficulty of this mining, which is closely related to the number of frequent patterns. The worst case analysis always leads to an ex-ponential number of frequent patterns, but experimentations show that algorithms become efficient for reasonable frequency thresholds. We perform here a proba-bilistic analysis of the number of frequent patterns. We first introduce a general model of random databases that encompasses all the previous classical models. In this model, the rows of the database are seen as independent words generated by the same probabilistic source [i.e. a random process that emits symbols]. Under natural conditions on the source, the average number of frequent patterns is studied for various frequency thresholds. Then, we exhibit a large class of sources, the class of dynamical sources, which is proven to satisfy our general conditions. This finally shows that our results hold in a quite general context of random databases.
Complete list of metadata

Cited literature [8 references]  Display  Hide  Download
Contributor : Greyc Référent Connect in order to contact the contributor
Submitted on : Wednesday, November 12, 2014 - 2:30:34 PM
Last modification on : Tuesday, October 19, 2021 - 11:34:56 PM
Long-term archiving on: : Friday, February 13, 2015 - 11:05:33 AM


Files produced by the author(s)



Loïck Lhote. Number of Frequent Patterns in Random Databases. Advances in Data Analysis, pp.33 - 45, 2009, ⟨10.1007/978-0-8176-4799-5_4⟩. ⟨hal-01082026⟩



Les métriques sont temporairement indisponibles