Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets

Abstract : Our goal is twofold: 1) we want to mine only the statistically valid 2-itemsets out of a boolean datatable, and 2) on this basis, we want to build only those higher-order itemsets that are non-redundant with respect to their sub-itemsets. For the first task we have designed a randomization test (Tournebool) that respects the structure of the data variables and is independent of the specific distributions of the data. On our test set (959 texts and 8,477 terms), this leads to a reduction from 126,000 2-itemsets to 13,000 significant ones at the 99% confidence level. For the second task, we have devised a hierarchical stepwise procedure (MIDOVA) for evaluating the residual amount of variation devoted to higher-order itemsets, yielding new possible positive or negative high-order relations. On our example, this leads to counts ranging from 7,712 for 2-itemsets down to 3 for 6-itemsets, with no higher-order ones, in a computationally efficient way.
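
The abstract does not spell out how Tournebool generates its random datatables, so the following is only a minimal sketch of the general idea of a randomization test on a boolean table. It assumes that "respecting the structure of the data" means keeping every row sum and every column sum fixed, and it uses plain swap randomization as a stand-in, not the authors' algorithm. The function names (`swap_randomize`, `support`, `pair_p_value`), the toy table, and all parameter values are hypothetical.

```python
import random

def swap_randomize(rows, n_attempts, rng):
    """Shuffle a boolean table (list of item sets) while keeping all
    row sums and column sums fixed. Assumed stand-in, not Tournebool."""
    table = [set(r) for r in rows]
    for _ in range(n_attempts):
        r1, r2 = rng.sample(range(len(table)), 2)
        only1 = sorted(table[r1] - table[r2])  # items in row r1 only
        only2 = sorted(table[r2] - table[r1])  # items in row r2 only
        if only1 and only2:
            a, b = rng.choice(only1), rng.choice(only2)
            # Exchange a and b between the two rows: each row keeps its
            # size and each item keeps its total number of occurrences.
            table[r1].remove(a)
            table[r1].add(b)
            table[r2].remove(b)
            table[r2].add(a)
    return table

def support(table, a, b):
    """Number of rows containing both items of the 2-itemset {a, b}."""
    return sum(1 for row in table if a in row and b in row)

def pair_p_value(rows, a, b, n_sim=200, n_attempts=500, seed=0):
    """One-sided empirical p-value: share of simulated tables whose
    support for {a, b} is at least the observed support."""
    rng = random.Random(seed)
    observed = support(rows, a, b)
    hits = sum(
        1 for _ in range(n_sim)
        if support(swap_randomize(rows, n_attempts, rng), a, b) >= observed
    )
    return (hits + 1) / (n_sim + 1)

# Hypothetical toy table: each row is a text, each item a term.
rows = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"},
        {"d"}, {"c", "d"}, {"a", "b", "d"}, {"c"}]
p = pair_p_value(rows, "a", "b")
```

In this sketch, a 2-itemset would be retained at the 99% level when its p-value falls below 0.01, which requires more than 100 simulations to be reachable at all. Only positive associations are tested here; a negative relation would need the opposite tail.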
Cited literature [14 references]

https://hal.inria.fr/inria-00186100
Contributor : Martine Cadot
Submitted on : Wednesday, November 7, 2007 - 10:02:38 PM
Last modification on : Friday, November 15, 2019 - 1:19:50 AM
Long-term archiving on: Monday, September 24, 2012 - 3:00:50 PM

File

Cadot_Lelu_ASMDA2007.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00186100, version 1

Citation

Martine Cadot, Pascal Cuxac, Alain Lelu. Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets. 12th International Conference on Applied Stochastic Models and Data Analysis - ASMDA 2007, May 2007, Chania, Crete, Greece. ⟨inria-00186100⟩
