Conference papers

Simuler et épurer pour extraire les motifs sûrs et non redondants [Simulate and prune to extract reliable, non-redundant patterns]

Martine Cadot 1, * Alain Lelu 2
* Corresponding author
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Our goal is twofold: 1) to mine only the statistically valid 2-itemsets from a boolean data table, and 2) on this basis, to build only those higher-order itemsets that are non-redundant with respect to their sub-itemsets. For the first task we have designed a randomization test (Tournebool) that respects the structure of the data variables and is independent of the specific distributions of the data. On our test set (193 texts and 888 terms), this reduces 400,000 2-itemsets to 4000 significant ones at the 95% confidence level. For the second task, we have devised a hierarchical stepwise procedure (MIDOVA) that evaluates the residual amount of variation attributable to higher-order itemsets, yielding new possible positive or negative high-order relations. On our example, this leads to 2300 3-itemsets, 41 4-itemsets, and no higher-order ones, in a computationally efficient way.
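
To make the idea of a randomization test on a boolean data table concrete, here is a minimal Python sketch. It is not the authors' Tournebool procedure, whose details the abstract does not give. It assumes one common way of respecting the structure of the data, namely shuffling the table with 2x2 "checkerboard" swaps that leave every row sum and every column sum unchanged, and then comparing the observed support of a 2-itemset with its support in the shuffled tables. The function names, the one-sided p-value, and the parameters `n_sim` and `n_swaps` are all illustrative assumptions.

```python
import random


def swap_randomize(matrix, n_swaps, rng):
    """Return a shuffled copy of a 0/1 matrix with identical row and column sums.

    Assumption: margin-preserving checkerboard swaps stand in for the
    structure-respecting simulation mentioned in the abstract.
    """
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Only a 2x2 submatrix shaped like [[1,0],[0,1]] or [[0,1],[1,0]]
        # can be flipped without changing any margin.
        diagonal_equal = m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1]
        if diagonal_equal and m[r1][c1] != m[r1][c2]:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


def support(matrix, i, j):
    """Number of rows in which items i and j are both present."""
    return sum(1 for row in matrix if row[i] and row[j])


def pair_p_value(matrix, i, j, n_sim=200, n_swaps=1000, seed=0):
    """One-sided empirical p-value for a positive association between i and j."""
    rng = random.Random(seed)
    observed = support(matrix, i, j)
    hits = 0
    for _ in range(n_sim):
        simulated = swap_randomize(matrix, n_swaps, rng)
        if support(simulated, i, j) >= observed:
            hits += 1
    # The +1 terms count the observed table as one of the simulations.
    return (hits + 1) / (n_sim + 1)
```

Under these assumptions, a 2-itemset would be retained when its empirical p-value falls below 0.05, which corresponds to the 95% level quoted in the abstract. A two-sided variant would be needed to detect negative associations as well.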

Cited literature : 15 references

https://hal.inria.fr/inria-00186096
Contributor : Martine Cadot
Submitted on : Wednesday, November 7, 2007 - 11:12:12 PM
Last modification on : Tuesday, October 27, 2020 - 2:34:29 PM
Long-term archiving on : Monday, September 24, 2012 - 3:00:37 PM

File

Cadot_Lelu_QDC07cc.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00186096, version 1

Citation

Martine Cadot, Alain Lelu. Simuler et épurer pour extraire les motifs sûrs et non redondants. 7èmes Journées Francophones "Extraction et Gestion des Connaissances" - EGC 2007 - Troisième Atelier Qualité des Données et des Connaissances - QDC, Stéphane Lallich, Philippe Lenca et Fabrice Guillet, Jan 2007, Namur, Belgique. pp.15-24. ⟨inria-00186096⟩
