Conference papers

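Graphs of statistically valid links and anti-links between the words of a text corpus (original title: Graphes des liens et anti-liens statistiquement valides entre les mots d'un corpus textuel)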

Alain Lelu 1, 2 Martine Cadot 3
2 KIWI - Knowledge Information and Web Intelligence
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
3 ABC - Machine Learning and Computational Biology
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Neighborhood is a central concept in data mining, and many definitions have been implemented, mainly rooted in geometrical or topological considerations. We propose here a statistical definition of neighborhood: our TourneBool randomization test processes an objects vs. attributes binary table in order to establish which inter-attribute relations are fortuitous and which are meaningful, without any hypothesis on the underlying statistical distributions, but taking the empirical distributions into account. The result is a robust and statistically validated graph. An encouraging earlier small-scale test led us to scale up the different phases of the process, making it possible to test it on one of the publicly available Reuters test corpora. We then characterized the resulting word graph with a series of well-known indicators, such as clustering coefficients, degree distribution and correlation, cluster modularity and size distribution. Another graph structure stems from this process: the one conveying the negative "counter-relations" between words, i.e. words that "steer clear" of one another. We characterize the counter-relations graph in the same way.
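The abstract does not spell out how the randomization test works, so the following is only a minimal sketch of one common way to test attribute pairs in a binary table without distributional hypotheses. It assumes the randomized tables keep every row and column sum of the original table (one reading of "taking the empirical distributions into account"), and it assumes a plain empirical p-value threshold with no correction for multiple comparisons. The function names `swap_randomize`, `cooccurrence` and `classify_pair`, and all parameter values, are hypothetical and are not taken from the paper.

```python
import random


def swap_randomize(table, n_swaps, rng):
    """Return a shuffled copy of a 0/1 table with unchanged row and column sums.

    Assumption: margins are preserved by flipping 2x2 "checkerboard"
    submatrices; this is an illustrative choice, not the paper's stated method.
    Requires at least two rows and two columns.
    """
    t = [row[:] for row in table]
    n_rows, n_cols = len(t), len(t[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Only [[1, 0], [0, 1]] or [[0, 1], [1, 0]] can be flipped
        # without changing any row or column sum.
        if (t[r1][c1] == t[r2][c2] and t[r1][c2] == t[r2][c1]
                and t[r1][c1] != t[r1][c2]):
            t[r1][c1], t[r1][c2] = t[r1][c2], t[r1][c1]
            t[r2][c1], t[r2][c2] = t[r2][c2], t[r2][c1]
    return t


def cooccurrence(table, a, b):
    """Count the objects (rows) in which attributes a and b are both present."""
    return sum(row[a] & row[b] for row in table)


def classify_pair(table, a, b, n_random=200, n_swaps=500, alpha=0.05, seed=0):
    """Label an attribute pair as 'link', 'anti-link' or 'fortuitous'.

    The observed co-occurrence count is compared with the counts obtained
    on randomized tables (hypothetical defaults, no multiple-test correction).
    """
    rng = random.Random(seed)
    observed = cooccurrence(table, a, b)
    simulated = [
        cooccurrence(swap_randomize(table, n_swaps, rng), a, b)
        for _ in range(n_random)
    ]
    p_high = sum(s >= observed for s in simulated) / n_random
    p_low = sum(s <= observed for s in simulated) / n_random
    if p_high <= alpha:
        return "link"
    if p_low <= alpha:
        return "anti-link"
    return "fortuitous"
```

Under these assumptions, the pairs labelled "link" would form the edges of the word graph and the pairs labelled "anti-link" the edges of the counter-relations graph; calling `classify_pair(table, a, b)` on every pair of columns of a documents vs. words table would produce both edge lists.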

https://hal.inria.fr/inria-00342751
Contributor : Alain Lelu
Submitted on : Friday, November 28, 2008 - 1:48:30 PM
Last modification on : Tuesday, October 27, 2020 - 2:34:28 PM

Identifiers

  • HAL Id : inria-00342751, version 1

Citation

Alain Lelu, Martine Cadot. Graphes des liens et anti-liens statistiquement valides entre les mots d'un corpus textuel. Extraction et gestion de connaissance 2009 (EGC'09), Pierre Gançarski, Jan 2009, Strasbourg, France. pp.367-378. ⟨inria-00342751⟩
