Journal article. SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining. Year: 2004

A Data Cleaning Solution by Perl Scripts for the KDD Cup 2003 Task 2

Martine Cadot, Joseph Di Martino

Abstract

In this paper, we present our solution for Task 2 of the KDD Cup 2003 competition. Our approach is a data cleaning methodology implemented as Perl scripts. The scripts use regular expressions to automatically extract relevant information from the 35,472 LaTeX texts, and we tuned these expressions through statistical investigation of the texts. Our solution yielded 144,087 associations.
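As an illustration of the regular-expression extraction described in the abstract, here is a minimal sketch. It is written in Python as a stand-in and is not the authors' Perl scripts. It assumes, purely for the example, that the information of interest is the set of `\bibitem` entries in a LaTeX source; the abstract does not say which fields were extracted. The names `BIBITEM_RE`, `strip_comments`, and `extract_bibitems`, and the sample text, are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: matches \bibitem{key} or \bibitem[label]{key} and
# lazily captures the entry text up to the next \bibitem, the end of the
# bibliography environment, or the end of the input.
BIBITEM_RE = re.compile(
    r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}(.*?)"
    r"(?=\\bibitem|\\end\{thebibliography\}|\Z)",
    re.DOTALL,
)


def strip_comments(tex: str) -> str:
    """Drop LaTeX comments: an unescaped % through the end of its line.

    Simplification: a % that follows a double backslash is left in place.
    """
    return re.sub(r"(?<!\\)%.*", "", tex)


def extract_bibitems(tex: str) -> List[Tuple[str, str]]:
    """Return (key, entry text) pairs, with whitespace runs collapsed."""
    cleaned = strip_comments(tex)
    return [
        (key.strip(), " ".join(body.split()))
        for key, body in BIBITEM_RE.findall(cleaned)
    ]


# Made-up sample input, not taken from the competition data.
sample = r"""
\begin{thebibliography}{9}
\bibitem{key1} A. Author, % trailing comment
  Journal A 1 (2000) 10.
\bibitem[B01]{key2} B. Author, Journal B 2 (2001) 20.
\end{thebibliography}
"""

for key, entry in extract_bibitems(sample):
    print(key, "->", entry)
```

Stripping comments before matching keeps commented-out text from leaking into the extracted entries. In practice, patterns like this would need refining against the actual texts, in the spirit of the statistical tuning the abstract mentions.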
No file deposited

Dates and versions

inria-00100173, version 1 (26-09-2006)

Identifiers

  • HAL Id: inria-00100173, version 1

Cite

Martine Cadot, Joseph Di Martino. A Data Cleaning Solution by Perl Scripts for the KDD Cup 2003 Task 2. SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, 2004, 5 (2), pp. 158-159. ⟨inria-00100173⟩