A Data Cleaning Solution by Perl Scripts for the KDD Cup 2003 Task 2

Martine Cadot 1 Joseph Di Martino 2
1 ORPAILLEUR - Knowledge representation, reasoning
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper, we present our solution to Task 2 of the KDD Cup 2003 competition. Our approach is a data cleaning methodology implemented as Perl scripts. The scripts use regular expressions to automatically extract relevant information from the 35,472 LaTeX texts, and we tuned these expressions through statistical investigation of the texts. Our solution yielded 144,087 associations.
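As an illustration of the kind of regular-expression extraction the abstract describes, here is a minimal sketch. It is written in Python rather than Perl, and it is not the authors' code: the two patterns, the function name `extract_references`, and the sample text are all hypothetical, and it assumes that the "relevant information" includes bibliography keys and hep-th style identifiers.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual Perl
# expressions are not reproduced in this record.
# Matches \bibitem{key} and \bibitem[label]{key}, capturing the key.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}")
# Matches identifiers of the assumed form hep-th/NNNNNNN, capturing the digits.
ARXIV_ID_RE = re.compile(r"\bhep-th/(\d{7})\b")


def extract_references(latex_source):
    """Return (bibitem keys, hep-th identifiers) found in one LaTeX text."""
    keys = BIBITEM_RE.findall(latex_source)
    ids = ARXIV_ID_RE.findall(latex_source)
    return keys, ids


# Invented sample input, not taken from the competition data.
sample = r"""
\begin{thebibliography}{9}
\bibitem{wit95} E. Witten, Nucl. Phys. B443 (1995) 85, hep-th/9503124.
\bibitem[Pol]{pol95} J. Polchinski, hep-th/9510017.
\end{thebibliography}
"""

keys, ids = extract_references(sample)
print(keys)
print(ids)
```

In practice, patterns like these would likely need the statistical tuning mentioned in the abstract, since real LaTeX sources format references in many inconsistent ways.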
Document type:
Journal article
SIGKDD Explorations, ACM, 2004, 5 (2), pp.158-159

https://hal.inria.fr/inria-00100173
Contributor: Joseph Di Martino
Submitted on: Tuesday, 26 September 2006 - 10:15:09
Last modified on: Thursday, 11 January 2018 - 06:19:55

Identifiers

  • HAL Id : inria-00100173, version 1

Citation

Martine Cadot, Joseph Di Martino. A Data Cleaning Solution by Perl Scripts for the KDD Cup 2003 Task 2. SIGKDD Explorations, ACM, 2004, 5 (2), pp.158-159. 〈inria-00100173〉
