Advanced Preprocessing for intersites Web Usage Mining

Doru Tanasa 1, Brigitte Trousse 1
1 AxIS - Usage-centered design, analysis and improvement of information systems
CRISAM - Inria Sophia Antipolis - Méditerranée, Inria Paris-Rocquencourt
Abstract: Web usage mining (WUM) applies data mining procedures to analyze user access to Web sites. As with any KDD (knowledge discovery and data mining) process, WUM contains three main steps: preprocessing, knowledge extraction, and results analysis. We focus on data preprocessing, a tedious, complex process. Analysts aim to determine the exact list of users who accessed the Web site and to reconstitute user sessions, that is, the sequence of actions each user performed on the Web site. Intersites WUM deals with Web server logs from several Web sites, generally belonging to the same organization. Thus, analysts must reassemble each user's path through all the different Web servers that the user visited. Our solution is to join all the log files and reconstitute the visit. Classical data preprocessing involves three steps: data fusion, data cleaning, and data structuration. Our solution for WUM adds what we call advanced data preprocessing. This consists of a data summarization step, which allows the analyst to select only the information of interest. We have successfully tested our solution in an experiment with log files from INRIA Web sites.
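The idea of joining the log files and reconstituting visits can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record layout, the sample data, the identification of a user by the (IP address, user agent) pair, and the 30-minute inactivity timeout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical, already-parsed log records:
# (server, client_ip, user_agent, timestamp, url)
LOG_A = [
    ("www.example.org", "10.0.0.1", "Mozilla", datetime(2004, 1, 5, 10, 0, 0), "/index.html"),
    ("www.example.org", "10.0.0.1", "Mozilla", datetime(2004, 1, 5, 11, 0, 0), "/news.html"),
]
LOG_B = [
    ("research.example.org", "10.0.0.1", "Mozilla", datetime(2004, 1, 5, 10, 5, 0), "/team.html"),
]


def fuse(*logs):
    """Data fusion: merge the records of all servers into one time-ordered list."""
    return sorted((rec for log in logs for rec in log), key=lambda r: r[3])


def build_visits(records, timeout=timedelta(minutes=30)):
    """Group requests per assumed user, then split on gaps longer than `timeout`."""
    def user_key(rec):
        # Assumption: a user is approximated by (client IP, user agent).
        return (rec[1], rec[2])

    ordered = sorted(records, key=lambda r: (user_key(r), r[3]))
    visits = []
    for _, requests in groupby(ordered, key=user_key):
        current = []
        for rec in requests:
            # A long period of inactivity closes the current visit.
            if current and rec[3] - current[-1][3] > timeout:
                visits.append(current)
                current = []
            current.append(rec)
        visits.append(current)
    return visits


visits = build_visits(fuse(LOG_A, LOG_B))
```

With this sample data, the first visit contains requests from both hypothetical servers, which is the intersites case the abstract describes.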
Document type:
Journal article
IEEE Intelligent Systems, Institute of Electrical and Electronics Engineers, 2004, 19 (2), pp.59-65. 〈http://www.computer.org/csdl/mags/ex/2004/02/x2059-abs.html〉. 〈10.1109/MIS.2004.1274912〉

https://hal.inria.fr/hal-00950763
Contributor: Brigitte Trousse
Submitted on: Saturday, February 22, 2014 - 15:46:31
Last modified on: Thursday, January 11, 2018 - 17:07:45

Citation

Doru Tanasa, Brigitte Trousse. Advanced Preprocessing for intersites Web Usage Mining. IEEE Intelligent Systems, Institute of Electrical and Electronics Engineers, 2004, 19 (2), pp.59-65. 〈http://www.computer.org/csdl/mags/ex/2004/02/x2059-abs.html〉. 〈10.1109/MIS.2004.1274912〉. 〈hal-00950763〉
