The Observable Web - Inria - Institut national de recherche en sciences et technologies du numérique
Report (Research Report) Year: 2003

The Observable Web

Yacine Boufkhad
Laurent Viennot

Abstract

The web is now the de facto first place to publish data. However, retrieving the whole database represented by the web appears almost impossible. Some parts are known to be hard to discover automatically, giving rise to the so-called hidden or invisible web. Search engines try to index most of the web, and almost all related work is based on discovering the web by crawling. This paper estimates how accurate the view of the web obtained by crawling is. Our approach is to compare crawling with other ways of discovering the web (mainly by analyzing server or proxy logs of web surfers' activity). This work is a first step towards identifying the observable web.
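
As an illustration only, the comparison described in the abstract can be thought of as a set comparison between URLs found by a crawler and URLs seen in server or proxy logs. The sketch below is not taken from the report: the function names `normalize` and `compare_views`, the normalization rule, and the sample URLs are all hypothetical, and the report's actual methodology may differ.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host and drop the fragment.

    This normalization rule is an assumption made for the example.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def compare_views(crawled, logged):
    """Compare the URL set found by crawling with the set seen in logs."""
    crawl_set = {normalize(u) for u in crawled}
    log_set = {normalize(u) for u in logged}
    both = crawl_set & log_set
    return {
        "both": both,
        "crawl_only": crawl_set - log_set,
        "log_only": log_set - crawl_set,
        # Fraction of log-observed URLs that the crawl also reached.
        "crawl_coverage_of_logs": len(both) / len(log_set) if log_set else 0.0,
    }


# Hypothetical sample data, not results from the report.
crawled = ["http://Example.org/", "http://example.org/a"]
logged = ["http://example.org/a#top", "http://example.org/search?q=x"]
result = compare_views(crawled, logged)
print(sorted(result["log_only"]))
print(result["crawl_coverage_of_logs"])
```

In this toy input, the URLs in `log_only` stand for pages that surfers reached but the crawl did not, which is the kind of gap the abstract sets out to estimate.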

Domains

Other [cs.OH]
Main file
RR-4790.pdf (140.51 KB)

Dates and versions

inria-00071796, version 1 (23-05-2006)

Identifiers

  • HAL Id: inria-00071796, version 1

Cite

Yacine Boufkhad, Laurent Viennot. The Observable Web. [Research Report] RR-4790, INRIA. 2003. ⟨inria-00071796⟩
