The Observable Web

Abstract : The web is now the de facto first place to publish data. However, retrieving the whole database represented by the web appears almost impossible. Some parts are known to be hard to discover automatically, giving rise to the so-called hidden or invisible web. Search engines try to index most of the web, and almost all related work is based on discovering the web by crawling. This paper is devoted to estimating how accurate the view of the web obtained by crawling is. Our approach is to compare crawling to other ways of discovering the web (mainly by analyzing server or proxy logs of web surfers' activity). This work is a first step towards identifying the observable web.
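As a rough illustration of the comparison the abstract describes, the sketch below measures how much of the set of URLs seen in access logs is also reached by a crawl. It is not the report's method: the function names `crawl_coverage` and `missed_by_crawl`, the set-overlap measure, and the example URLs are all assumptions made here for illustration.

```python
def crawl_coverage(crawled_urls, logged_urls):
    """Fraction of URLs seen in logs that the crawl also discovered.

    Hypothetical measure chosen for illustration; the report may use
    a different definition of accuracy.
    """
    crawled = set(crawled_urls)
    logged = set(logged_urls)
    if not logged:
        return 0.0  # no log evidence to compare against
    return len(crawled & logged) / len(logged)


def missed_by_crawl(crawled_urls, logged_urls):
    """URLs that surfers requested but the crawl never reached."""
    return sorted(set(logged_urls) - set(crawled_urls))


# Made-up example data: two pages reachable by links, two that are not.
crawled = ["http://example.org/", "http://example.org/a"]
logged = [
    "http://example.org/",
    "http://example.org/a",
    "http://example.org/hidden",
    "http://example.org/form?q=1",
]

coverage = crawl_coverage(crawled, logged)
missed = missed_by_crawl(crawled, logged)
print(coverage)
print(missed)
```

URLs that appear only in the logs are candidates for the part of the web that crawling alone does not observe.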
Document type :
Reports

https://hal.inria.fr/inria-00071796
Contributor : Rapport de Recherche Inria
Submitted on : Tuesday, May 23, 2006 - 6:48:10 PM
Last modification on : Friday, May 25, 2018 - 12:02:03 PM
Long-term archiving on : Sunday, April 4, 2010 - 10:38:31 PM

Identifiers

  • HAL Id : inria-00071796, version 1

Citation

Yacine Boufkhad, Laurent Viennot. The Observable Web. [Research Report] RR-4790, INRIA. 2003. ⟨inria-00071796⟩
