
The Observable Web

Abstract : The web is now the de facto first place to publish data. However, retrieving the whole database represented by the web appears almost impossible. Some parts are known to be hard to discover automatically, giving rise to the so-called hidden or invisible web. Search engines try to index most of the web, and almost all related work is based on discovering the web by crawling. This paper is devoted to estimating how accurate the view of the web obtained by crawling is. Our approach is to compare crawling to other ways of discovering the web (mainly by analyzing server or proxy logs of web surfers' activity). This work is a first step towards identifying the observable web.
Contributor : Rapport De Recherche Inria
Submitted on : Tuesday, May 23, 2006 - 6:48:10 PM
Last modification on : Wednesday, April 6, 2022 - 3:48:27 PM
Long-term archiving on : Sunday, April 4, 2010 - 10:38:31 PM


  • HAL Id : inria-00071796, version 1



Yacine Boufkhad, Laurent Viennot. The Observable Web. [Research Report] RR-4790, INRIA. 2003. ⟨inria-00071796⟩


