Scale-Adaptable Recrawl Strategies for DHT-based Distributed Web Crawling System

Abstract: A large-scale distributed Web crawling system that uses voluntarily contributed personal computing resources allows small companies to build their own search engines at very low cost. The biggest challenge for such a system is implementing functionality equivalent to that of traditional search engines in a fluctuating distributed environment. One such function is incremental crawling, which requires recrawling each Web site according to the update frequency of its content. However, recrawl intervals calculated solely from the change frequency of Web sites may not match the system's real-time capacity, which leads to inefficient utilization of resources. Building on our previous work on a DHT-based Web crawling system, in this paper we propose two scale-adaptable recrawl strategies that aim to address this issue. The proposed methods are evaluated through simulations based on real Web datasets and show satisfactory results.
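The capacity mismatch described in the abstract can be illustrated with a small arithmetic sketch. This is not one of the paper's two strategies, which the abstract does not detail. It only shows, under assumed numbers, how change-based intervals imply a crawl demand that may differ from what the system can deliver, and how a uniform rescaling (a hypothetical, simplest-possible adjustment) would reconcile the two. The site data, the capacity value, and the function names are all assumptions made for illustration.

```python
def required_rate(sites):
    """Pages per hour needed to honor every site's recrawl interval."""
    return sum(pages / interval_h for pages, interval_h in sites)


def scale_intervals(sites, capacity):
    """Uniformly stretch or shrink all intervals so demand equals capacity.

    Hypothetical adjustment for illustration only; not the paper's method.
    """
    factor = required_rate(sites) / capacity
    return [(pages, interval_h * factor) for pages, interval_h in sites]


# Hypothetical sites: (page count, change-based recrawl interval in hours)
sites = [(1000, 10.0), (500, 50.0), (2000, 100.0)]

demand = required_rate(sites)          # 100 + 10 + 20 pages per hour
capacity = 65.0                        # assumed current system throughput
adjusted = scale_intervals(sites, capacity)

print(f"demand: {demand:.1f} pages/h, capacity: {capacity:.1f} pages/h")
for (pages, old), (_, new) in zip(sites, adjusted):
    print(f"{pages} pages: interval {old:.0f} h -> {new:.0f} h")
```

When the assumed capacity is below the demand, every interval is lengthened by the same factor. When volunteers join and capacity exceeds demand, the same factor shortens the intervals so that resources do not sit idle.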
Document type:
Conference paper
Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.91-105, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_9〉
Cited literature [16 references]

https://hal.inria.fr/hal-01054954
Contributor: Hal Ifip
Submitted on: Monday, August 11, 2014 - 10:01:30
Last modified on: Friday, August 11, 2017 - 17:44:18
Document(s) archived on: Thursday, November 27, 2014 - 10:52:04

File

manuscript-source-10-06-11-NPC...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Xiao Xu, Weizhe Zhang, Hongli Zhang, Binxing Fang. Scale-Adaptable Recrawl Strategies for DHT-based Distributed Web Crawling System. Chen Ding; Zhiyuan Shao; Ran Zheng. IFIP International Conference on Network and Parallel Computing (NPC), Sep 2010, Zhengzhou, China. Springer, Lecture Notes in Computer Science, LNCS-6289, pp.91-105, 2010, Network and Parallel Computing. 〈10.1007/978-3-642-15672-4_9〉. 〈hal-01054954〉

Metrics

Record views: 122

File downloads: 147