PageRank optimization applied to spam detection

Olivier Fercoq 1, 2
1 MAXPLUS - Max-plus algebras and mathematics of decision
CMAP - Centre de Mathématiques Appliquées - Ecole Polytechnique, Inria Saclay - Ile de France, X - École polytechnique, CNRS - Centre National de la Recherche Scientifique : UMR
2 Operational Research and Optimization Group, The School of Mathematics, The King's Buildings
CMAP - Centre de Mathématiques Appliquées - Ecole Polytechnique
Abstract: We present MaxRank, a new algorithm for link spam detection and PageRank demotion. Like TrustRank and AntiTrustRank, it starts from a seed of hand-picked trusted and spam pages. We define the MaxRank of a page as the frequency with which the page is visited by a random surfer who minimizes an average cost per time unit. On a given page, the random surfer selects a set of hyperlinks and clicks on any one of them with uniform probability. The cost function penalizes spam pages and hyperlink removals, and the goal is to determine a hyperlink deletion policy that minimizes the average cost. The MaxRank is interpreted as a modified PageRank vector and is used to sort web pages in place of the usual PageRank vector. The bias vector of this ergodic control problem, which is unique up to an additive constant, measures the "spamicity" of each page and is used to detect spam pages. We give a scalable algorithm for computing MaxRank, which allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam page detection.
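The control problem described in the abstract can be illustrated on a toy graph. The sketch below is not the paper's scalable algorithm. It assumes PageRank-style teleportation with damping factor `alpha`, a per-step cost equal to a page penalty plus a fixed price per removed hyperlink, and that every page has at least one out-link and must keep at least one. It enumerates link subsets by brute force and solves the average-cost problem by relative value iteration. The names `solve_link_removal`, `visit_frequencies`, `spam_cost`, and `removal_cost` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a tuple."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def solve_link_removal(out_links, spam_cost, removal_cost=0.1, alpha=0.85,
                       sweeps=500):
    """Relative value iteration for the assumed average-cost problem.

    Returns (gain, bias, policy): the minimal average cost per step, a bias
    vector normalised so that bias[0] == 0, and the hyperlinks kept per page.
    """
    n = len(out_links)
    bias = [0.0] * n
    gain = 0.0
    policy = [tuple(links) for links in out_links]
    for _ in range(sweeps):
        # Teleportation term (assumption): uniform jump with prob. 1 - alpha.
        teleport = (1.0 - alpha) * sum(bias) / n
        values = []
        for page, links in enumerate(out_links):
            best_value, best_kept = None, None
            for kept in nonempty_subsets(links):
                removed = len(links) - len(kept)
                step_cost = spam_cost[page] + removal_cost * removed
                follow = alpha * sum(bias[j] for j in kept) / len(kept)
                value = step_cost + follow + teleport
                if best_value is None or value < best_value - 1e-12:
                    best_value, best_kept = value, kept
            values.append(best_value)
            policy[page] = best_kept
        # Page 0 is the reference state: its value estimates the gain.
        gain = values[0]
        bias = [v - gain for v in values]
    return gain, bias, policy


def visit_frequencies(policy, alpha=0.85, sweeps=500):
    """Stationary visit frequencies of the surfer under a fixed policy."""
    n = len(policy)
    freq = [1.0 / n] * n
    for _ in range(sweeps):
        nxt = [(1.0 - alpha) / n] * n
        for page, kept in enumerate(policy):
            share = alpha * freq[page] / len(kept)
            for target in kept:
                nxt[target] += share
        freq = nxt
    return freq


# Toy data (made up): page 3 is a hand-picked spam page, page 1 links to it.
toy_links = [[1, 2], [0, 3], [0], [1]]
toy_cost = [0.0, 0.0, 0.0, 1.0]

gain, bias, policy = solve_link_removal(toy_links, toy_cost)
scores = visit_frequencies(policy)
print("kept links per page:", policy)
print("bias (spamicity proxy):", [round(b, 4) for b in bias])
print("visit frequencies:", [round(s, 4) for s in scores])
```

On this toy graph, page 1 drops its link to the spam page, so page 3 is reached only by teleportation and gets the lowest visit frequency, while its bias value is the largest. Brute-force subset enumeration is exponential in the out-degree, so this form is only suitable for illustration.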
Document type:
Conference paper
6th International conference on NETwork Games, COntrol and OPtimization (Netgcoop), Nov 2012, Avignon, France. 2012
https://hal.inria.fr/hal-00782844
Contributor: Canimogy Cogoulane
Submitted on: Wednesday, January 30, 2013 - 16:44:06
Last modified on: Thursday, May 10, 2018 - 02:05:48

Identifiers

  • HAL Id : hal-00782844, version 1

Citation

Olivier Fercoq. PageRank optimization applied to spam detection. 6th International conference on NETwork Games, COntrol and OPtimization (Netgcoop), Nov 2012, Avignon, France. 2012. 〈hal-00782844〉
