FP-Crawlers: Studying the Resilience of Browser Fingerprinting to Block Crawlers

Abstract: Data available on the Web, such as financial data or public reviews, provide a competitive advantage to companies able to exploit them. Web crawlers, a category of bots, automate the collection of publicly available Web data. While some crawlers collect data with the agreement of the websites being crawled, most crawlers do not respect the terms of service. CAPTCHAs and approaches based on analyzing series of HTTP requests classify users as humans or bots. However, these approaches require either user interaction or a significant volume of data before they can classify the traffic. In this paper, we study browser fingerprinting as a crawler detection mechanism. We crawled the Alexa top 10K and identified 291 websites that block crawlers. We show that fingerprinting is used by 93 (31.96%) of them and we report on the crawler detection techniques implemented by the major fingerprinters. Finally, we evaluate the resilience of fingerprinting against crawlers trying to conceal themselves. We show that although fingerprinting is good at detecting crawlers, it can be bypassed with little effort by an adversary with knowledge of the fingerprints collected.
Document type: Conference papers
Cited literature: 73 references

https://hal.inria.fr/hal-02441653
Contributor: Romain Rouvoy
Submitted on: Thursday, January 16, 2020 - 6:37:29 PM
Last modification on: Friday, November 27, 2020 - 2:20:11 PM
Long-term archiving on: Friday, April 17, 2020 - 8:30:35 PM

File

vastel-madweb20.pdf
Files produced by the author(s)

Citation

Antoine Vastel, Walter Rudametkin, Romain Rouvoy, Xavier Blanc. FP-Crawlers: Studying the Resilience of Browser Fingerprinting to Block Crawlers. MADWeb'20 - NDSS Workshop on Measurements, Attacks, and Defenses for the Web, Feb 2020, San Diego, United States. ⟨10.14722/ndss.2020.23xxx⟩. ⟨hal-02441653⟩
