On the Uniqueness of Web Browsing History Patterns

1 PRIVATICS - Privacy Models, Architectures and Tools for the Information Society
Inria Grenoble - Rhône-Alpes, CITI - CITI Centre of Innovation in Telecommunications and Integration of services, Inria Lyon
Abstract: We present the results of the first large-scale study of the uniqueness of Web browsing histories, gathered from a total of $368,284$ Internet users who visited a history detection demonstration website. Our results show that for a majority of users ($69\%$), the browsing history is unique, and that users for whom we could detect at least $4$ visited websites were uniquely identified by their histories in $97\%$ of cases. We observe a significant rate of stability in browser history fingerprints: for repeat visitors, $38\%$ of fingerprints are identical over time, and those that differ are correlated with the original history contents, indicating static browsing preferences (for history subvectors of size $50$). We also report a striking result: testing a small number of pages is enough both to enumerate users' interests and to build an efficient, unique behavioral fingerprint. Testing $50$ web pages is enough to fingerprint $42\%$ of users in our database, increasing to $70\%$ with $500$ web pages.
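The uniqueness figures in the abstract can be read as the fraction of users whose detected history, restricted to a chosen set of tested pages, matches no other user's. The Python sketch below illustrates that reading only; it is not the authors' implementation. The function name `unique_fraction`, the representation of a history as a set of page names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def unique_fraction(histories, tested_pages):
    """Fraction of users whose history, restricted to tested_pages,
    is not shared with any other user (hypothetical helper)."""
    # Project each user's visited pages onto the pages actually tested.
    fingerprints = [frozenset(h & tested_pages) for h in histories]
    # Count how many users share each projected fingerprint.
    counts = Counter(fingerprints)
    unique = sum(1 for fp in fingerprints if counts[fp] == 1)
    return unique / len(fingerprints) if fingerprints else 0.0


# Toy data, invented for illustration (not taken from the study).
histories = [
    {"a.example", "b.example"},
    {"a.example"},
    {"a.example"},
    {"c.example", "d.example"},
]

# Testing more pages separates more users from one another.
print(unique_fraction(histories, {"a.example"}))
print(unique_fraction(histories, {"a.example", "b.example", "c.example"}))
```

In this toy setting, enlarging the set of tested pages raises the share of uniquely identified users, which mirrors the direction of the $50$-page versus $500$-page comparison reported above.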
Document type: Journal article
Lukasz Olejnik, Claude Castelluccia, Artur Janc. On the Uniqueness of Web Browsing History Patterns. Annals of Telecommunications - annales des télécommunications, 2013, ⟨10.1007/s12243-013-0392-5⟩. ⟨hal-00917042⟩

