
Privacy Policy Annotation for Semi-automated Analysis: A Cost-Effective Approach

Abstract: Privacy policies go largely unread because they are not standardized, are often written in jargon, and are frequently long. Several attempts have been made to simplify them and improve their readability, with varying degrees of success. This paper looks at keyword extraction, comparing human extraction to natural language algorithms, as a first step in building a taxonomy for creating an ontology (a key tool in improving access to and usability of privacy policies). We present two alternatives to using costly domain experts to perform keyword extraction: first, trained participants (non-domain experts) read and extracted keywords from online privacy policies; second, supervised and unsupervised learning algorithms extracted keywords. Results show that supervised learning algorithms outperform unsupervised learning algorithms over a large corpus of 631 policies, and that trained participants outperform the algorithms, but at a much higher cost.
Document type: Conference papers

Cited literature: 38 references
Contributor: Hal Ifip
Submitted on: Thursday, August 9, 2018 - 10:41:36 AM
Last modification on: Thursday, August 9, 2018 - 10:43:40 AM
Long-term archiving on: Saturday, November 10, 2018 - 12:57:08 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Dhiren Audich, Rozita Dara, Blair Nonnecke. Privacy Policy Annotation for Semi-automated Analysis: A Cost-Effective Approach. 12th IFIP International Conference on Trust Management (TM), Jul 2018, Toronto, ON, Canada. pp.29-44, ⟨10.1007/978-3-319-95276-5_3⟩. ⟨hal-01855985⟩


