Privacy Policy Annotation for Semi-automated Analysis: A Cost-Effective Approach

Abstract: Privacy policies go largely unread because they are not standardized, are often written in jargon, and are frequently long. Several attempts have been made to simplify them and improve their readability, with varying degrees of success. This paper looks at keyword extraction, comparing human extraction to natural language algorithms, as a first step in building a taxonomy for creating an ontology (a key tool in improving the accessibility and usability of privacy policies). In this paper, we present two alternatives to using costly domain experts for keyword extraction. First, trained participants (non-domain experts) read online privacy policies and extracted keywords from them. Second, supervised and unsupervised learning algorithms extracted keywords. Results show that supervised learning algorithms outperform unsupervised learning algorithms over a large corpus of 631 policies, and that trained participants outperform the algorithms, but at a much higher cost.
Document type: Conference paper

Cited literature: 38 references

https://hal.inria.fr/hal-01855985
Contributor: Hal Ifip
Submitted on: Thursday, August 9, 2018, 10:41:36 AM
Last modification on: Thursday, August 9, 2018, 10:43:40 AM
Long-term archiving on: Saturday, November 10, 2018, 12:57:08 PM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until 2021-01-01.

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Dhiren Audich, Rozita Dara, Blair Nonnecke. Privacy Policy Annotation for Semi-automated Analysis: A Cost-Effective Approach. 12th IFIP International Conference on Trust Management (TM), Jul 2018, Toronto, ON, Canada. pp.29-44, ⟨10.1007/978-3-319-95276-5_3⟩. ⟨hal-01855985⟩