APPCorp: a corpus for Android privacy policy document structure analysis

Authors: Shuang Liu, Fan Zhang, Baiyang Zhao, Renjie Guo, Tao Chen, Meishan Zhang

Abstract

With the increasing popularity of mobile devices and the wide adoption of mobile apps, concerns about privacy are growing. Privacy policies are regarded as an appropriate medium for stating legal terms, such as those required by the General Data Protection Regulation (GDPR), and for establishing a binding legal agreement between service providers and users. However, privacy policies are usually long and vague, which makes them difficult for end users to read and understand. It is therefore important to analyze the document structure of privacy policies automatically to help users understand them. In this work, we create a manually labelled corpus of 231 privacy policies (more than 566,000 words and 7,748 annotated paragraphs). We benchmark the corpus with three document classification models and achieve an F1-score of more than 82%.
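The abstract reports an F1-score but does not say how it is averaged. The following is a minimal sketch that assumes macro-averaging over paragraph-level labels. It shows how per-label precision and recall combine into a single score. The function name `macro_f1` and the label names in the usage example are hypothetical and are not taken from the APPCorp label set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label seen in gold or pred (assumed scheme)."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct paragraph label
        else:
            fp[p] += 1   # predicted label was wrong
            fn[g] += 1   # gold label was missed
    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Unweighted mean: every label counts equally, regardless of frequency.
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical paragraph labels, for illustration only.
gold = ["collection", "sharing", "sharing", "rights"]
pred = ["collection", "sharing", "rights", "rights"]
print(round(macro_f1(gold, pred), 4))
```

Macro-averaging gives rare section types as much weight as frequent ones. A micro-averaged or frequency-weighted F1 could therefore yield a different number on the same predictions.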

Keywords: privacy policy, GDPR, document structure analysis, representation learning, graph neural network


Paper URL: https://doi.org/10.1007/s11704-022-1627-2