Individual Re-Identification Risk in Large-Scale Mobile Phone Location Data Research and Its Relationship to Data Utility
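Funding:

National Natural Science Foundation of China (41301440); Natural Science Foundation of Guangdong Province (2014A030313684); Shenzhen Basic Research Program (JCYJ20140610151856728)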

Re-Identification Risk Versus Data Utility for Aggregated Mobility Research Using Mobile Phone Location Data

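    Abstract (translated from the Chinese):

    Mobile phone location data is an emerging source of trajectory data with great potential to support human mobility research. Recent studies indicate that many users can be easily re-identified from their unique activity patterns. However, privacy-protection processing alters the original data and thereby reduces data utility. Using detailed location data for activity analysis while avoiding privacy risks is therefore a challenge. This study aims to reveal the re-identification risk of mobile phone users in a large Chinese city, and the quantitative relationship between re-identification risk and data utility when the data are used for aggregated mobility analysis. First, taking Shenzhen as a case, the re-identification risk of the city's mobile phone users served by one major carrier is evaluated; then a spatial generalization method is proposed and implemented to protect user privacy; finally, using aggregated mobility analysis as an example, the loss of data utility after privacy protection is assessed. The results show that re-identification risk in Shenzhen differs from that in Western cities, demonstrating that re-identification risk based on mobile phone location data is spatially heterogeneous. In addition, a mathematical relationship between re-identification risk (x) and data utility (y) was found: $y = -ax^{b} + c$ ($a, b, c > 0$; $0 < x < 1$). This relationship gives data publishers a scientific basis for weighing privacy risk against data utility. The study contributes to a better understanding of individual re-identification risk in large-scale trajectory data and to a benchmark for the privacy-utility tradeoff, helping to reduce privacy risk when trajectory data are shared.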

    Abstract:

    Mobile phone location data is an emerging data source with great potential to support human mobility research. However, recent studies have indicated that many users can be easily re-identified from their unique activity patterns. Privacy protection procedures usually alter the original data and cause a loss of data utility for analysis. The need for detailed data for activity analysis while avoiding potential privacy risks therefore presents a challenge. This study aims to reveal the re-identification risks among mobile phone users in a Chinese city and to examine the quantitative relationship between re-identification risk and data utility for an aggregated mobility analysis. First, the re-identification risks in Shenzhen City, a metropolis in China, were evaluated. A spatial generalization approach to protecting privacy was then proposed and implemented, and a spatially aggregated analysis was used to assess the loss of data utility after privacy protection. The results demonstrate that the re-identification risks in Shenzhen City are clearly different from those reported for regions in Western countries, which demonstrates the spatial heterogeneity of re-identification risks in mobile phone location data. A uniform mathematical relationship was also found between re-identification risk (x) and data utility (y) for both attack models: $y = -ax^{b} + c$ ($a, b, c > 0$; $0 < x < 1$). This relationship provides data publishers with useful guidance on choosing the right tradeoff between privacy and utility. Overall, this study contributes to a better understanding of re-identification risks and provides a privacy-utility tradeoff benchmark for improving privacy protection when sharing detailed trajectory data.
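
    The interplay between re-identification risk and spatial generalization summarized above can be illustrated with a toy sketch. This is not the authors' method: the abstract does not specify the attack models or the generalization procedure. The sketch assumes, hypothetically, that an attacker knows each user's two most-visited locations, that risk is the fraction of users whose location pair is unique, and that generalization snaps coordinates to a coarser square grid. The data and the names `reidentification_risk` and `generalize` are invented for illustration.

```python
from collections import Counter


def reidentification_risk(signatures):
    """Fraction of users whose location signature is unique in the dataset.

    Assumption: a user counts as re-identifiable when no other user
    shares the same ordered tuple of locations.
    """
    counts = Counter(signatures.values())
    unique = sum(1 for sig in signatures.values() if counts[sig] == 1)
    return unique / len(signatures)


def generalize(signatures, cell_size):
    """Snap every (x, y) location to the lower-left corner of a coarser grid cell."""
    return {
        user: tuple(
            (x // cell_size * cell_size, y // cell_size * cell_size)
            for x, y in sig
        )
        for user, sig in signatures.items()
    }


# Hypothetical toy data: each user's two most-visited locations,
# given as integer grid coordinates.
users = {
    "u1": ((0, 0), (5, 5)),
    "u2": ((1, 0), (5, 4)),
    "u3": ((0, 1), (4, 5)),
    "u4": ((9, 9), (2, 2)),
}

risk_raw = reidentification_risk(users)
risk_coarse = reidentification_risk(generalize(users, cell_size=2))
print(risk_raw, risk_coarse)
```

    In this toy data every user is unique at the original resolution, while on the coarser grid three of the four users share one signature, so the measured risk drops. The cost is that locations are known only to the nearest grid cell, which is the kind of utility loss the study quantifies.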

Cite this article

Citation format
YIN Ling, HU Jinxing, WANG Qian, WANG Wei, CAI Zhiling. Re-Identification Risk Versus Data Utility for Aggregated Mobility Research Using Mobile Phone Location Data [J]. Journal of Integration Technology (集成技术), 2016, 5(2): 19-28

History
  • Published online: 2016-04-01