Fast data anonymization with low information loss

Gabriel Ghinita, Panagiotis Karras, Panos Kalnis, Nikos Mamoulis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

238 Scopus citations

Abstract

Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) l-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost. In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and l-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.
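To make the two privacy notions in the abstract concrete, the following is a minimal sketch for a one-dimensional quasi-identifier. It is not the paper's algorithm: it sorts records, cuts them into consecutive groups of at least k, and publishes each group as a value range. The names `partition_1d`, `generalize`, `is_k_anonymous`, and `is_l_diverse` are hypothetical, and l-diversity is checked in its simplest "distinct values" form, which is an assumption rather than the definition used by the authors. Because the sketch sorts its input, it runs in O(n log n) overall rather than linear time.

```python
from typing import List, Tuple

# A record is (quasi-identifier value, sensitive value), e.g. (age, diagnosis).
Record = Tuple[int, str]


def partition_1d(records: List[Record], k: int) -> List[List[Record]]:
    """Sort by the quasi-identifier and cut into consecutive groups of size >= k.

    Illustrative heuristic only (an assumption, not the paper's method).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(records) < k:
        raise ValueError("need at least k records to form one group")
    ordered = sorted(records, key=lambda r: r[0])
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # An undersized last group would violate k-anonymity, so merge it
    # into the preceding group.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def generalize(group: List[Record]) -> Tuple[int, int]:
    """Replace the exact quasi-identifier values of a group by their range."""
    values = [q for q, _ in group]
    return (min(values), max(values))


def is_k_anonymous(groups: List[List[Record]], k: int) -> bool:
    """Every group must contain at least k records."""
    return all(len(g) >= k for g in groups)


def is_l_diverse(groups: List[List[Record]], l: int) -> bool:
    """Distinct l-diversity: every group has at least l different sensitive values."""
    return all(len({s for _, s in g}) >= l for g in groups)


if __name__ == "__main__":
    data = [(25, "flu"), (27, "cold"), (31, "flu"), (35, "asthma"), (40, "cold")]
    groups = partition_1d(data, k=2)
    for g in groups:
        print(generalize(g), [s for _, s in g])
    print("2-anonymous:", is_k_anonymous(groups, 2))
    print("2-diverse:", is_l_diverse(groups, 2))
```

The width of each published range gives a rough feel for information loss: wider ranges hide more but tell an analyst less. A grouping can satisfy k-anonymity and still fail l-diversity when all records in a group share one sensitive value, which is why the abstract treats the two problems separately.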

Original language: English (US)
Title of host publication: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
Editors: Johannes Gehrke, Christoph Koch, Minos Garofalakis, Karl Aberer, Carl-Christian Kanne, Erich J. Neuhold, Venkatesh Ganti, Wolfgang Klas, Chee-Yong Chan, Divesh Srivastava, Dana Florescu, Anand Deshpande
Publisher: Association for Computing Machinery, Inc
Pages: 758-769
Number of pages: 12
ISBN (Electronic): 9781595936493
State: Published - 2007
Event: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Vienna, Austria
Duration: Sep 23 2007 to Sep 27 2007

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems and Management
  • Information Systems
  • Software
