Privacy-preserving anonymization of set-valued data

Manolis Terrovitis*, Nikos Mamoulis, Panos Kalnis

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

255 Scopus citations


In this paper we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of transactional data that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, publishing such data remains subject to privacy attacks from adversaries who have partial knowledge of the set. Unlike most previous work, we do not classify data as sensitive or non-sensitive; instead, we treat all items both as potential quasi-identifiers and as potential sensitive data, depending on the adversary's point of view. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization rather than suppression, even though suppression is the most common practice in related work on such data. We develop an algorithm that finds the optimal solution, but at a high cost that makes it inapplicable to large, realistic problems. We then propose two greedy heuristics, which scale much better and in most cases find a solution close to the optimal one. The proposed algorithms are evaluated experimentally on real datasets.
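A minimal sketch, assuming the standard reading of the k^m-anonymity guarantee: an adversary who knows up to m items of a transaction must not be able to narrow it down to fewer than k transactions, so every itemset of at most m items that occurs in the data must be contained in at least k transactions. The function names `is_km_anonymous` and `generalize`, the toy data, and the one-level item hierarchy are hypothetical illustrations. The brute-force itemset enumeration below only checks the condition; it is not the paper's optimal or greedy anonymization algorithm.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """Check k^m-anonymity by brute force: every itemset of size 1..m that
    appears in some transaction must be contained in at least k transactions."""
    support = Counter()
    for t in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form
        # and each transaction adds at most 1 to any itemset's support.
        items = sorted(set(t))
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return all(count >= k for count in support.values())


def generalize(transactions, hierarchy):
    """Globally replace each item by its parent in a (hypothetical) one-level
    hierarchy; items missing from the mapping are kept unchanged."""
    return [sorted({hierarchy.get(item, item) for item in t}) for t in transactions]


# Toy example (hypothetical data): single items are rare enough only as pairs.
data = [{"a1", "b1"}, {"a2", "b1"}, {"a1", "b2"}, {"a2", "b2"}]
hierarchy = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

print(is_km_anonymous(data, k=2, m=1))  # True: each single item occurs twice
print(is_km_anonymous(data, k=2, m=2))  # False: e.g. {a1, b1} occurs once
print(is_km_anonymous(generalize(data, hierarchy), k=2, m=2))  # True
```

Generalizing a1 and a2 to A (and b1, b2 to B) makes every transaction identical, so the toy database becomes 2^2-anonymous without deleting any item, which illustrates why generalization can preserve more information than suppression.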

Original language: English (US)
Pages (from-to): 115-125
Number of pages: 11
Journal: Proceedings of the VLDB Endowment
Issue number: 1
State: Published - 2008

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)
