TY - GEN
T1 - On the anonymization of sparse high-dimensional data
AU - Ghinita, Gabriel
AU - Tao, Yufei
AU - Kalnis, Panos
PY - 2008
Y1 - 2008
N2 - Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e. maximize data utility). However, existing techniques adopt an indexing- or clustering-based approach, and work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. We propose a novel anonymization method for sparse high-dimensional data. We employ a particular representation that captures the correlation in the underlying data, and facilitates the formation of anonymized groups with low information loss. We propose an efficient anonymization algorithm based on this representation. We show experimentally, using real-life datasets, that our method clearly outperforms existing state-of-the-art in terms of both data utility and computational overhead.
AB - Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e. maximize data utility). However, existing techniques adopt an indexing- or clustering-based approach, and work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transaction data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. We propose a novel anonymization method for sparse high-dimensional data. We employ a particular representation that captures the correlation in the underlying data, and facilitates the formation of anonymized groups with low information loss. We propose an efficient anonymization algorithm based on this representation. We show experimentally, using real-life datasets, that our method clearly outperforms existing state-of-the-art in terms of both data utility and computational overhead.
UR - http://www.scopus.com/inward/record.url?scp=52649106883&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2008.4497480
DO - 10.1109/ICDE.2008.4497480
M3 - Conference contribution
AN - SCOPUS:52649106883
SN - 9781424418374
T3 - Proceedings - International Conference on Data Engineering
SP - 715
EP - 724
BT - Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
T2 - 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Y2 - 7 April 2008 through 12 April 2008
ER -