Scalable Anonymization Algorithms for Large Data Sets
Loading...
Files
Date
Authors
LeFevre, Kristen
DeWitt, David
Advisors
License
DOI
Type
Technical Report
Journal Title
Journal ISSN
Volume Title
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Grantor
Abstract
k-Anonymity is a widely-studied mechanism for protecting identity
when distributing non-aggregate personal data. This basic mechanism
can also be extended to protect an individual-level sensitive attribute.
Numerous algorithms have been developed in recent years
for generalizing, clustering, or otherwise manipulating data to satisfy
one or more anonymity requirements. However, few have considered
large-scale input data sets that do not fit in main memory.
This paper proposes two techniques for incorporating (external)
scalability into an existing algorithmic framework. The first technique
is based on ideas from scalable decision tree construction,
and the second technique is based on sampling. In both cases, the
resulting algorithms are guaranteed to produce output data that satisfies
the given anonymity requirements. We evaluate the performance
of each algorithm both analytically and experimentally.
Description
Keywords
Related Material and Data
Citation
TR1590