Scalable Anonymization Algorithms for Large Data Sets

dc.contributor.author	LeFevre, Kristen	en_US
dc.contributor.author	DeWitt, David	en_US
dc.date.accessioned	2012-03-15T17:21:29Z
dc.date.available	2012-03-15T17:21:29Z
dc.date.created	2007	en_US
dc.date.issued	2007	en_US
dc.description.abstract	k-Anonymity is a widely studied mechanism for protecting identity when distributing non-aggregate personal data. This basic mechanism can also be extended to protect an individual-level sensitive attribute. Numerous algorithms have been developed in recent years for generalizing, clustering, or otherwise manipulating data to satisfy one or more anonymity requirements. However, few have considered large-scale input data sets that do not fit in main memory. This paper proposes two techniques for incorporating (external) scalability into an existing algorithmic framework. The first technique is based on ideas from scalable decision tree construction, and the second technique is based on sampling. In both cases, the resulting algorithms are guaranteed to produce output data that satisfies the given anonymity requirements. We evaluate the performance of each algorithm both analytically and experimentally.	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.citation	TR1590	en_US
dc.identifier.uri	http://digital.library.wisc.edu/1793/60548
dc.publisher	University of Wisconsin-Madison Department of Computer Sciences	en_US
dc.title	Scalable Anonymization Algorithms for Large Data Sets	en_US
dc.type	Technical Report	en_US
