Scalable Anonymization Algorithms for Large Data Sets

Loading...
Thumbnail Image

Date

Authors

LeFevre, Kristen
DeWitt, David

Advisors

License

DOI

Type

Technical Report

Journal Title

Journal ISSN

Volume Title

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Grantor

Abstract

k-Anonymity is a widely-studied mechanism for protecting identity when distributing non-aggregate personal data. This basic mechanism can also be extended to protect an individual-level sensitive attribute. Numerous algorithms have been developed in recent years for generalizing, clustering, or otherwise manipulating data to satisfy one or more anonymity requirements. However, few have considered large-scale input data sets that do not fit in main memory. This paper proposes two techniques for incorporating (external) scalability into an existing algorithmic framework. The first technique is based on ideas from scalable decision tree construction, and the second technique is based on sampling. In both cases, the resulting algorithms are guaranteed to produce output data that satisfies the given anonymity requirements. We evaluate the performance of each algorithm both analytically and experimentally.

Description

Keywords

Related Material and Data

Citation

TR1590

Sponsorship

Endorsement

Review

Supplemented By

Referenced By