Scalable Anonymization Algorithms for Large Data Sets

LeFevre, Kristen; DeWitt, David

Files

TR1590.pdf (641.79 KB)

Date

2007

Authors

LeFevre, Kristen

DeWitt, David

Type

Technical Report

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Abstract

k-Anonymity is a widely studied mechanism for protecting identity when distributing non-aggregate personal data. This basic mechanism can also be extended to protect an individual-level sensitive attribute. Numerous algorithms have been developed in recent years for generalizing, clustering, or otherwise manipulating data to satisfy one or more anonymity requirements. However, few have considered large-scale input data sets that do not fit in main memory. This paper proposes two techniques for incorporating (external) scalability into an existing algorithmic framework. The first technique is based on ideas from scalable decision tree construction, and the second technique is based on sampling. In both cases, the resulting algorithms are guaranteed to produce output data that satisfies the given anonymity requirements. We evaluate the performance of each algorithm both analytically and experimentally.
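As a rough illustration of the anonymity requirement the abstract refers to, the sketch below checks whether a small in-memory table is k-anonymous: every combination of quasi-identifier values that appears must be shared by at least k records. This is not one of the report's algorithms, which target data sets larger than main memory; the function name `is_k_anonymous`, the attribute names, and the toy records are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination present in
    `records` is shared by at least k records."""
    # Count how many records fall into each quasi-identifier group.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical toy table whose quasi-identifiers are already generalized.
records = [
    {"age": "20-29", "zip": "537**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "537**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "537**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "537**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

The check groups records in memory with a single pass, which is exactly the assumption that breaks down for the large inputs the report addresses.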

Citation

TR1590

URI

http://digital.library.wisc.edu/1793/60548

Collections

CS Technical Reports
