RDBMS Index Support for Sparse Data Sets

Beckmann, Jennifer; Chu, Eric; Naughton, Jeffrey

Files

TR1566.pdf (2.1 MB)

Date

2006

Authors

Beckmann, Jennifer

Chu, Eric

Naughton, Jeffrey

Type

Technical Report

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Abstract

Maintenance costs and storage overheads incurred by indexes often limit the number of indexes created per table in an RDBMS. For sparse data, where a table may have hundreds of attributes, indexing only a few attributes means that a vanishingly small percentage of attributes will have indexes, which in turn means that a table scan is the only evaluation plan for almost all selection queries on that table. This paper demonstrates that the sparsity of the data actually enables index support for most, if not all, attributes in the data. Our approach leverages "sparse indexes", which are partial indexes that store only non-null values. Sparse indexes incur low maintenance costs and storage overheads because most values in a sparse table are null. Properties of the data lead us to two other contributions toward index support for sparse data: we show that sparse indexes benefit greatly from building all indexes in one pass over the data, and we identify that multi-column sparse indexes are preferable as covering indexes when attributes in the data are correlated. We qualitatively evaluate our approaches with synthetic and real-world data to show that our suggestions significantly outperform traditional indexing approaches designed for dense data.
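The idea of a sparse index, a partial index that stores only non-null values, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses SQLite's partial-index syntax through Python's standard `sqlite3` module as a stand-in, and the table name `items`, its columns, and the helper `build_sparse_indexes` are hypothetical names chosen for illustration.

```python
import sqlite3


def build_sparse_indexes(conn, table, columns):
    """Create one partial index per column, covering only non-null values.

    Hypothetical helper: identifiers are interpolated directly, so it
    should only be called with trusted table and column names.
    """
    for col in columns:
        conn.execute(
            f'CREATE INDEX IF NOT EXISTS "sparse_{table}_{col}" '
            f'ON "{table}" ("{col}") WHERE "{col}" IS NOT NULL'
        )


# A small sparse table: every row populates only one of the attributes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY, color TEXT, weight REAL, voltage REAL)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, "red", None, None),
        (2, None, 2.5, None),
        (3, None, None, 12.0),
        (4, "blue", None, None),
    ],
)

# Index every attribute; each index holds entries only for non-null rows.
build_sparse_indexes(conn, "items", ["color", "weight", "voltage"])

# A selection on an indexed attribute can use its sparse index
# instead of scanning the whole table.
matches = conn.execute("SELECT id FROM items WHERE color = 'red'").fetchall()
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM items WHERE color = 'red'"
).fetchall()
print(matches)
print(plan)
```

Because each index skips null entries, its size and maintenance cost grow with the number of non-null values in its column rather than with the number of rows, which is what makes indexing every attribute of a sparse table plausible. The exact query-plan text printed depends on the SQLite version in use.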

Citation

TR1566

URI

http://digital.library.wisc.edu/1793/60506

Collections

CS Technical Reports
