RDBMS Index Support for Sparse Data Sets
Loading...
Files
Date
Authors
Beckmann, Jennifer
Chu, Eric
Naughton, Jeffrey
Advisors
License
DOI
Type
Technical Report
Journal Title
Journal ISSN
Volume Title
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Grantor
Abstract
Maintenance costs and storage overheads incurred by indexes often limit the number of indexes created per table in an RDBMS. For sparse data, where a table may have hundreds of attributes, indexing only a few attributes means that a vanishingly small percentage of attributes will have indexes, which unfortunately means that a table scan is the only evaluation plan for almost all selection queries on that table. This paper demonstrates that sparsity of the data actually enables index support for most, if not all, attributes in the data. Our approach leverages "sparse indexes", which are partial indexes that store only non-null values. Sparse indexes incur low maintenance costs and storage overheads because most values in a sparse table are null. Properties of the data lead us to two other contributions toward index support for sparse data; we show that sparse indexes benefit greatly from building all indexes in one-pass of the data; and we identify that multi-column sparse indexes are preferable as covering indexes when attributes in the data are correlated. We qualitatively evaluate our approaches with synthetic and real-world data to show that our suggestions significantly out-perform traditional indexing approaches designed for dense data.
Description
Keywords
Related Material and Data
Citation
TR1566