Interacting with Large Distributed Datasets Using Sketch

Authors

Budiu, Mihai
Isaacs, Rebecca
Murray, Derek
Plotkin, Gordon
Barham, Paul
Al-Kiswany, Samer
Boshmaf, Yazan
Luo, Qingzhou
Andoni, Alexandr

Type

Technical Report

Abstract

We present Sketch, a distributed software infrastructure for building interactive tools that explore large datasets distributed across multiple machines. We have built three sophisticated applications using this framework: a billion-row spreadsheet, a distributed log browser, and a distributed-systems performance debugging tool. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling gracefully to large system sizes. The conflicting constraints of large-scale data and the small timescales required by human interaction are difficult to satisfy simultaneously. Sketch finds a sweet spot in this trade-off by exploiting the observation that the precision of a data view is limited by the resolution of the user's screen. The system pushes data reduction operations to the data sources. The core Sketch abstraction provides a narrow programming interface; Sketch clients construct a distributed application by stacking modular components with identical interfaces, each providing a useful feature: network transparency, concurrency, fault tolerance, straggler avoidance, round-trip reduction, or distributed aggregation.
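
The screen-resolution observation in the abstract can be illustrated with a minimal sketch. It is not Sketch's actual API: the function names (`partial_histogram`, `merge`, `distributed_histogram`) and the choice of a histogram as the data view are assumptions made purely for illustration. Each data source reduces its own partition to one count per horizontal pixel, so the amount of data returned depends on the screen width rather than on the number of rows.

```python
from typing import List, Sequence


def partial_histogram(values: Sequence[float], lo: float, hi: float,
                      buckets: int) -> List[int]:
    """Reduce one partition to `buckets` counts (runs at the data source)."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bucket.
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
    return counts


def merge(a: List[int], b: List[int]) -> List[int]:
    """Combine two partial summaries; the result has the same fixed size."""
    return [x + y for x, y in zip(a, b)]


def distributed_histogram(partitions: Sequence[Sequence[float]], lo: float,
                          hi: float, screen_width_px: int) -> List[int]:
    """Hypothetical driver: one bucket per pixel, merged across partitions."""
    total = [0] * screen_width_px
    for part in partitions:
        # In a real deployment each call would run on a remote machine.
        total = merge(total, partial_histogram(part, lo, hi, screen_width_px))
    return total


if __name__ == "__main__":
    parts = [[0.0, 1.0, 2.5], [3.9, 4.0]]
    print(distributed_histogram(parts, 0.0, 4.0, 4))
```

Because every partial summary has the same fixed size and `merge` is associative, summaries could in principle be combined in any order or in a tree, which is one plausible way to read the "distributed aggregation" feature named in the abstract.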

Citation

TR1817
