Clarinet: WAN-Aware Optimization for Analytics Queries

Authors

Viswanathan, Raajay
Ananthanarayanan, Ganesh
Akella, Aditya

Type

Technical Report

Abstract

Recent work has made the case for geo-distributed analytics, where data collected and stored at multiple datacenters and edge sites worldwide is analyzed in situ to drive operational and management decisions. A key issue in such systems is ensuring low response times for analytics queries issued against geo-distributed data. A central determinant of response time is the query execution plan (QEP). Current query optimizers do not consider the network when deriving QEPs, which is a key drawback because the geo-distributed sites are connected via WAN links with heterogeneous and modest bandwidths, unlike intra-datacenter networks. We propose Clarinet, a novel WAN-aware query optimizer. Deriving a WAN-aware QEP requires working jointly with the execution layer of analytics frameworks, which places tasks at sites and performs scheduling. We design efficient heuristic solutions in Clarinet to make such a joint decision on the QEP. Our experiments with a real prototype deployed across EC2 datacenters and large-scale simulations using production workloads show that Clarinet improves query response times by more than 50% compared to state-of-the-art WAN-aware task placement and scheduling.

Citation

TR1841
