A Scalable Failure Recovery Model for Tree-based Overlay Networks

Authors

Arnold, Dorian
Miller, Barton P.

Type

Technical Report

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Abstract

We present a scalable failure recovery model for data aggregations in large-scale tree-based overlay networks (TBONs). A TBON is a network of hierarchically organized processes that exploits the logarithmic scaling properties of trees to provide scalable data multicast, gather, and in-network aggregation. TBONs are commonly used in debugging and performance tools, system monitoring, information management systems, stream processing, and mobile ad hoc networks. Our recovery model leverages the information redundancies inherent in TBON computations. This redundant information is gathered from non-failed processes to compensate for the computation and communication state lost due to failures. This state compensation strategy is attractive because: (1) it avoids the time and resource overheads of previous reliability approaches, which rely on explicit replication; (2) recovery is rapid and involves only a small subset of the network; and (3) it applies to many useful, complex computations. In this paper, we formalize the TBON model and its fundamental properties to prove that our state compensation model properly preserves computational semantics across TBON process failures. These properties lead to an efficient implementation of state compensation, which we use to empirically validate and evaluate recovery performance. We show that state compensation can recover from failures in extremely large TBONs in milliseconds, with practically no interruption to application service.
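
To make the idea of recovering without replicas concrete, the following is a minimal sketch and not the report's algorithm. It assumes a toy in-memory tree, a `max` reduction (chosen here only because re-reducing the same inputs is harmless), and a simple policy of re-parenting orphans to the failed node's parent. The names `Node`, `child_state`, and `recover_from_failure` are hypothetical and do not come from the report.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    """One process in a hypothetical tree-based overlay network."""
    name: str
    local_value: int = 0  # value produced at this process (non-zero on leaves here)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    # Last aggregate reported by each child, keyed by child name.
    child_state: Dict[str, int] = field(default_factory=dict)

    def add_child(self, child: "Node") -> None:
        child.parent = self
        self.children.append(child)

    def aggregate(self) -> int:
        """Reduce this subtree bottom-up with max, caching each child's report."""
        result = self.local_value
        for child in self.children:
            self.child_state[child.name] = child.aggregate()
            result = max(result, self.child_state[child.name])
        return result


def recover_from_failure(failed: Node) -> Node:
    """Re-parent the failed node's children to its parent (assumed policy).

    Nothing is restored from a replica: the state the failed node held is
    discarded and later recomputed from what the surviving children report.
    """
    adopter = failed.parent
    if adopter is None:
        raise ValueError("this sketch does not handle failure of the root")
    adopter.children.remove(failed)
    adopter.child_state.pop(failed.name, None)
    for orphan in failed.children:
        adopter.add_child(orphan)
    return adopter


# Build a small two-level tree: root -> {a, b} -> four leaves.
root, a, b = Node("root"), Node("a"), Node("b")
root.add_child(a)
root.add_child(b)
leaves = [Node("l0", 3), Node("l1", 9), Node("l2", 4), Node("l3", 7)]
a.add_child(leaves[0])
a.add_child(leaves[1])
b.add_child(leaves[2])
b.add_child(leaves[3])

before = root.aggregate()   # aggregate with all processes alive
recover_from_failure(a)     # internal process "a" fails
after = root.aggregate()    # aggregate rebuilt from surviving processes
```

In this toy case `before` and `after` agree, because every input that flowed through the failed process is still held by a surviving leaf. Only the failed node's parent and its orphaned children take part in recovery, which loosely mirrors the abstract's point that recovery involves a small subset of the network.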

Citation

TR1626
