A Scalable Failure Recovery Model for Tree-based Overlay Networks
Authors
Arnold, Dorian
Miller, Barton P.
Type
Technical Report
Publisher
University of Wisconsin-Madison Department of Computer Sciences
Abstract
We present a scalable failure recovery model for data aggregations in
large-scale tree-based overlay networks (TBONs). A TBON is a network of
hierarchically organized processes that exploits the logarithmic scaling
properties of trees to provide scalable data multicast, gather, and in-network
aggregation. TBONs are commonly used in debugging and performance tools,
system monitoring, information management systems, stream processing, and mobile
ad hoc networks. Our recovery model leverages inherent information redundancies
in TBON computations. This redundant information is gathered from non-failed
processes to compensate for computation and communication state lost due to
failures. This state compensation strategy is attractive because: (1) it avoids
the time and resource overheads of previous reliability approaches, which rely
on explicit replication; (2) recovery is rapid and only involves a small subset
of the network; and (3) it applies to many useful, complex computations. In this
paper, we formalize the TBON model and its fundamental properties to prove
that our state compensation model properly preserves computational semantics
across TBON process failures. These properties lead to an efficient
implementation of state compensation, which we use to empirically validate and
evaluate recovery performance. We show that state compensation can recover from
failures in extremely large TBONs in milliseconds, with practically no
interruption of application service.
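
The idea that an aggregation's result can survive the loss of an internal process can be illustrated with a toy sketch. This is not the report's algorithm or implementation: the Node class, the helper names, and the choice to have the failed node's parent adopt the orphaned children are all assumptions made for illustration. The sketch only shows why an associative, commutative aggregation (here, a sum over leaf values) gives the same answer after an internal node is removed and the tree is reconnected.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """Hypothetical TBON process; only leaves carry data in this sketch."""
    name: str
    value: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node, op: Callable[[int, int], int]) -> int:
    """Reduce the leaf values below `node` with an associative, commutative op."""
    if not node.children:
        return node.value
    result = aggregate(node.children[0], op)
    for child in node.children[1:]:
        result = op(result, aggregate(child, op))
    return result


def fail_and_reconnect(parent: Node, failed: Node) -> None:
    """Assumed recovery step: the parent adopts the failed node's children."""
    index = parent.children.index(failed)
    parent.children[index:index + 1] = failed.children


def build_tree() -> Node:
    """Build a two-level tree: root -> {left, right} -> three leaves each."""
    leaves = [Node(f"leaf{i}", value=v) for i, v in enumerate([3, 1, 4, 1, 5, 9])]
    left = Node("left", children=leaves[:3])
    right = Node("right", children=leaves[3:])
    return Node("root", children=[left, right])


def add(a: int, b: int) -> int:
    return a + b


root = build_tree()
before = aggregate(root, add)
fail_and_reconnect(root, root.children[0])  # the "left" internal process fails
after = aggregate(root, add)
print(before, after)
```

In this sketch both aggregates equal 23, because the failed internal node held no data of its own and the sum does not depend on how the leaves are grouped. The sketch does not model in-flight messages or partially aggregated state held by the failed process, which the state compensation model described above is concerned with.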
Citation
TR1626