Phoenix: Making Data-Intensive Grid Applications Fault-Tolerant

Loading...
Thumbnail Image

Date

Authors

Kola, George
Kosar, Tevfikk
Livny, Miron

Advisors

License

DOI

Type

Technical Report

Journal Title

Journal ISSN

Volume Title

Publisher

University of Wisconsin-Madison Department of Computer Sciences

Grantor

Abstract

A major hurdle facing data intensive grid applications is the appropriate handling of failures that occur in the grid-environment. Implementing the fault-tolerance transparently at the grid-middleware level would make different data intensive applications fault-tolerant without each having to pay a separate cost and reduce the time to grid-based solution for many scientific problems. We analyzed the failures encountered by four real-life production data intensive applications: NCSA image processing pipeline, WCER video processing pipeline, US-CMS pipeline and BMRB BLAST pipeline. Taking the result of the analysis into account, we have designed and implemented Phoenix, a transparent middleware-level fault-tolerance layer that detects failures early, classifies failures into transient and permanent and appropriately handles the transient failures. We applied our fault-tolerance layer to a prototype of the NCSA image processing pipeline and considerably improved the failure handling and report on the insights gained in the process.

Description

Keywords

Related Material and Data

Citation

TR1513

Sponsorship

Endorsement

Review

Supplemented By

Referenced By