Supporting nondeterministic execution in fault-tolerant systems

J. Hamilton Slye*, E. N. Elnozahy

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

27 Scopus citations

Abstract

We present a technique to track nondeterminism resulting from asynchronous events and multithreading in log-based rollback-recovery protocols. This technique relies on using a software counter to compute the number of instructions between nondeterministic events in normal operation. Should a failure occur, the instruction counts are used to force the replay of these events at the same execution points. The execution of the application thus can be replayed to recreate the pre-failure state, while accommodating uncontrolled nondeterminism during normal operation. Implementation on a DEC Alpha processor shows that this support has a low overhead, typically less than 6% increase in running time for the applications we studied.
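The idea in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' DEC Alpha implementation: the "program" is a made-up arithmetic loop, the loop index stands in for the software instruction counter, and the names `step`, `handle_event`, `run_normal`, and `replay` are hypothetical. In normal operation, events arrive at random points and are logged with the current count; on recovery, the log is used to deliver each event at the same count.

```python
import random

MODULUS = 1_000_003  # arbitrary modulus for the toy computation


def step(state, count):
    """One deterministic 'instruction' of the toy program."""
    return (state * 31 + count) % MODULUS


def handle_event(state, payload):
    """Handler for an asynchronous event; its effect depends on when it runs."""
    return state ^ payload


def run_normal(steps, seed):
    """Normal operation: events arrive unpredictably and are logged.

    Each log entry is (instruction_count, payload), where the loop index
    plays the role of the software instruction counter.
    """
    rng = random.Random(seed)  # stands in for uncontrolled event timing
    state, log = 0, []
    for count in range(steps):
        if rng.random() < 0.05:  # an asynchronous event happens to arrive
            payload = rng.randrange(1000)
            log.append((count, payload))
            state = handle_event(state, payload)
        state = step(state, count)
    return state, log


def replay(steps, log):
    """Recovery: re-execute, delivering each logged event at its recorded count."""
    events = dict(log)  # at most one event per count in this toy model
    state = 0
    for count in range(steps):
        if count in events:
            state = handle_event(state, events[count])
        state = step(state, count)
    return state


if __name__ == "__main__":
    final_state, event_log = run_normal(10_000, seed=42)
    recovered = replay(10_000, event_log)
    print("states match:", recovered == final_state)
```

Because the replay delivers every event at the same count at which it was originally logged, the recovered state equals the pre-failure state even though the original event timing was not controlled.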

Original language: English (US)
Pages (from-to): 250-259
Number of pages: 10
Journal: Proceedings - Annual International Conference on Fault-Tolerant Computing
State: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 26th International Symposium on Fault-Tolerant Computing - Sendai, Jpn
Duration: Jun 25 1996 to Jun 27 1996

ASJC Scopus subject areas

  • Hardware and Architecture
