Generic timing fault tolerance using a timely computing base

António Casimiro, Paulo Veríssimo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing fault tolerance: timing errors occur, and they are processed using redundancy, e.g., component replication, to recover and deliver timely service. We introduce a paradigm for generic timing fault tolerance with replicated state machines. The paradigm is based on the existence of Timing Failure Detection with timed completeness and accuracy properties. Generic timing fault tolerance implies the ability to dependably observe the system and to notify timing failures in a timely manner, which we discuss in the paper. On the other hand, it ensures replica determinism with respect to time (temporal consistency), and safety in case of spare exhaustion. We show that the paradigm can be addressed and realized in the framework of the Timely Computing Base (TCB) model and architecture. Furthermore, we illustrate the generality of our approach by reviewing existing solutions and by showing that, in contrast with ours, they only secure a restricted semantics, or simply provide ad hoc solutions.
Original language: English (US)
Title of host publication: Proceedings of the 2002 International Conference on Dependable Systems and Networks
Publisher: IEEE Computer Society
Pages: 27-36
Number of pages: 10
ISBN (Print): 0769515975
State: Published - Jan 1 2002
