Fault tolerant computer architecture

Last edited by MARC Bot
June 30, 2019 | History

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state of the art in academia and industry, covering approximately the past 10 years.

Publish Date
Language
English

Edition Availability
Fault Tolerant Computer Architecture
2009, Springer Nature
in English
Fault tolerant computer architecture
2009, Morgan & Claypool Publishers
Electronic resource, in English

Book Details


Table of Contents

Introduction
Goals of this book
Faults, errors, and failures
Masking
Duration of faults and errors
Underlying physical phenomena
Trends leading to increased fault rates
Smaller devices and hotter chips
More devices per processor
More complicated designs
Error models
Error type
Error duration
Number of simultaneous errors
Fault tolerance metrics
Availability
Reliability
Mean time to failure
Mean time between failures
Failures in time
Architectural vulnerability factor
The rest of this book
References
Error detection
General concepts
Physical redundancy
Temporal redundancy
Information redundancy
The end-to-end argument
Microprocessor cores
Functional units
Register files
Tightly lockstepped redundant cores
Redundant multithreading without lockstepping
Dynamic verification of invariants
High-level anomaly detection
Using software to detect hardware errors
Error detection tailored to specific fault models
Caches and memory
Error code implementation
Beyond EDCs
Detecting errors in content addressable memories
Detecting errors in addressing
Multiprocessor memory systems
Dynamic verification of cache coherence
Dynamic verification of memory consistency
Interconnection networks
Conclusions
References
Error recovery
General concepts
Forward error recovery
Backward error recovery
Comparing the performance of FER and BER
Microprocessor cores
FER for cores
BER for cores
Single-core memory systems
FER for caches and memory
BER for caches and memory
Issues unique to multiprocessors
What state to save for the recovery point
Which algorithm to use for saving the recovery point
Where to save the recovery point
How to restore the recovery point state
Software-implemented BER
Conclusions
References
Diagnosis
General concepts
The benefits of diagnosis
System model implications
Built-in self-test
Microprocessor core
Using periodic BIST
Diagnosing during normal execution
Caches and memory
Multiprocessors
Conclusions
References
Self-repair
General concepts
Microprocessor cores
Superscalar cores
Simple cores
Caches and memory
Multiprocessors
Core replacement
Using the scheduler to hide faulty functional units
Sharing resources across cores
Self-repair of noncore components
Conclusions
References
The future
Adoption by industry
Future relationships between fault tolerance and other fields
Power and temperature
Security
Static design verification
Fault vulnerability reduction
Tolerating software bugs
References

Edition Notes

Part of: Synthesis digital library of engineering and computer science.

Title from PDF t.p. (viewed on June 4, 2009).

Series from website.

Includes bibliographical references.

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Also available in print.

Mode of access: World Wide Web.

System requirements: Adobe Acrobat reader.

Published in
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA)
Series
Synthesis lectures on computer architecture -- # 5
Other Titles
Synthesis digital library of engineering and computer science.

Classifications

Dewey Decimal Class
004.2
Library of Congress
QA76.9.F38 S674 2009

The Physical Object

Format
Electronic resource

ID Numbers

Open Library
OL27038278M
Internet Archive
faulttolerantcom00sori
ISBN 13
9781598299540, 9781598299533

History

June 30, 2019 Created by MARC Bot import new book