Implementing highly available, highly reliable virtual processors

Macdonald, Robert Noël

Files

MACDONALD_ROBERT_MSc_1994_676787.pdf (1.98 MB)

Date

1994

Authors

Macdonald, Robert Noël

Abstract

A fault-tolerant distributed facility called a Halt on Failure Processor (HFP), and its performance in a network of workstations, are described. Process replication and n-modular redundancy are used to achieve fault tolerance in a general-purpose workstation environment. A blacklisting mechanism is used to distinguish between slow and crashed workstations. The system achieves high availability by maintaining a list of healthy workstations. The HFP halts rather than deliver the results of an erroneous calculation to its users. The design of the HFP is presented, along with the type and number of errors it can handle. The implementation using the existing Remote Execution Manager is discussed. Extensive performance studies were carried out on a network of Sun SPARC workstations running UNIX. Performance results are presented, and the costs of performing fault management at various levels are shown. Flaws in the way UNIX reports load information, and their implications for load balancing, are pointed out. It is shown that HFPs can achieve high availability and fault tolerance using the idle cycles of workstations on a local area network, with little performance degradation.
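The two mechanisms named in the abstract, halting instead of delivering a result the replicas disagree on, and blacklisting workstations that stop responding, can be illustrated with a minimal sketch. This is not the thesis's implementation (which builds on the Remote Execution Manager); it assumes replica results are hashable values, that a missing answer is reported as `None`, and that a host is presumed crashed after a fixed number of consecutive misses. The names `Halt`, `vote`, `HostTable`, `quorum`, and `max_misses` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


class Halt(Exception):
    """Raised instead of returning a result the replicas do not agree on."""


def vote(results, quorum):
    """Return the value reported by at least `quorum` replicas, else halt.

    `results` has one entry per replica; None marks a replica that gave no
    answer (e.g. it timed out). Results must be hashable to be counted.
    Setting quorum == len(results) makes any disagreement a halt.
    """
    answered = [r for r in results if r is not None]
    if not answered:
        raise Halt("no replica produced a result")
    value, count = Counter(answered).most_common(1)[0]
    if count < quorum:
        raise Halt(f"only {count} of {len(results)} replicas agree")
    return value


@dataclass
class HostTable:
    # Hypothetical threshold: consecutive misses before a host is presumed crashed.
    max_misses: int = 3
    misses: dict = field(default_factory=dict)
    blacklist: set = field(default_factory=set)

    def record(self, host, responded):
        if responded:
            # An answer, even a late one, marks the host as slow, not crashed.
            self.misses[host] = 0
        else:
            self.misses[host] = self.misses.get(host, 0) + 1
            if self.misses[host] >= self.max_misses:
                # In this sketch a blacklisted host stays blacklisted.
                self.blacklist.add(host)

    def healthy(self, hosts):
        """Hosts eligible to run replicas (those not blacklisted)."""
        return [h for h in hosts if h not in self.blacklist]


if __name__ == "__main__":
    table = HostTable(max_misses=2)
    for host, ok in [("ws1", True), ("ws2", False), ("ws2", False), ("ws3", True)]:
        table.record(host, ok)
    print(table.healthy(["ws1", "ws2", "ws3"]))  # ['ws1', 'ws3']
    print(vote([42, 42, 41], quorum=2))          # 42
    try:
        vote([1, 2, None], quorum=2)
    except Halt as e:
        print("halted:", e)                      # halted: only 1 of 3 replicas agree
```

Choosing `quorum` trades availability against safety: a majority quorum masks a minority of faulty replicas, while requiring unanimity halts on any discrepancy.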

URI

https://hdl.handle.net/1828/18792

Collections

Electronic Theses and Dissertations (ETD)
