Fault Tolerance

Fault Tolerance is the ability of any system, platform, or environment to handle the failure of one or more of its constituent components. The degree of resiliency depends on addressing any single point of failure.

Hardware

At the hardware layer, tolerance is achieved through redundancy in power, compute, network, and storage components. This can also include completely redundant environments located in different physical zones (groups of data centers) or regions (groups of zones).

Software

At the software layer, tolerance is achieved through distribution and segregation. Handle workloads for user interaction, data processing, and data management in separate independent operations. This reduces the single points of failure as different hardware can be assigned to process different parts of the software.

Data

At the data layer, tolerance is achieved when any single read or write is always completed. Data remains consistently available and can be distributed across multiple hardware environments.

Was this article helpful?