Database Design

High Availability and Fault Tolerance

0

Having a highly available and fault tolerant cluster is a critical component of a production environment. Vertica utilizes a concept called K-safety to measure fault tolerance. High availability comes from having at least three nodes and replicating small tables or creating buddy projections for larger tables. This post will discuss and illustrate these concepts.

In order to have high availability, there must be at least 3 nodes in the cluster. By default, a cluster with three or more nodes will have a K-safety of 1. It’s important to understand how projections are organized before discussing how fault tolerance is handled. The following short video illustrates how data is segmented or replicated on a three node cluster.

High Availability

When data is segmented, it will have a node offset of one by default. This means that the buddy projection of a segment will be offset by one node. As long as the data segment is available on a functioning cluster node, the database can continue to run (unless more than half the nodes fail). In the example above, if one of the three nodes fail, the database could continue to run:

High Availability Illustration 01

However, in this state, further node failures would cause the database to shut down as the failed nodes’ data is not available in any other functioning node in the cluster. With a K-safety level of 1, up to half the nodes in the database could fail without causing the database to shut down. Even if the data is available from another active node, failure of more than half the nodes in the cluster will result in a database shut down:

High Availability Illustration 02

Note that not all data is replicated as it could potentially have an expensive storage cost. Large tables (i.e. fact) are typically segmented using a built-in hash function. Segmenting the projection evenly distributes data across the nodes resulting in optimal query execution. This segmentation is also critical when scaling a cluster up or down.

K-Safety

The “K” value of a K-safety level represents the number of replicas of the data present in the cluster. If a node were to fail, other nodes will contain a replica of this data allowing the database to continue to run. However, the database will shutdown if more than half the nodes fail even if all of the data is available from replicas.

As illustrated in the short video above, each segment is offset and copied to another node in what is called a Buddy Projection. The buddy projection will contain the same columns and have the same hash segmentation using different node ordering. Additionally, they can have different offsets and sort orders. When data is loaded, it is loaded into both projections.

Documentation

About the author / 

Norbert Krupa

Norbert is the founder of vertica.tips and a Solutions Engineer at Talend. He is an HP Accredited Solutions Expert for Vertica Big Data Solutions. He has written the Vertica Diagnostic Queries which aim to cover monitoring, diagnostics and performance tuning. The views, opinions, and thoughts expressed here do not represent those of the user's employer.

Leave a Reply

Upcoming Events

  • No upcoming events
AEC v1.0.4

Subscribe to Blog via Email

Enter your email address to subscribe and receive notifications of new posts by email.

Read more use cases here.

Notice

This site is not affiliated, endorsed or associated with HPE Vertica. This site makes no claims on ownership of trademark rights. The author contributions on this site are licensed under CC BY-SA 3.0 with attribution required.
%d bloggers like this: