What is high availability?

The main idea behind high availability is to identify single points of failure in the system and eliminate them. By a single point of failure I mean any component that can cause a service interruption if it becomes unavailable.

Redundancy

We eliminate single points of failure through redundancy, i.e. we scale the system horizontally by adding more nodes. Each added node increases the overall capacity of the system. If one node fails, the load is distributed among the remaining ones. Often this means the pressure on the remaining nodes will increase.

Conversely, if one node is removed from the system, the load will be distributed across the remaining nodes. The important point is that the system should not fail if one of the nodes is removed. This attribute is typically referred to as redundancy.

We organize nodes into clusters to simplify system health monitoring. In case of heavy load due to a spike in traffic we can add more nodes to the cluster. The newly added nodes take over some of the work and alleviate the pressure on the existing nodes.

Example

Let's consider a simple scenario with an application consisting of a web server connected to a database. Without redundancy the entire system will be down if the server crashes. To mitigate the risk we can add several more servers and put a load balancer in front of them. The load balancer distributes incoming requests among the servers.

Failure of a single server won't cause the entire system to go down, although it will increase the pressure on the remaining live servers.

With this setup, though, the load balancer becomes a single point of failure. To mitigate this risk we can add another load balancer. Virtual IP addresses help here: the domain always resolves to the same IP, but the IP is reassigned between load balancers depending on which one is available. When the first load balancer goes down, the second one picks up the load and keeps distributing the requests.

And once again, at this point you can argue that DNS itself becomes a single point of failure, so we need to prepare for that as well.

Overall the pattern should be pretty clear: by solving one problem we introduce another one in a different component. This is a common situation in system design.

Load Balancing

The load balancer sits in front of the nodes and serves as an entry point into the cluster. Clients talk to the load balancer, which forwards each request to one of the servers. The server then processes the request and the response goes back to the client, typically through the load balancer (some setups use direct server return, where the server responds to the client directly).

A load balancer helps solve two problems:

  1. Distributing the load evenly between the servers
  2. Making sure that client requests are not sent to a server that is down

Load balancers can route requests according to defined rules and strategies, such as:

  • Round-robin - requests are assigned to each server in equal numbers and in circular order
  • Least connections - requests are assigned to the server with the fewest active connections
  • Least response time - requests are assigned to the server with the fewest active connections and the lowest average response time
  • Hashing - requests are assigned to a server based on a hash of certain connection information like the IP address or an HTTP header. This allows clients to consistently connect to the same server. Accessing the same server can also be achieved with sticky sessions. Although it may be useful in certain situations, it often indicates that you have a bigger problem somewhere else in your system. Nowadays both of these techniques are considered anti-patterns.
Load distribution without load balancer
Load distribution with load balancer
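
To make the strategies above more concrete, here is a minimal sketch of round-robin and hash-based selection in Python (the server pool is made up; real load balancers apply these rules per connection or per request):

    import hashlib
    import itertools

    servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool

    # Round-robin: walk through the pool in circular order.
    rr = itertools.cycle(servers)

    def pick_round_robin():
        return next(rr)

    # Hashing: the same client IP maps to the same server,
    # as long as the pool itself does not change.
    def pick_by_hash(client_ip):
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return servers[int(digest, 16) % len(servers)]

    print(pick_round_robin())           # 10.0.0.1
    print(pick_by_hash("203.0.113.7"))  # always the same server for this IP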

Health Checks

The load balancer constantly monitors the pool of servers by sending health check requests. Health checks regularly attempt to connect to servers using the protocol and port defined by the forwarding rules to ensure that the servers are listening.
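
As a rough illustration, a TCP health check boils down to trying to open a connection on the configured port; the sketch below assumes port 80 and a two-second timeout, while a real load balancer runs such checks periodically and evicts a server only after several consecutive failures:

    import socket

    def is_healthy(host, port=80, timeout=2.0):
        # Try to open a TCP connection; any failure means the check fails.
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Keep only the servers that currently pass the health check.
    healthy_servers = [s for s in ["10.0.0.1", "10.0.0.2"] if is_healthy(s)]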

Redundant Load Balancers

As discussed before, a single load balancer can become a single point of failure. To mitigate this risk another load balancer can be added. In such a setup they monitor each other's health, and if one of them fails, DNS must direct users to the second one.

Replication

Replicas are copies of the data distributed across multiple nodes. Replication allows you to offload read traffic from a single node.

Replication can improve the following characteristics of the system:

  • Durability - failure of a single server doesn't lead to data loss anymore
  • Availability and resilience - failure of a single node doesn’t lead to the failure of entire system
  • Throughput is increased by spreading the load across multiple nodes
  • Latency is reduced by locating replicas closer to the end users

Those improvements don’t come for free. By introducing replication you’re making a tradeoff between consistency on one hand and latency and availability on the other.

Broadly speaking replication can be achieved synchronously or asynchronously. Synchronous replication offers stronger consistency. Asynchronous replication provides higher availability and performance.

We say that data is consistent if it is in the same state across every node in the system. Consistency implies the accuracy, completeness, and correctness of data.

Consistency allows us to avoid two well known problems:

  1. “Read-your-own-write” inconsistency
  2. Inconsistent reads from different replicas.

I’ll save discussion of those problems for another post as this one is getting too big already.

Replication strategies

Data replication can be done by using different techniques:

  • Leader-follower replication
  • Multi-leader replication
  • Leaderless replication

In a leader-follower replication scheme there is one node dedicated to accepting all the writes. It's called the leader. The other nodes, known as followers, replicate the data from the leader. Even if the leader node fails, data is still available for reads through the follower replicas.
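
A minimal sketch of the scheme, with made-up classes: all writes go through the leader, which pushes changes to its followers, and reads are served by the followers.

    import random

    class Node:
        def __init__(self):
            self.data = {}

    class Leader(Node):
        def __init__(self, followers):
            super().__init__()
            self.followers = followers

        def write(self, key, value):
            self.data[key] = value
            # Asynchronous style: changes are pushed to followers without
            # waiting for confirmation (synchronous replication would wait).
            for follower in self.followers:
                follower.data[key] = value

    followers = [Node(), Node()]
    leader = Leader(followers)

    leader.write("user:1", "Alice")                      # writes hit the leader
    print(random.choice(followers).data.get("user:1"))   # reads are offloaded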

In write-heavy applications a multi-leader scheme can be used. Multiple nodes can accept writes in such a setup. It's considerably more complicated, as it requires a mechanism for resolving conflicts efficiently.

Possible conflict-resolution strategies:

  • Last write wins - the most recent change takes precedence. It's easy to implement, but there is a risk of discarding important changes (see the sketch after this list)
  • Conflict-free replicated data types - such structures allow seamless reconciliation of conflicting changes by merging them automatically
  • Operational transformation - the operation itself is taken into account, not only the state of the data
  • Application-specific logic - it can involve human intervention for resolving conflicts. Alternatively, it's possible to define rules within the application that decide which version is correct when a conflict arises
  • Timestamps or version numbers - data with the most recent timestamp or the highest version number is considered the truth
  • Data partitioning - when data is partitioned across multiple leaders it minimizes the chances of conflicts. This technique requires attention to avoid hot spots on busy data partitions.
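
As a simple illustration of last write wins (the record format is made up): each version carries a timestamp, and when two replicas disagree the newer write is kept while the older one is silently discarded.

    # Each version is a (value, timestamp) pair; the newer write wins.
    def resolve_last_write_wins(a, b):
        return a if a[1] >= b[1] else b

    version_on_leader_1 = ("blue",  1700000005)  # written first
    version_on_leader_2 = ("green", 1700000009)  # written later elsewhere
    print(resolve_last_write_wins(version_on_leader_1, version_on_leader_2))
    # ('green', 1700000009) - the earlier "blue" update is lost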

Leaderless replication takes a quorum-based approach. In such a setup any node in the network can accept write operations. It aims for consensus among a certain number of nodes, which is called a "quorum". The primary benefit of this approach is high availability. It also has drawbacks related to conflict resolution.
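
A common way to reason about quorums: with N replicas, a write must be acknowledged by W nodes and a read must contact R nodes. If W + R > N, every read quorum overlaps with the latest write quorum, so at least one contacted replica has seen the newest value. A tiny sketch with example numbers:

    N = 3  # total replicas
    W = 2  # acknowledgements required for a successful write
    R = 2  # replicas contacted on every read

    # Overlap condition: some replica in the read quorum
    # is guaranteed to hold the latest successful write.
    assert W + R > N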

Partitioning

Database partitioning

There are many ways to partition a system. Databases are one obvious candidate for partitioning. Partitioning allows you to avoid limits on database size, throughput, or the number of concurrent sessions.

A database can be partitioned in several ways:

  • Horizontally, also called sharding - each partition holds data for a subset of the total data set.
  • Vertically - each partition holds a subset of the fields for the items in the data store. For example, you can put frequently accessed fields in one partition and less frequently accessed fields in another. In a traditional RDBMS vertical partitioning involves splitting a database table by columns. An example could be breaking a single Product table into several different tables such as basic information, item availability, item restrictions, etc. A good signal that you should vertically partition a table is when you notice lots of queries requesting only a few of the columns at a time (a rough sketch follows this list).
  • Functionally - data is partitioned according to how it is used by each bounded context in the system. For example, store invoice data in one partition and product inventory data in another. The schemas are independent.
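
As a rough sketch of the vertical split from the Product example above (field names are made up), the frequently read fields and the rarely read ones live in separate stores keyed by the same product ID, so the common query touches only the small, hot partition:

    # Hot partition: fields read on almost every request.
    product_basic_info = {
        101: {"name": "Laptop", "price": 999},
    }

    # Cold partition: fields read only occasionally.
    product_restrictions = {
        101: {"min_age": 0, "export_restricted": False},
    }

    def product_page(product_id):
        return product_basic_info[product_id]  # no need to load the cold fields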

Sharding

At a large scale one node cannot hold the entire data set, and you are forced to use sharding. It allows you to scale far beyond the constraints of a single traditional database. As a side effect it also makes the application more reliable and resilient to failure.

Sharding comes with additional complexity, as you need to deal with data spread across all the partitions. The biggest issue is keeping the data consistent across all of them. Sharding should always be a last resort when it comes to scaling your database. Alternatives like read replicas and caching should be tried first because they are much easier to implement. An important note about sharding, and what makes it different from clustering or replication, is that shards know about each other.

Partitions share the same data schema. The distribution of data is based on some kind of logic or identifier, often called a shard key. For example, the customer name can serve as such a key. In this setup customers whose names start with A–M go into one partition, and N–Z into another.
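
A minimal sketch of that routing rule (the shard names are made up):

    def shard_for(customer_name):
        first_letter = customer_name[0].upper()
        return "shard-A-M" if "A" <= first_letter <= "M" else "shard-N-Z"

    print(shard_for("Alice"))   # shard-A-M
    print(shard_for("Victor"))  # shard-N-Z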

Selecting a shard key

Distribution techniques

Range To make efficient scans possible, the data can be partitioned into ordered and contiguous value ranges by range-sharding. However, this approach requires some coordination through a master that manages assignments. To ensure elasticity, the system has to be able to detect and resolve hotspots automatically by further splitting an overburdened shard.

Partitioning the data based on the first letter of the client name

Hash In hash-sharding every data item is assigned to a shard server according to some hash value built from the primary key. The goal of the hash function is to split the data evenly.

serverid = hash(id) % servers - this hashing scheme requires all records to be reassigned every time a server joins or leaves, because the assignment changes with the number of shard servers (servers).

This approach does not require a coordinator and also guarantees that the data is evenly distributed across the shards, as long as the hash function used produces an even distribution.

A downside of this strategy is the need to remap data to hash values when servers are added or removed. For this reason elastic systems use consistent hashing instead.
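
A minimal sketch of consistent hashing (without virtual nodes, which a production implementation would add): servers and keys are hashed onto the same ring, and a key belongs to the first server at or after its position. Adding a server then only remaps the keys between that server and its predecessor, instead of reshuffling almost everything as hash(id) % servers does.

    import bisect
    import hashlib

    def ring_position(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class ConsistentHashRing:
        def __init__(self, servers):
            # Sort servers by their position on the hash ring.
            self.ring = sorted((ring_position(s), s) for s in servers)

        def server_for(self, key):
            positions = [pos for pos, _ in self.ring]
            # First server clockwise from the key, wrapping around the ring.
            idx = bisect.bisect(positions, ring_position(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
    print(ring.server_for("customer:42"))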

Entity Group It is a data partitioning scheme with the goal of enabling single-partition transactions on co-located data.

Lookup service This sharding strategy works by implementing a lookup table that sits in front of the sharded databases. The service tracks the current partitioning scheme and maps keys to the location of each shard.

A carefully selected partition key helps avoid hotspots and is crucial when designing a system. If you partition a database but one shard still gets the majority of the requests, then you haven't solved your problem. Ideally, load gets distributed evenly across all partitions. For example, hash by customer ID and not by the first letter of the customer name, because some letters are more frequent than others. The same principle applies when partitioning a message queue: pick a partition key that leads to an even distribution of messages across the set of queues.

It's possible to have a small number of power users who create much, much more data than others. Sometimes it's reasonable to allocate them a shard of their own. Random assignment is usually easiest, but it is by no means the only option. For example, users can be assigned to shards randomly - except for the user with ID 367823, who is assigned to shard 15, which no other user is assigned to. Source: Systems design for advanced beginners | Robert Heaton
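
A small sketch of that idea, reusing the numbers from the quote (the shard count is made up): regular users are spread over shards 0-14, while the power user gets shard 15 to themselves.

    REGULAR_SHARDS = 15               # shards 0-14 shared by everyone else
    DEDICATED_SHARDS = {367823: 15}   # power user pinned to their own shard

    def shard_for_user(user_id):
        if user_id in DEDICATED_SHARDS:
            return DEDICATED_SHARDS[user_id]
        return user_id % REGULAR_SHARDS

    print(shard_for_user(367823))  # 15
    print(shard_for_user(12345))   # somewhere in 0-14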

Beyond database partitioning

Other candidates for partitioning include file storage, caches, queues, and compute instances. Partitioning a queue or message bus, for example, allows you to avoid limits on the number of requests or the number of concurrent connections.

What can be improved with partitioning?

  • Scalability. When you scale a single database up vertically, it will eventually reach a hardware limit. If you divide data across multiple partitions, each hosted on a separate server, you can scale out the system almost indefinitely.
  • Performance. Data access operations on each partition take place over a smaller volume of data, so queries have less data to scan and respond faster. Correctly done, partitioning can make your system more efficient, and operations that affect more than one partition can run in parallel.
  • Security. You can separate sensitive and nonsensitive data into different partitions and apply different security controls to the sensitive data.
  • Operational flexibility. Partitioning offers many opportunities for fine-tuning operations, maximizing administrative efficiency, and minimizing cost. For example, you can define different strategies for management, monitoring, backup and restore, and other administrative tasks based on the importance of the data in each partition.
  • Matching the data store to the pattern of use. Partitioning allows each partition to be deployed on a different type of data store, based on cost and the built-in features that data store offers. For example, large binary data can be stored in blob storage, while more structured data can be held in a document database.
  • Availability. Separating data across multiple servers avoids a single point of failure. If one instance fails, only the data in that partition is unavailable. Operations on other partitions can continue.

References