Load balancing and Failover systems
In the previous posts on high availability architecture, we talked about scaling databases and Content Delivery Networks. We have often mentioned distributing requests evenly across different nodes, and avoiding downtime when a node or component fails. In this post, the prime objective is to look at these two processes, load balancing and failover, in detail.
Load balancing is a technique for distributing incoming requests across multiple servers. It becomes necessary when a single server is maxing out its CPU, disk, or database I/O rate. The objective of load balancing is to optimize resource use and minimize response time, so that no single resource is overburdened.
The goal of failover is to let another network component or server continue the work of one that has failed. Failover also allows you to perform maintenance on individual servers or nodes without any interruption of your services.
It is important to note that load balancing and failover are not the same thing, but they go hand in hand in helping you achieve high availability.
Implementing Load Balancing:
Although the idea of load balancing is simple, its implementation is not. In this post, I will touch upon the basic ideas behind implementing it. Load balancing can be performed in hardware, in software, or with a combination of both.
Source: Networks and Servers
The simplest way of load balancing is to use different servers for different services. For instance, you could run the web server on one instance, the database server on another and serve static content through a CDN. It’s easy because there is no issue of data replication.
A second way to perform load balancing is to have multiple front-end servers. This means that multiple IP addresses are set up for the same domain, an approach often called round-robin DNS. When a client looks up the domain, it is handed one of these addresses, in random or rotating order, which spreads the load across the servers.
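To make the multiple-address idea concrete, here is a rough Python sketch that simulates a name server holding several addresses for one domain. It is a toy model, not a real DNS server. The class name `RoundRobinResolver`, the domain, and the addresses (taken from the 192.0.2.0/24 documentation range) are all made up for illustration, and handing addresses out in strict rotation rather than at random is an assumption that keeps the result predictable.

```python
from collections import Counter
from itertools import cycle


class RoundRobinResolver:
    """Toy stand-in for a name server that holds several addresses per domain."""

    def __init__(self, records):
        # records maps a domain name to the list of IP addresses set up for it
        self._cycles = {domain: cycle(ips) for domain, ips in records.items()}

    def resolve(self, domain):
        # Hand out the next address in rotation for this domain
        return next(self._cycles[domain])


# Hypothetical domain and addresses (192.0.2.0/24 is reserved for documentation)
resolver = RoundRobinResolver(
    {"example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]}
)

# Nine simulated client lookups are spread evenly over the three servers
hits = Counter(resolver.resolve("example.com") for _ in range(9))
print(hits)
```

Note that nothing in this scheme knows whether a server is actually up, so a failed server keeps receiving its share of clients until its address is removed.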
Yet another way of implementing load balancing is to use a single virtual IP, which is given to all clients. The server at the virtual IP then forwards each request to one of the real servers. HAProxy is commonly used for this task, sitting in front of all the servers. It detects which servers are up or down and routes requests accordingly.
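The behaviour described here, a single entry point that only forwards to servers that are up, can be sketched in a few lines of Python. This is only an illustration of the idea and not how HAProxy itself is implemented. The `Balancer` class, the `tcp_check` helper, and the backend addresses are assumptions, and round-robin is just one of several possible selection strategies.

```python
import socket
from itertools import count


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Balancer:
    """Picks a healthy backend for each request, in round-robin order."""

    def __init__(self, backends, check=tcp_check):
        self.backends = list(backends)   # (host, port) pairs of the real servers
        self.check = check               # health-check function, replaceable in tests
        self.healthy = list(self.backends)
        self._counter = count()

    def refresh(self):
        # Re-run the health check against every backend; call this periodically
        self.healthy = [b for b in self.backends if self.check(*b)]

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy backend available")
        return self.healthy[next(self._counter) % len(self.healthy)]


# Hypothetical backends; a fake check marks the second one as down
backends = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]
lb = Balancer(backends, check=lambda host, port: host != "10.0.0.2")
lb.refresh()
print([lb.pick() for _ in range(4)])
```

In the simulated run, requests alternate between the first and third backend, because the second one fails its health check and is skipped.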
HackerEarth recently came up with a post on how they scaled their database using HAProxy to manage over 1000 requests per second at peak times. The post effectively explains how their database was sharded and how they managed the requests using Python. A sample HAProxy configuration file has also been provided.
Implementing Failover:
Since failover deals with systems going down (or failing) completely, the data needs to be present on every server that may take over. In other words, there is a need for data replication. On Unix-based systems, files can be kept in sync using rsync and cron jobs, whereas databases need a replication setup such as MySQL replication.
Failover typically involves two servers: a primary and a secondary. The primary takes the normal load and processes requests, while the secondary monitors the primary and takes over its services if the primary goes down.
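As a minimal sketch of the rsync-plus-cron idea, the Python script below builds an rsync command that mirrors a directory onto a standby server. The paths, the host name `standby.example.com`, and the script location in the crontab comment are hypothetical, and the sketch assumes that rsync is installed and that key-based SSH access to the standby is already set up.

```python
import subprocess


def build_rsync_command(source_dir, remote_host, remote_dir):
    """Build an rsync command that mirrors source_dir onto the standby server."""
    return [
        "rsync",
        "-az",        # archive mode (keeps permissions and times), compress in transit
        "--delete",   # remove files on the standby that no longer exist here
        source_dir.rstrip("/") + "/",   # trailing slash: copy the directory's contents
        f"{remote_host}:{remote_dir}",
    ]


def sync(source_dir, remote_host, remote_dir):
    # Raises CalledProcessError if rsync exits with a non-zero status
    subprocess.run(build_rsync_command(source_dir, remote_host, remote_dir), check=True)


# Hypothetical paths and host name
command = build_rsync_command("/var/www/uploads", "standby.example.com", "/var/www/uploads")
print(" ".join(command))

# A crontab entry along these lines would run the sync every five minutes
# (the script path is made up):
# */5 * * * * /usr/bin/python3 /opt/scripts/sync_uploads.py
```

With a five-minute schedule, the standby can be up to five minutes behind the primary, so files written just before a failure may be missing after failover.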
Source: Networks and Servers
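The primary and secondary arrangement can be sketched as a small monitor running on the secondary. This is an illustrative sketch under assumptions of my own: the `StandbyMonitor` class is hypothetical, and requiring three consecutive failed checks before taking over is an arbitrary threshold chosen so that a single missed check does not trigger failover.

```python
import time


class StandbyMonitor:
    """Runs on the secondary: watches the primary and takes over after repeated failures."""

    def __init__(self, is_primary_alive, on_failover, max_failures=3):
        self.is_primary_alive = is_primary_alive   # callable returning True or False
        self.on_failover = on_failover             # callable that promotes the secondary
        self.max_failures = max_failures           # consecutive failed checks that trigger failover
        self.failures = 0
        self.active = False                        # True once the secondary has taken over

    def tick(self):
        """Perform one health check; return True if this check triggered failover."""
        if self.active:
            return False
        if self.is_primary_alive():
            self.failures = 0       # a single success resets the counter
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.active = True
            self.on_failover()
            return True
        return False

    def run(self, interval=5.0):
        # Check the primary every `interval` seconds until failover happens
        while not self.active:
            if not self.tick():
                time.sleep(interval)


# Simulated primary: answers twice, misses once, recovers, then goes down for good
responses = iter([True, True, False, True, False, False, False])
events = []
monitor = StandbyMonitor(lambda: next(responses), lambda: events.append("promoted"))
results = [monitor.tick() for _ in range(7)]
print(results, events)
```

In the simulated run, the isolated miss is ignored and the secondary is promoted only on the seventh check, after three failures in a row.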
For failover to succeed, you need to detect the failure and then route requests to the new system. Failover can be triggered by changing the IP address that your domain points to. However, such a change is not instant: DNS resolvers and clients cache the old address for a while, so it can take a few minutes to take effect.
We hope that this post helped you understand the basics of load balancing and failover, and that it serves as a useful step towards implementing these techniques in your product.
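To get a feel for the delay involved in switching the domain's IP address, the small Python function below adds the time needed to notice the failure to the time the old DNS record may stay cached. The function name and all the numbers are assumptions chosen for illustration. Real values depend on your monitoring setup and on the TTL configured for the record, and some resolvers do not honour TTLs exactly, so treat the result as a rough estimate.

```python
def estimated_failover_seconds(check_interval, failures_needed, dns_ttl):
    """Rough estimate of how long clients may keep reaching the failed server."""
    detection = check_interval * failures_needed   # time to notice the failure
    propagation = dns_ttl                          # resolvers may cache the old IP this long
    return detection + propagation


# Assumed values: a check every 10 s, 3 missed checks, and a 300 s TTL on the DNS record
print(estimated_failover_seconds(10, 3, 300))   # 330
```

Lowering the TTL shortens the delay, at the cost of more frequent DNS lookups.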