Comparisons among distributed election algorithms(Bully, Raft and ZAB)

Introduction

We often hear concepts such as database clusters , and also know that database clusters provide read and write functions. So how to make hundreds or thousands nodes cooperate together? For example, in a database cluster, how to ensure the consistency of the written data on each node? A common approach is to use Leader to schedule and manage other nodes. Leader election is a form of consensus.

Why distributed elections?

The master node is responsible for the coordination and management of other nodes in a distributed cluster, that is to say, all other nodes must follow the instructions of the master node.

The presence of the master node can ensure the correct operation of other nodes and the consistency of the written data in the cluster on each node. Consistency means that the data is the same in every node in the cluster.

Distributed election algorithm

Leader-based algorithm is a type of asymmetric approach to reach consensus. At any given time, one server (master) is in charge, and others accept its decisions. Clients only communicate with the leader. Another approach is symmetric, which is leader-less, where all servers have equal roles and clients can contact any servers. But today, I’m only going to cover asymmetric approaches, or leader-based algorithm including: Bully, Raft, and ZAB.

Bully

Like its name, bully algorithm is a very “bully” way to select a leader: The active node with highest ID number is selected as the leader.

The algorithm uses the following three message types:

  • Election Message: Sent to announce election.
  • Answer (Alive) Message: Responds to the Election message.
  • Coordinator (Victory) Message: Sent by winner of the election to announce victory.

The algorithm assumes that each node knows its own process id and address, and that of every other node. The election process is:

MongoDb’s election mechanism adopts the Bully algorithm, which makes it easy to select the master node from the distributed nodes.

Pros: The election is fast, the algorithm complexity is low, and it is simple and easy to implement.

Cons: Each node needs to store global nodes information, so additional space is required to store these data. Secondly, when any node with a larger ID than the current master node joins or resumes to join the cluster after a node failure, re-election may be triggered. Nodes frequently exit and join the cluster, and the system will constantly switch the master and slaves.

Raft

Raft algorithm is like presidential election.

At any given time, each server is either:

  • Leader: handles all client interactions, log replication. (At most 1 viable leader at a time)
  • Follower: completely passive (issues no RPCs, responds to incoming RPCs)
  • Candidate: used to elect a new leader

Normal operation: 1 leader, N-1 followers:

  1. When initialized, all nodes are in Follower state.
  2. When selecting the master, the status of all nodes changes from Follower to Candidate and sends election requests to other nodes.
  3. Each node makes a decision whether to agree to this vote. In each round of election, a node can only cast one vote.
  4. If the node that started the election gets majority votes(more than half), it becomes the master node, and its status is changed to Leader, and the status of other Candidate nodes is changed to Follower. Heartbeat packets are periodically sent between the Leader node and the Follower node to detect whether the master node is alive.
  5. When the term of the Leader node is up, that is, it is found that other servers start the next round of the main election cycle, the current leader steps down, and a new round of main election is stated.

Etcd, open-sourced by google, implements Raft consensus algorithm for data replication.

Pros: Efficient in election, easy to implement. The election stability of this algorithm is better than the Bully algorithm. It is because when a new node is added or a node is restored from failure, the election will be triggered, but only when the node gets majority votes will the it switch to a leader.

Cons: During the election, a candidate need to get more than half of the votes to successfully be selected as the master, so RPC requests between nodes is large.

ZAB

Zookeeper Atomic Broadcast (ZAB) is the protocol under the hood that drives ZooKeeper replication order guarantee. Compared with the Raft algorithm, the ZAB algorithm ensures data up-to-date as much as possible. The node with the latest data has a higher priority to be selected as the leader.

Why odd number of nodes

This is because in an election it requires a majority of members to be up to elect a primary.

+-------------------+------------------------------------------+-----------------+
| Number of Members | Majority Required to Elect a New Primary | Fault Tolerance |
+-------------------+------------------------------------------+-----------------+
|         3         |                    2                     |        1        |
+-------------------+------------------------------------------+-----------------+
|         4         |                    3                     |        1        |
+-------------------+------------------------------------------+-----------------+
|         5         |                    3                     |        2        |
+-------------------+------------------------------------------+-----------------+
|         6         |                    4                     |        2        |
+-------------------+------------------------------------------+-----------------+

This table is from MongoDB document, you can find that when you have an odd number of members and add one more (to become even), your fault tolerance does not go up! 

References and Reading Material:

Author: huadonghu

Leave a Reply

Your email address will not be published. Required fields are marked *