We had several goals in designing the Raft algorithm: it must provide a complete and practical foundation for building systems, so that it significantly reduces the design work required of developers; it must be safe under all conditions and available under typical operating conditions; and it must be efficient for common operations. But our most important goal, and the most difficult challenge, was understandability. It must be possible for a large audience to understand the algorithm comfortably. In addition, it must be possible to develop intuitions about the algorithm, so that system builders can make the extensions that are inevitable in real-world implementations.
There were numerous points in the design of Raft where we had to choose among alternative approaches. In these situations we evaluated the alternatives based on understandability: how hard is it to explain each alternative (for example, how complex is its state space, and does it have subtle implications?), and how easy will it be for a reader to completely understand the approach and its implications?
We recognize that there is a high degree of subjectivity in such analysis. Nevertheless, we used two generally applicable techniques. The first is problem decomposition (breaking a large problem into several smaller ones). For example, we divided the Raft algorithm into three parts: leader election, log replication, and safety. The second is to simplify the state space by reducing the number of states to consider, making the consensus algorithm more coherent and eliminating nondeterminism where possible. Specifically, Raft does not allow logs to have holes (a hole is an entry with no content that is followed by entries that do have content; Paxos permits this, which is one of its shortcomings), and Raft limits the ways in which logs can become inconsistent with each other (through the restrictions on leader election and the rule that the leader is the only source of log data, followers never send log entries to one another; they only need to match the leader, which reduces inconsistency among the logs on different servers). Although we try to eliminate nondeterminism in most cases, there are situations where nondeterminism actually improves understandability (a consequence of treating understandability as the primary goal). For example, randomized approaches introduce nondeterminism, but they can reduce the state space (with randomization we avoid handling many of the situations that could otherwise arise; during leader election, for instance, different candidates start the next round of elections after their own timeouts).
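The role of randomization in elections can be illustrated with a minimal sketch. It assumes each server draws its election timeout uniformly from a fixed range; the 150 to 300 ms range and the names election_timeout and first_to_time_out are illustrative assumptions, not taken from the text above.

```python
import random


def election_timeout(base_ms=150.0, spread_ms=150.0, rng=random):
    """Draw a timeout uniformly from [base_ms, base_ms + spread_ms).

    The 150-300 ms range is an assumed, illustrative choice.
    """
    return base_ms + rng.random() * spread_ms


def first_to_time_out(server_ids, rng=random):
    """Return the server whose timer fires first, plus all drawn timeouts.

    Because every server draws independently, one server usually times out
    clearly before the others and can start an election on its own.
    """
    timeouts = {sid: election_timeout(rng=rng) for sid in server_ids}
    winner = min(timeouts, key=timeouts.get)
    return winner, timeouts


if __name__ == "__main__":
    winner, timeouts = first_to_time_out(["s1", "s2", "s3"])
    print(winner, timeouts)
```

No special-case logic is needed to decide which candidate goes first: whichever timer happens to fire first is acceptable, which is the sense in which randomization shrinks the state space.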
Raft is an algorithm for managing a replicated log of the form described in Section 2.1. Figure 3.1 summarizes the algorithm for reference, and Figure 3.2 lists its key properties. The rest of this chapter discusses them section by section.
Raft first elects one server as the leader and then gives the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on the other servers, and tells the servers when it is safe to apply log entries to their state machines. Having a leader simplifies the management of the replicated log. For example, the leader can decide where to place new entries in the log without consulting other servers, and data flows in a simple way from the leader to the other servers (followers or candidates). A leader can fail or become disconnected from the other servers, in which case a new leader is elected.
Using the leader-based approach, Raft decomposes the consensus problem into three relatively independent subproblems: leader election, log replication, and safety.
After presenting the Raft consensus algorithm, this chapter discusses availability and the role of timing in the system (Section 3.9), and an optional extension for transferring leadership between servers (Section 3.10).
Figure 3.1 shows the key elements, operations, and principles of the Raft algorithm. Because the full figure is too large, we split it into four smaller figures for explanation.
Figure 3.1: State
1. Persistent state on all servers
(Updated on stable storage before responding to RPCs)
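The "persist before replying" rule can be sketched as follows. This is a minimal illustration, not the author's implementation: the fields current_term, voted_for, and log follow the persistent state listed in the Raft paper (they are not spelled out in the text above), and the JSON file format, the save/load names, and the write-then-rename strategy are assumptions made for the example.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PersistentState:
    # Field names follow the Raft paper's persistent state (assumed here).
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[dict] = field(default_factory=list)  # e.g. {"term": 3, "command": "x=1"}

    def save(self, path: str) -> None:
        """Write the state durably; call this BEFORE replying to an RPC."""
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(asdict(self), f)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk
        os.replace(tmp_path, path)  # atomic swap: readers see old or new file

    @classmethod
    def load(cls, path: str) -> "PersistentState":
        """Restore the state after a restart."""
        with open(path) as f:
            return cls(**json.load(f))
```

A server that crashes after replying can then reload the same term, vote, and log on restart, so it never contradicts an answer it has already given.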
Figure 3.2-2 RPC
(Invoked by candidates to gather votes.)
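How a server might answer a vote request can be sketched as below. The rules are the receiver rules given in the Raft paper (reject stale terms; grant at most one vote per term, and only to a candidate whose log is at least as up to date); they are not stated in the text above, and the Voter class and its method name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Voter:
    current_term: int = 0
    voted_for: Optional[str] = None
    # Term of each log entry; log index 1 is log_terms[0] (assumed layout).
    log_terms: List[int] = field(default_factory=list)

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_index: int, last_log_term: int) -> Tuple[int, bool]:
        """Return (current_term, vote_granted)."""
        if term < self.current_term:
            return self.current_term, False  # candidate is from a stale term
        if term > self.current_term:
            # Newer term: adopt it and forget any vote cast in the old term.
            self.current_term = term
            self.voted_for = None
        my_last_index = len(self.log_terms)
        my_last_term = self.log_terms[-1] if self.log_terms else 0
        # "At least as up to date": compare last terms first, then lengths.
        up_to_date = (last_log_term, last_log_index) >= (my_last_term, my_last_index)
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False
```

In a real server, current_term and voted_for would be written to stable storage before the reply is sent.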
3. The AppendEntries RPC is used to synchronize log entries between the leader and its followers. With this request, the leader asks followers to append log entries so that their logs stay in sync with its own.
(Invoked by the leader to replicate log entries; also used as a heartbeat.)
The following figure illustrates how prevLogIndex and prevLogTerm are compared.
In the figure, the leader and the follower are both in term 3, and the leader's log contains entries that the follower does not yet have, so the leader sends an AppendEntries request to bring the follower up to date. The prevLogIndex and prevLogTerm in the request are i-1 and 3, respectively. When the follower receives the request, its latest log entry has index i-1 and term 3, which matches the prevLogIndex and prevLogTerm in the RPC message, so the follower can use the log entries in entries[] to fill positions i through N of its log. If the index of the follower's latest log entry (referred to here as FollowerIndex) is not equal to i-1, there are two cases. The first is FollowerIndex < prevLogIndex, which means the follower's log is still missing entries; the follower replies false, and the leader updates prevLogIndex, prevLogTerm, and entries[] and sends another AppendEntries request (described in detail in the next chapter). The second is FollowerIndex > prevLogIndex, which means the follower already holds log entries after the position identified by prevLogIndex and prevLogTerm. But are those later entries the same as the ones in the RPC request? And is the follower's entry at prevLogIndex and prevLogTerm the same as the leader's? (These questions will be revisited after further reading.)
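The two cases can be made concrete with a small sketch of the follower side. It applies the AppendEntries receiver rules from the Raft paper, which the text above does not spell out: reply false if there is no entry at prevLogIndex with term prevLogTerm, and otherwise delete any existing entry that conflicts with a new one (same index, different term) together with everything after it. The Entry type and the append_entries function are hypothetical names, and term checks on the request itself are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    term: int
    command: str


def append_entries(log: List[Entry], prev_log_index: int,
                   prev_log_term: int, entries: List[Entry]) -> bool:
    """Follower-side consistency check; log index 1 is log[0]."""
    # Case 1: FollowerIndex < prevLogIndex, the follower is missing entries.
    if prev_log_index > len(log):
        return False
    # The entry at prevLogIndex must carry prevLogTerm (index 0 = empty prefix).
    if prev_log_index > 0 and log[prev_log_index - 1].term != prev_log_term:
        return False
    # Case 2: the follower may already hold entries after prevLogIndex.
    for offset, entry in enumerate(entries):
        index = prev_log_index + 1 + offset
        if index <= len(log):
            if log[index - 1].term != entry.term:
                del log[index - 1:]  # conflict: drop it and all that follow
                log.append(entry)
            # Same index and same term: already present, keep it.
        else:
            log.append(entry)
    return True
```

Under these rules, later entries on the follower are kept only if they agree with the leader's entries by index and term; anything that conflicts is discarded and replaced by the leader's version.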