Raft Structure Advice

A Raft instance has to deal with the arrival of external events
(Start() calls, AppendEntries and RequestVote RPCs, and RPC replies),
and it has to execute periodic tasks (elections and heartbeats).
There are many ways to structure your Raft code to manage these
activities; this document outlines a few ideas.

Each Raft instance has a bunch of state (the log, the current index,
etc.) which must be updated in response to events arising in concurrent
goroutines. The Go documentation points out that the goroutines can
perform the updates directly using shared data structures and locks,
or by passing messages on channels. Experience suggests that for Raft
it is most straightforward to use shared data and locks.
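
As a minimal sketch of the shared-data-and-locks approach, the code below guards all of a peer's state with one mutex. The struct layout, the simplified Start() signature, and the appendConcurrently helper are assumptions made for illustration, not the lab's required API.

```go
package main

import (
	"fmt"
	"sync"
)

// Raft holds a peer's state. The field names are illustrative assumptions.
type Raft struct {
	mu          sync.Mutex // guards every field below
	currentTerm int
	log         []string
}

// Start appends a command under the lock and returns its 1-based index
// and the term in which it was appended (a simplified, assumed signature).
func (rf *Raft) Start(cmd string) (int, int) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.log = append(rf.log, cmd)
	return len(rf.log), rf.currentTerm
}

// LogLen reads shared state under the same lock.
func (rf *Raft) LogLen() int {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	return len(rf.log)
}

// appendConcurrently calls Start from n goroutines and returns the final
// log length. It is a hypothetical helper for this demonstration only.
func appendConcurrently(n int) int {
	rf := &Raft{currentTerm: 1}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			rf.Start(fmt.Sprintf("cmd-%d", i))
		}(i)
	}
	wg.Wait()
	return rf.LogLen()
}

func main() {
	// Because every update happens under the lock, no append is lost.
	fmt.Println(appendConcurrently(100))
}
```

Every method takes the same lock before touching any field, so events arriving in different goroutines see a consistent view of the state.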

A Raft instance has two time-driven activities: the leader must send
heartbeats, and others must start an election if too much time has
passed since hearing from the leader. It’s probably best to drive each
of these activities with a dedicated long-running goroutine, rather
than combining multiple activities into a single goroutine.
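
One of the two dedicated goroutines might look roughly like the heartbeat loop below. This is a sketch under assumptions: the field names, the 20ms interval, and the countHeartbeats helper are invented for the example, and the sent counter stands in for starting AppendEntries RPCs.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Raft holds only what this sketch needs; the names are assumptions.
type Raft struct {
	mu       sync.Mutex
	isLeader bool
	dead     bool
	sent     int // heartbeat rounds started (for the demo only)
}

const heartbeatInterval = 20 * time.Millisecond // assumed value

// heartbeatLoop is a long-running goroutine with a single job: while this
// peer is the leader, start a round of heartbeats every interval.
func (rf *Raft) heartbeatLoop() {
	for {
		rf.mu.Lock()
		if rf.dead {
			rf.mu.Unlock()
			return
		}
		if rf.isLeader {
			rf.sent++ // a real peer would start AppendEntries RPCs here
		}
		rf.mu.Unlock()
		time.Sleep(heartbeatInterval)
	}
}

// countHeartbeats runs the loop for ms milliseconds and reports how many
// rounds were started. It is a hypothetical helper for this demo.
func countHeartbeats(leader bool, ms int) int {
	rf := &Raft{isLeader: leader}
	go rf.heartbeatLoop()
	time.Sleep(time.Duration(ms) * time.Millisecond)
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.dead = true
	return rf.sent
}

func main() {
	fmt.Println("leader rounds:", countHeartbeats(true, 200))
	fmt.Println("follower rounds:", countHeartbeats(false, 100))
}
```

The election timeout would get its own loop of the same shape, so neither activity can stall the other.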

The management of the election timeout is a common source of
headaches. Perhaps the simplest plan is to maintain a variable in the
Raft struct containing the last time at which the peer heard from the
leader, and to have the election timeout goroutine periodically check
to see whether the time since then is greater than the timeout period.
It’s easiest to use time.Sleep() with a small constant argument to
drive the periodic checks. Don’t use time.Ticker or time.Timer;
they are tricky to use correctly.
(Note: the argument to time.Sleep() is a small constant; it is the
election timeout itself that should be randomized, so that peers do
not all time out together.)
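
The plan above can be sketched as follows. The field names, the 10ms check interval, the 150-300ms timeout range, and the electionsStarted helper are all assumptions chosen for the example; the elections counter stands in for becoming a candidate and sending RequestVote RPCs.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Raft holds only the fields this sketch needs; the names are assumptions.
type Raft struct {
	mu        sync.Mutex
	lastHeard time.Time     // last time this peer heard from the leader
	timeout   time.Duration // current election timeout
	elections int           // elections started (for the demo only)
	dead      bool
}

// randomTimeout picks a timeout in [150ms, 300ms). The range is an assumed
// example; randomizing keeps peers from all timing out together.
func randomTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

// heard records contact from the leader (e.g. a valid AppendEntries).
func (rf *Raft) heard() {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.lastHeard = time.Now()
}

// electionLoop wakes every 10ms (a small constant) and checks the clock.
func (rf *Raft) electionLoop() {
	for {
		time.Sleep(10 * time.Millisecond)
		rf.mu.Lock()
		if rf.dead {
			rf.mu.Unlock()
			return
		}
		if time.Since(rf.lastHeard) > rf.timeout {
			rf.elections++ // a real peer would become a candidate here
			rf.lastHeard = time.Now()
			rf.timeout = randomTimeout()
		}
		rf.mu.Unlock()
	}
}

// electionsStarted runs a peer for ms milliseconds, optionally feeding it
// heartbeats every 20ms, and returns how many elections it started.
// It is a hypothetical helper for this demo.
func electionsStarted(heartbeats bool, ms int) int {
	rf := &Raft{lastHeard: time.Now(), timeout: randomTimeout()}
	go rf.electionLoop()
	deadline := time.Now().Add(time.Duration(ms) * time.Millisecond)
	for time.Now().Before(deadline) {
		if heartbeats {
			rf.heard()
		}
		time.Sleep(20 * time.Millisecond)
	}
	rf.mu.Lock()
	defer rf.mu.Unlock()
	rf.dead = true
	return rf.elections
}

func main() {
	fmt.Println("with heartbeats:", electionsStarted(true, 400))
	fmt.Println("without heartbeats:", electionsStarted(false, 700))
}
```

Resetting the timer is just an assignment to lastHeard under the lock, which is why this tends to be less error-prone than stopping and resetting a time.Timer.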

You’ll want to have a separate long-running goroutine that sends
committed log entries in order on the applyCh. It must be separate,
since sending on the applyCh can block; and it must be a single
goroutine, since otherwise it may be hard to ensure that you send log
entries in log order. The code that advances commitIndex will need to
kick the apply goroutine; it’s probably easiest to use a condition
variable (Go’s sync.Cond) for this.
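
A minimal sketch of such an apply goroutine is shown below. The ApplyMsg fields, the placeholder entry at log index 0, and the setCommitIndex and appliedOrder helpers are assumptions for illustration; shutdown handling is omitted to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// ApplyMsg is a simplified, assumed message type.
type ApplyMsg struct {
	Index   int
	Command string
}

// Raft holds only the fields this sketch needs; the names are assumptions.
type Raft struct {
	mu          sync.Mutex
	applyCond   *sync.Cond
	log         []string // log[0] is a placeholder so indices start at 1
	commitIndex int
	lastApplied int
	applyCh     chan ApplyMsg
}

func newRaft(applyCh chan ApplyMsg) *Raft {
	rf := &Raft{log: []string{""}, applyCh: applyCh}
	rf.applyCond = sync.NewCond(&rf.mu)
	return rf
}

// applier is the single goroutine that sends committed entries, in log
// order, on applyCh. It sleeps on the condition variable when idle.
func (rf *Raft) applier() {
	rf.mu.Lock()
	for {
		for rf.lastApplied >= rf.commitIndex {
			rf.applyCond.Wait()
		}
		rf.lastApplied++
		msg := ApplyMsg{Index: rf.lastApplied, Command: rf.log[rf.lastApplied]}
		rf.mu.Unlock() // the send can block, so do not hold the lock
		rf.applyCh <- msg
		rf.mu.Lock()
	}
}

// setCommitIndex advances commitIndex and kicks the applier.
func (rf *Raft) setCommitIndex(i int) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if i > rf.commitIndex && i < len(rf.log) {
		rf.commitIndex = i
		rf.applyCond.Broadcast()
	}
}

// appliedOrder commits cmds in two steps and returns the indices in the
// order they arrive on applyCh. It is a hypothetical demo helper.
func appliedOrder(cmds []string) []int {
	applyCh := make(chan ApplyMsg)
	rf := newRaft(applyCh)
	rf.log = append(rf.log, cmds...)
	go rf.applier()
	go func() {
		rf.setCommitIndex(1)
		rf.setCommitIndex(len(cmds))
	}()
	var got []int
	for range cmds {
		got = append(got, (<-applyCh).Index)
	}
	return got
}

func main() {
	fmt.Println(appliedOrder([]string{"a", "b", "c"})) // prints [1 2 3]
}
```

Because only this goroutine ever sends on applyCh, the order of messages follows lastApplied, and a slow receiver delays nothing except the applier itself.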

Each RPC should probably be sent (and its reply processed) in its own
goroutine, for two reasons: so that unreachable peers don’t delay the
collection of a majority of replies, and so that the heartbeat and
election timers can continue to tick at all times. It’s easiest to do
the RPC reply processing in the same goroutine, rather than sending
reply information over a channel.
(Note: processing the reply in the same goroutine avoids extra
communication between goroutines.)
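
The sketch below illustrates the idea with a simulated vote request. sendRequestVote is a stand-in invented for this example (peer 2 plays an unreachable peer), and the won channel and electionLatencyMs helper exist only so the demo can observe the result. Term checks on the reply are left out here to keep the focus on the goroutine structure.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Raft holds only the fields this sketch needs; the names are assumptions.
type Raft struct {
	mu       sync.Mutex
	me       int
	peers    int
	votes    int
	isLeader bool
	won      chan struct{} // closed on reaching a majority (demo only)
}

// sendRequestVote is a stand-in for the real RPC call. Peer 2 simulates an
// unreachable peer by blocking for a long time and then failing.
func sendRequestVote(peer int) bool {
	if peer == 2 {
		time.Sleep(2 * time.Second)
		return false
	}
	time.Sleep(10 * time.Millisecond)
	return true
}

// startElection sends each RPC in its own goroutine and processes the
// reply in that same goroutine.
func (rf *Raft) startElection() {
	rf.mu.Lock()
	rf.votes = 1 // vote for self
	rf.mu.Unlock()
	for p := 0; p < rf.peers; p++ {
		if p == rf.me {
			continue
		}
		go func(p int) {
			granted := sendRequestVote(p) // no lock held during the RPC
			rf.mu.Lock()
			defer rf.mu.Unlock()
			if !granted || rf.isLeader {
				return
			}
			rf.votes++
			if rf.votes > rf.peers/2 {
				rf.isLeader = true
				close(rf.won)
			}
		}(p)
	}
}

// electionLatencyMs reports how long a three-peer election takes when one
// peer is unreachable. It is a hypothetical demo helper.
func electionLatencyMs() int64 {
	rf := &Raft{me: 0, peers: 3, won: make(chan struct{})}
	start := time.Now()
	rf.startElection()
	<-rf.won
	return time.Since(start).Milliseconds()
}

func main() {
	fmt.Println("won after ms:", electionLatencyMs())
}
```

The majority is reached as soon as the reachable peer answers; the goroutine waiting on the unreachable peer simply finishes later and finds nothing left to do.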

Keep in mind that the network can delay RPCs and RPC replies, and when
you send concurrent RPCs, the network can re-order requests and
replies. Figure 2 is pretty good about pointing out places where RPC
handlers have to be careful about this (e.g. an RPC handler should
ignore RPCs with old terms). Figure 2 is not always explicit about RPC
reply processing. The leader has to be careful when processing
replies; it must check that the term hasn’t changed since sending the
RPC, and must account for the possibility that replies from concurrent
RPCs to the same follower have changed the leader’s state (e.g.
nextIndex).
(Figure 2 here refers to Figure 2 of the Raft paper.)
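
One way a leader might guard its reply processing is sketched below. The argument and reply structs are trimmed-down assumptions, and the rule for backing up nextIndex is just one simple choice, not something Figure 2 prescribes.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified, assumed RPC types.
type AppendEntriesArgs struct {
	Term         int
	PrevLogIndex int
	Entries      []string
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}

// Raft holds only the fields this sketch needs; the names are assumptions.
type Raft struct {
	mu          sync.Mutex
	currentTerm int
	isLeader    bool
	nextIndex   []int
	matchIndex  []int
}

// handleAppendEntriesReply processes one reply, tolerating delayed and
// re-ordered replies.
func (rf *Raft) handleAppendEntriesReply(peer int, args AppendEntriesArgs, reply AppendEntriesReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if reply.Term > rf.currentTerm {
		// Someone has a newer term: step down.
		rf.currentTerm = reply.Term
		rf.isLeader = false
		return
	}
	if !rf.isLeader || rf.currentTerm != args.Term {
		return // the term changed since the RPC was sent; ignore the reply
	}
	if reply.Success {
		// Compute from args, not from the current nextIndex, which a
		// concurrent reply may already have changed. Never move backwards.
		match := args.PrevLogIndex + len(args.Entries)
		if match > rf.matchIndex[peer] {
			rf.matchIndex[peer] = match
			rf.nextIndex[peer] = match + 1
		}
		return
	}
	// Back up only if this failure is about the current nextIndex.
	if rf.nextIndex[peer] == args.PrevLogIndex+1 && rf.nextIndex[peer] > 1 {
		rf.nextIndex[peer]--
	}
}

func main() {
	rf := &Raft{currentTerm: 3, isLeader: true, nextIndex: []int{1, 6}, matchIndex: []int{0, 0}}
	args := AppendEntriesArgs{Term: 3, PrevLogIndex: 5, Entries: []string{"x", "y"}}
	rf.handleAppendEntriesReply(1, args, AppendEntriesReply{Term: 3, Success: true})
	fmt.Println(rf.nextIndex[1], rf.matchIndex[1]) // prints 8 7
}
```

Keeping a copy of the arguments alongside each reply is what makes both checks possible: the term comparison detects a stale RPC, and PrevLogIndex identifies which state the follower was actually answering about.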