Skip to content

Commit

Permalink
Doc: add log_stream.md
Browse files Browse the repository at this point in the history
Doc `log_stream.md` discusses the core aspects of streaming log replication.

- Part of databendlabs#1029
  • Loading branch information
drmingdrmer committed Apr 11, 2024
1 parent dcb062e commit 0ee6b2d
Show file tree
Hide file tree
Showing 4 changed files with 102 additions and 6 deletions.
5 changes: 3 additions & 2 deletions openraft/src/docs/docs.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,9 @@ To learn about the data structures used in Openraft and the commit protocol, see
- [`Effective membership`](`crate::docs::data::effective_membership`) explains when membership config takes effect;
- [`protocol`](crate::docs::protocol) :
- [`replication`](`crate::docs::protocol::replication`);
- [`leader_lease`](`crate::docs::protocol::replication::leader_lease`);
- [`log_replication`](`crate::docs::protocol::replication::log_replication`);
- [`leader_lease`](`crate::docs::protocol::replication::leader_lease`) outlines the leader validity criteria for Leaders and Followers;
- [`log_replication`](`crate::docs::protocol::replication::log_replication`) provides an overview of the general replication protocol;
- [`log_stream`](`crate::docs::protocol::replication::log_stream`) discusses the core aspects of streaming log replication;
- [`snapshot_replication`](`crate::docs::protocol::replication::snapshot_replication`);

Contributors who want to understand the internals of Openraft can find relevant information in
Expand Down
81 changes: 81 additions & 0 deletions openraft/src/docs/protocol/log_stream.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Ensuring Consecutive Raft Log Replication

To stream log entries to a remote node,
log entries are read in more than one local IO operations.
There are issues to address to provide consecutivity on the remote end.

In a stream log replication protocol, Raft logs are read through multiple local
IO operations.
To ensure the integrity of log replication to remote nodes, addressing potential
disruptions in log entry sequence is crucial.


## Problem Description

In a Raft-based system, there is no guarantee that sequential read-log
operations, such as `read log[i]` and `read log[i+1]`, will yield consecutive
log entries. The reason for this is that Raft logs can be truncated, which
disrupts the continuity of the log sequence. Consider the following scenario:

- Node-1, as a leader, reads `log[4]`.
- Subsequently, Node-1 receives a higher term vote and demotes itself.
- In the interim, a new leader emerges and truncates the logs on Node-1.
- Node-1 regains leadership, appends new logs, and attempts to read `log[5]`.

Now, the read operation for `log[5]` becomes invalid because the log at index 6,
term 5 (`6-5`) is no longer a direct successor to the log at index 4, term 4
(`4-4`). This violates Raft's log continuity requirement for replication.

To illustrate:

```text
i-j: Raft log at term i and index j
Node-1: | vote(term=4)
| 2-3 2-4
^
'--- Read log[4]
Node-1: | vote(term=5) // Higher term vote received
| 2-3 2-4
Node-1: | vote(term=5) // Logs are truncated, 4-4 is replicated
| 2-3 4-4
^
'--- Truncation and replication by another leader
Node-1: | vote(term=6) // Node-1 becomes leader again
| 2-3 4-4 6-5 // Append noop log
Node-1: | vote(term=6)
| 2-3 4-4 6-5
^
'--- Attempt to read log[5]
```


## Solution

To ensure consecutive log entries for safe replication, it is necessary to
verify that the term has not changed between log reads. The following updated
operations are proposed:

1. `read vote` (returns `v1`)
2. `read log[i]`
3. `read log[i+1]`
4. `read vote` (returns `v2`)

If `v1` is equal to `v2`, we can confidently say that `log[i]` and `log[i+1]`
are consecutive and the replication is safe. Therefore, `ReplicationCore` must
check the term (vote) as part of its replication process to ensure log
consecutivity.

## Conclusion

The current release(upto 0.9) mitigates this issue by immediately halting communication
with `ReplicationCore` to prevent any new replication commands from being
processed.

However, future iterations of `ReplicationCore` may operate more proactively.
To maintain the integrity of replication, `ReplicationCore`
must ensure the consecutivity in the above method.
4 changes: 4 additions & 0 deletions openraft/src/docs/protocol/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,10 @@ pub mod replication {
#![doc = include_str!("log_replication.md")]
}

pub mod log_stream {
#![doc = include_str!("log_stream.md")]
}

pub mod snapshot_replication {
#![doc = include_str!("snapshot_replication.md")]
}
Expand Down
18 changes: 14 additions & 4 deletions openraft/src/storage/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,10 @@ pub struct LogState<C: RaftTypeConfig> {

/// A trait defining the interface for a Raft log subsystem.
///
/// This interface is accessed read-only from replica streams.
/// This interface is accessed read-only by replication sub task: `ReplicationCore`.
///
/// A log reader must also be able to read the last saved vote by [`RaftLogStorage::save_vote`],
/// See: [log-stream](`crate::docs::protocol::replication::log_stream`).
///
/// Typically, the log reader implementation as such will be hidden behind an `Arc<T>` and
/// this interface implemented on the `Arc<T>`. It can be co-implemented with [`RaftLogStorage`]
Expand All @@ -133,16 +136,23 @@ where C: RaftTypeConfig
{
/// Get a series of log entries from storage.
///
/// The start value is inclusive in the search and the stop value is non-inclusive: `[start,
/// stop)`.
/// ### Correctness requirements
///
/// - The absence of an entry is tolerated only at the beginning or end of the range. Missing
/// entries within the range (i.e., holes) are not permitted and should result in a
/// `StorageError`.
///
/// Entry that is not found is allowed.
/// - The read operation must be transactional. That is, it should not reflect any state changes
/// that occur after the read operation has commenced.
async fn try_get_log_entries<RB: RangeBounds<u64> + Clone + Debug + OptionalSend>(
&mut self,
range: RB,
) -> Result<Vec<C::Entry>, StorageError<C::NodeId>>;

/// Return the last saved vote by [`RaftLogStorage::save_vote`].
///
/// A log reader must also be able to read the last saved vote by [`RaftLogStorage::save_vote`],
/// See: [log-stream](`crate::docs::protocol::replication::log_stream`)
async fn read_vote(&mut self) -> Result<Option<Vote<C::NodeId>>, StorageError<C::NodeId>>;
}

Expand Down

0 comments on commit 0ee6b2d

Please sign in to comment.