Feature: save committed log id #897

drmingdrmer · 2023-07-16T15:09:44Z

Changelog

Feature: save committed log id

Wrote committed log id to storage
Save the committed log id to storage before Raft apply log entries. This can
recover state machine to the state corresponding to the committed log id upon
system startup.
Re-applied log entries after startup
If the last applied log id is smaller than the committed log id saved in
storage, re-apply log entries from the next index of last applied log id to the
committed log id.

Version 1 storage API RaftStorage and Version 2 storage API
RaftLogStorage both add API save_committed() and read_committed()
to support saving/reading committed log id.

These two new API are optional and provides default dummy
implementation, an application does not need any modifications if it does
not need this feature.

This change is

- **Wrote committed log id to storage** Save the committed log id to storage before Raft apply log entries. This can recover state machine to the state corresponding to the committed log id upon system startup. - **Re-applied log entries after startup** If the last applied log id is smaller than the committed log id saved in storage, re-apply log entries from the next index of last applied log id to the committed log id. Version 1 storage API `RaftStorage` and Version 2 storage API `RaftLogStorage` both add API `save_committed()` and `read_committed()` to support saving/reading committed log id. These two new API are optional and provides default dummy implementation, an application does not need any modifications if it does not need this feature.

schreter

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion

openraft/src/storage/v2.rs line 95 at r1 (raw file):

    ///
    /// - If the `committed` log id is not saved, Openraft will just recover the state machine to
    ///   the state of the last snapshot taken.

I'm sorry, but I don't get it.

Why do we need additional committed counter? There is the log and the state machine. The log cares about deltas, the state machine about the snapshot. We need to ensure that the log contains everything needed to recover from the last available snapshot of the state machine (i.e., we can't purge logs after the snapshot) up to the last common committed state of the cluster (i.e., the log index accepted by the quorum).

For that, we just need to ensure that:

the log is not purged past a log index until a new snapshot corresponding to a higher log index is durably persisted, by whatever means
at the restart time, ask the state machine about the last applied index (i.e., the last index which is part of the snapshot we are restarting from) and feed it all log entries up to the committed index as determined by the quorum; potentially followed by purging logs which are not relevant anymore

The last committed index is determined by the leader, first during the election time and then continuously during normal operation. If the restarted node is the leader, at the latest after the election the index is known. If the restarted node is the follower, the leader will tell it the commit index, so it can re-apply the state.

The call to save_committed() just adds another alloc/dealloc pair and virtual call to the hot path.

So where does this explicit storage of committed index does fit? Do you want to optimize the restart by pre-applying log entries before the election/restart of replication (at least, so it seems)? If so, does it matter? The applying of log indices to the state machine is independent of the log replication, so it won't block Raft anyway. Optimizing hot path is more important than optimizing restart, IMHO.

Is there something I didn't see?

Thanks.

Code quote:

    /// Saves the last committed log id to storage.
    ///
    /// # Optional feature
    ///
    /// If the state machine flushes state to disk before returning from `apply()`,
    /// then the application does not need to implement this method.
    ///
    /// Otherwise, i.e., the state machine just relies on periodical snapshot to persist state to
    /// disk:
    ///
    /// - If the `committed` log id is saved, the state machine will be recovered to the state
    ///   corresponding to this `committed` log id upon system startup, i.e., the state at the point
    ///   when the committed log id was applied.
    ///
    /// - If the `committed` log id is not saved, Openraft will just recover the state machine to
    ///   the state of the last snapshot taken.

drmingdrmer · 2023-07-18T14:24:16Z

@schreter

The overhead introduced by calling save_committed() should be minimal: in average, it will be called for every max_payload_entries log entries. Meanwhile I do not quite worry about the penalty, unless there is a measurable overhead.

The reason of adding a persisted committed is to just make things clear. Without persisting committed:

The RaftMetrics.last_applied will fall back upon startup, which may confuse a monitoring service.
The application may depend on the state in the state machine, e.g., by saving node information in the state machine. State falling back may cause a node to try to contact a removed node.
If the application provide stale reading from follower nodes, it's hard for the follower to know how many logs it must apply before serving stale read request.

Although these issues can all be addressed, it introduce more work to make things strictly correct.
A state machine that won't revert to a former state is easier to use:)

schreter · 2023-07-18T20:36:28Z

@drmingdrmer

The overhead introduced by calling save_committed() should be minimal: in average, it will be called for every max_payload_entries log entries.

Ah, OK, thanks, that is what I was missing. Then, the overhead should be negligible.

The RaftMetrics.last_applied will fall back upon startup, which may confuse a monitoring service.

True. But, OTOH, the metrics is still "correct" :-). After the restart, the state machine needs to be updated.

The application may depend on the state in the state machine, e.g., by saving node information in the state machine. State falling back may cause a node to try to contact a removed node.

Also true. We do store such info in the state machine ourselves, but this is not part of a regular snapshot, rather part of the data directory, which is persisted immediately after a change occurs. So it's fairly easy to handle that.

If the application provide stale reading from follower nodes, it's hard for the follower to know how many logs it must apply before serving stale read request.

Ahhh, that is unsafety in person :-). Reading from follower nodes requires some additional protocol to ensure correctness. For instance, we as a database can use MVCC and then it's fairly easy to check whether the last required commit (from the DB perspective, not from the Raft perspective) made it to the follower already and thus whether we can safely serve proper snapshot of the data for read or not.

A state machine that won't revert to a former state is easier to use:)

That's also true.

I don't have a problem with the change, it just seemed as an unnecessary change. So maybe you could copy&paste the above info to the docs of respective methods and/or to the general docs.

Thanks.

drmingdrmer · 2023-07-19T07:13:31Z

If the application provide stale reading from follower nodes, it's hard for the follower to know how many logs it must apply before serving stale read request.

I was wrong about this: When the follower receives a committed_log_id message from the leader, the committed_log_id is always greater than the last know committed log id before restart.

I don't have a problem with the change, it just seemed as an unnecessary change. So maybe you could copy&paste the above info to the docs of respective methods and/or to the general docs.

You are right. That's what I'm gonna do.

drmingdrmer requested review from lichuang and ariesdevil July 16, 2023 15:09

drmingdrmer force-pushed the 43-committed branch from 234f56c to 6aa6823 Compare July 16, 2023 15:30

drmingdrmer requested a review from schreter July 16, 2023 15:31

drmingdrmer force-pushed the 43-committed branch from 6aa6823 to 942ec78 Compare July 16, 2023 15:32

drmingdrmer merged commit fdc015f into databendlabs:main Jul 18, 2023
22 checks passed

drmingdrmer deleted the 43-committed branch July 18, 2023 02:16

schreter reviewed Jul 18, 2023

View reviewed changes

drmingdrmer mentioned this pull request Mar 9, 2024

Release 0.9 #1041

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: save committed log id #897

Feature: save committed log id #897

drmingdrmer commented Jul 16, 2023 •

edited

Loading

schreter left a comment

drmingdrmer commented Jul 18, 2023

schreter commented Jul 18, 2023

drmingdrmer commented Jul 19, 2023

Feature: save committed log id #897

Feature: save committed log id #897

Conversation

drmingdrmer commented Jul 16, 2023 • edited Loading

Changelog

Feature: save committed log id

schreter left a comment

Choose a reason for hiding this comment

drmingdrmer commented Jul 18, 2023

schreter commented Jul 18, 2023

drmingdrmer commented Jul 19, 2023

drmingdrmer commented Jul 16, 2023 •

edited

Loading