-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: save committed log id #897
Conversation
234f56c
to
6aa6823
Compare
- **Wrote committed log id to storage** Save the committed log id to storage before Raft apply log entries. This can recover state machine to the state corresponding to the committed log id upon system startup. - **Re-applied log entries after startup** If the last applied log id is smaller than the committed log id saved in storage, re-apply log entries from the next index of last applied log id to the committed log id. Version 1 storage API `RaftStorage` and Version 2 storage API `RaftLogStorage` both add API `save_committed()` and `read_committed()` to support saving/reading committed log id. These two new API are optional and provides default dummy implementation, an application does not need any modifications if it does not need this feature.
6aa6823
to
942ec78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion
openraft/src/storage/v2.rs
line 95 at r1 (raw file):
/// /// - If the `committed` log id is not saved, Openraft will just recover the state machine to /// the state of the last snapshot taken.
I'm sorry, but I don't get it.
Why do we need additional committed counter? There is the log and the state machine. The log cares about deltas, the state machine about the snapshot. We need to ensure that the log contains everything needed to recover from the last available snapshot of the state machine (i.e., we can't purge logs after the snapshot) up to the last common committed state of the cluster (i.e., the log index accepted by the quorum).
For that, we just need to ensure that:
- the log is not purged past a log index until a new snapshot corresponding to a higher log index is durably persisted, by whatever means
- at the restart time, ask the state machine about the last applied index (i.e., the last index which is part of the snapshot we are restarting from) and feed it all log entries up to the committed index as determined by the quorum; potentially followed by purging logs which are not relevant anymore
The last committed index is determined by the leader, first during the election time and then continuously during normal operation. If the restarted node is the leader, at the latest after the election the index is known. If the restarted node is the follower, the leader will tell it the commit index, so it can re-apply the state.
The call to save_committed()
just adds another alloc/dealloc pair and virtual call to the hot path.
So where does this explicit storage of committed index does fit? Do you want to optimize the restart by pre-applying log entries before the election/restart of replication (at least, so it seems)? If so, does it matter? The applying of log indices to the state machine is independent of the log replication, so it won't block Raft anyway. Optimizing hot path is more important than optimizing restart, IMHO.
Is there something I didn't see?
Thanks.
Code quote:
/// Saves the last committed log id to storage.
///
/// # Optional feature
///
/// If the state machine flushes state to disk before returning from `apply()`,
/// then the application does not need to implement this method.
///
/// Otherwise, i.e., the state machine just relies on periodical snapshot to persist state to
/// disk:
///
/// - If the `committed` log id is saved, the state machine will be recovered to the state
/// corresponding to this `committed` log id upon system startup, i.e., the state at the point
/// when the committed log id was applied.
///
/// - If the `committed` log id is not saved, Openraft will just recover the state machine to
/// the state of the last snapshot taken.
The overhead introduced by calling The reason of adding a persisted
Although these issues can all be addressed, it introduce more work to make things strictly correct. |
Ah, OK, thanks, that is what I was missing. Then, the overhead should be negligible.
True. But, OTOH, the metrics is still "correct" :-). After the restart, the state machine needs to be updated.
Also true. We do store such info in the state machine ourselves, but this is not part of a regular snapshot, rather part of the data directory, which is persisted immediately after a change occurs. So it's fairly easy to handle that.
Ahhh, that is unsafety in person :-). Reading from follower nodes requires some additional protocol to ensure correctness. For instance, we as a database can use MVCC and then it's fairly easy to check whether the last required commit (from the DB perspective, not from the Raft perspective) made it to the follower already and thus whether we can safely serve proper snapshot of the data for read or not.
That's also true. I don't have a problem with the change, it just seemed as an unnecessary change. So maybe you could copy&paste the above info to the docs of respective methods and/or to the general docs. Thanks. |
I was wrong about this: When the follower receives a
You are right. That's what I'm gonna do. |
Changelog
Feature: save committed log id
Wrote committed log id to storage
Save the committed log id to storage before Raft apply log entries. This can
recover state machine to the state corresponding to the committed log id upon
system startup.
Re-applied log entries after startup
If the last applied log id is smaller than the committed log id saved in
storage, re-apply log entries from the next index of last applied log id to the
committed log id.
Version 1 storage API
RaftStorage
and Version 2 storage APIRaftLogStorage
both add APIsave_committed()
andread_committed()
to support saving/reading committed log id.
These two new API are optional and provides default dummy
implementation, an application does not need any modifications if it does
not need this feature.
This change is