Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: save committed log id #897

Merged
merged 1 commit into from
Jul 18, 2023

Conversation

drmingdrmer
Copy link
Member

@drmingdrmer drmingdrmer commented Jul 16, 2023

Changelog

Feature: save committed log id
  • Wrote committed log id to storage
    Save the committed log id to storage before Raft apply log entries. This can
    recover state machine to the state corresponding to the committed log id upon
    system startup.

  • Re-applied log entries after startup
    If the last applied log id is smaller than the committed log id saved in
    storage, re-apply log entries from the next index of last applied log id to the
    committed log id.

Version 1 storage API RaftStorage and Version 2 storage API
RaftLogStorage both add API save_committed() and read_committed()
to support saving/reading committed log id.

These two new API are optional and provides default dummy
implementation, an application does not need any modifications if it does
not need this feature.


This change is Reviewable

- **Wrote committed log id to storage**
  Save the committed log id to storage before Raft apply log entries. This can
  recover state machine to the state corresponding to the committed log id upon
  system startup.

- **Re-applied log entries after startup**
  If the last applied log id is smaller than the committed log id saved in
  storage, re-apply log entries from the next index of last applied log id to the
  committed log id.

Version 1 storage API `RaftStorage` and Version 2 storage API
`RaftLogStorage` both add API `save_committed()` and `read_committed()`
to support saving/reading committed log id.

These two new API are optional and provides default dummy
implementation, an application does not need any modifications if it
does not need this feature.
@drmingdrmer drmingdrmer merged commit fdc015f into databendlabs:main Jul 18, 2023
22 checks passed
@drmingdrmer drmingdrmer deleted the 43-committed branch July 18, 2023 02:16
Copy link
Collaborator

@schreter schreter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion


openraft/src/storage/v2.rs line 95 at r1 (raw file):

    ///
    /// - If the `committed` log id is not saved, Openraft will just recover the state machine to
    ///   the state of the last snapshot taken.

I'm sorry, but I don't get it.

Why do we need additional committed counter? There is the log and the state machine. The log cares about deltas, the state machine about the snapshot. We need to ensure that the log contains everything needed to recover from the last available snapshot of the state machine (i.e., we can't purge logs after the snapshot) up to the last common committed state of the cluster (i.e., the log index accepted by the quorum).

For that, we just need to ensure that:

  • the log is not purged past a log index until a new snapshot corresponding to a higher log index is durably persisted, by whatever means
  • at the restart time, ask the state machine about the last applied index (i.e., the last index which is part of the snapshot we are restarting from) and feed it all log entries up to the committed index as determined by the quorum; potentially followed by purging logs which are not relevant anymore

The last committed index is determined by the leader, first during the election time and then continuously during normal operation. If the restarted node is the leader, at the latest after the election the index is known. If the restarted node is the follower, the leader will tell it the commit index, so it can re-apply the state.

The call to save_committed() just adds another alloc/dealloc pair and virtual call to the hot path.

So where does this explicit storage of committed index does fit? Do you want to optimize the restart by pre-applying log entries before the election/restart of replication (at least, so it seems)? If so, does it matter? The applying of log indices to the state machine is independent of the log replication, so it won't block Raft anyway. Optimizing hot path is more important than optimizing restart, IMHO.

Is there something I didn't see?

Thanks.

Code quote:

    /// Saves the last committed log id to storage.
    ///
    /// # Optional feature
    ///
    /// If the state machine flushes state to disk before returning from `apply()`,
    /// then the application does not need to implement this method.
    ///
    /// Otherwise, i.e., the state machine just relies on periodical snapshot to persist state to
    /// disk:
    ///
    /// - If the `committed` log id is saved, the state machine will be recovered to the state
    ///   corresponding to this `committed` log id upon system startup, i.e., the state at the point
    ///   when the committed log id was applied.
    ///
    /// - If the `committed` log id is not saved, Openraft will just recover the state machine to
    ///   the state of the last snapshot taken.

@drmingdrmer
Copy link
Member Author

@schreter

The overhead introduced by calling save_committed() should be minimal: in average, it will be called for every max_payload_entries log entries. Meanwhile I do not quite worry about the penalty, unless there is a measurable overhead.

The reason of adding a persisted committed is to just make things clear. Without persisting committed:

  • The RaftMetrics.last_applied will fall back upon startup, which may confuse a monitoring service.
  • The application may depend on the state in the state machine, e.g., by saving node information in the state machine. State falling back may cause a node to try to contact a removed node.
  • If the application provide stale reading from follower nodes, it's hard for the follower to know how many logs it must apply before serving stale read request.

Although these issues can all be addressed, it introduce more work to make things strictly correct.
A state machine that won't revert to a former state is easier to use:)

@schreter
Copy link
Collaborator

@drmingdrmer

The overhead introduced by calling save_committed() should be minimal: in average, it will be called for every max_payload_entries log entries.

Ah, OK, thanks, that is what I was missing. Then, the overhead should be negligible.

The RaftMetrics.last_applied will fall back upon startup, which may confuse a monitoring service.

True. But, OTOH, the metrics is still "correct" :-). After the restart, the state machine needs to be updated.

The application may depend on the state in the state machine, e.g., by saving node information in the state machine. State falling back may cause a node to try to contact a removed node.

Also true. We do store such info in the state machine ourselves, but this is not part of a regular snapshot, rather part of the data directory, which is persisted immediately after a change occurs. So it's fairly easy to handle that.

If the application provide stale reading from follower nodes, it's hard for the follower to know how many logs it must apply before serving stale read request.

Ahhh, that is unsafety in person :-). Reading from follower nodes requires some additional protocol to ensure correctness. For instance, we as a database can use MVCC and then it's fairly easy to check whether the last required commit (from the DB perspective, not from the Raft perspective) made it to the follower already and thus whether we can safely serve proper snapshot of the data for read or not.

A state machine that won't revert to a former state is easier to use:)

That's also true.

I don't have a problem with the change, it just seemed as an unnecessary change. So maybe you could copy&paste the above info to the docs of respective methods and/or to the general docs.

Thanks.

@drmingdrmer
Copy link
Member Author

If the application provide stale reading from follower nodes, it's hard for the follower to know how many logs it must apply before serving stale read request.

I was wrong about this: When the follower receives a committed_log_id message from the leader, the committed_log_id is always greater than the last know committed log id before restart.

I don't have a problem with the change, it just seemed as an unnecessary change. So maybe you could copy&paste the above info to the docs of respective methods and/or to the general docs.

You are right. That's what I'm gonna do.

@drmingdrmer drmingdrmer mentioned this pull request Mar 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants