Log the last time we spontaneously disconnected from the cluster when forked. #1874
Code LGTM, left a question and a NAB
sqlitecluster/SQLiteNode.cpp
Outdated
@@ -2785,3 +2790,13 @@ void SQLiteNode::kill() {
peer->reset();
}
}

string SQLiteNode::_getLostQuorumLogMessage() const {
string lostQuormMessage;
Suggested change:
- string lostQuormMessage;
+ string lostQuorumMessage;
sqlitecluster/SQLiteNode.cpp
Outdated
@@ -1591,12 +1591,14 @@ void SQLiteNode::_onMESSAGE(SQLitePeer* peer, const SData& message) {
uint64_t commitNum = SToUInt64(message["hashMismatchNumber"]);
_db.getCommits(commitNum, commitNum, result);
_forkedFrom.insert(peer->name);

This isn't removing the blank line, just the spaces in the blank line
Co-authored-by: Chirag Chandrakant Salian <[email protected]>
LGTM
Details
This adds extra logging to the "Hash mismatch" log lines to indicate whether we've recently lost quorum due to a disconnection.

It is more or less expected that losing quorum while leading will result in a forked DB node. There's no way for the node to anticipate that it is about to lose quorum, so it will continue committing parallel transactions until it notices the disconnect, at which point it's too late. Another node will begin leading, and this node will have commits that it was unable to send.
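As a rough illustration of the idea (not the code in this PR), the sketch below builds a suffix for a "Hash mismatch" log line from the time quorum was last lost. The function name `lostQuorumLogMessage`, the microsecond timestamps, the use of 0 to mean "never", and the message wording are all assumptions made for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not Bedrock's actual implementation.
// Assumptions: timestamps are microseconds since the epoch, and
// lastLostQuorumUs == 0 means quorum was never lost while leading.
std::string lostQuorumLogMessage(uint64_t lastLostQuorumUs, uint64_t nowUs) {
    if (lastLostQuorumUs == 0) {
        return "Never lost quorum while leading.";
    }
    if (nowUs < lastLostQuorumUs) {
        // Guard against unsigned underflow if the clock moved backwards.
        return "Lost-quorum timestamp is in the future; clock may have moved.";
    }
    const uint64_t elapsedSeconds = (nowUs - lastLostQuorumUs) / 1000000;
    return "Last lost quorum " + std::to_string(elapsedSeconds) + " seconds ago.";
}
```

Appending a string like this to the mismatch log line lets a reader tell at a glance whether a fork plausibly followed a recent disconnection or needs a different explanation.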
Fixed Issues
Fixes https://github.com/Expensify/Expensify/issues/384477 https://github.com/Expensify/Expensify/issues/422697
Tests
Artificially setting the timestamp logs:
Internal Testing Reminder: when changing bedrock, please compile auth against your new changes