-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing conflict check when the last conflict happened on a journal #1882
Conversation
sqlitecluster/SQLite.cpp
Outdated
const char* tableOffset = strstr(zMsg, "part of db table") + 17; | ||
|
||
// Check if the tableOffset exists since not all conflicts are on tables | ||
if (tableOffset) { | ||
// Based on the SQLite log line, we should always have ';' after the table name, | ||
// so let's use it to finish this loop. | ||
int i = 0; | ||
while (tableOffset[i] != ';') { | ||
_conflictTable += tableOffset[i]; | ||
i++; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using this log line as the base for this logic: https://github.com/Expensify/Bedrock/blob/676f804ed01ac09ffaedb2f3732069f0bbf50500/libstuff/sqlite3.c#L84720
sqlitecluster/SQLite.cpp
Outdated
@@ -284,6 +285,20 @@ void SQLite::_sqliteLogCallback(void* pArg, int iErrCode, const char* zMsg) { | |||
// 17 is the length of "conflict at page" and the following space. | |||
const char* offset = strstr(zMsg, "conflict at page") + 17; | |||
_conflictPage = atol(offset); | |||
|
|||
// 17 is the length of "part of db table" and the following space. | |||
const char* tableOffset = strstr(zMsg, "part of db table") + 17; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can start from offset
, we already know tablerOffset
will be after that.
Also, renaming offset
to pageOffset
might be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// Let's add this check in case the SQLite log changes and we don't notice it | ||
// since this would generate a runtime error. | ||
if (semicolonOffset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. You need the if
, yes, because otherwise semicolonOffset - tableOffset
will be negative.
The error you posted looks like you were setting msg
by hand, and that's the error you're seeing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below that that msg error we have a terminate called after throwing an instance of
std::length-error`, I just wanted to show that it will fail if we don't have the if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more suggested change.
|
||
// Let's add this check in case the SQLite log changes and we don't notice it | ||
// since this would generate a runtime error. | ||
if (semicolonOffset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment. You need the if
, yes, because otherwise semicolonOffset - tableOffset
will be negative.
The error you posted looks like you were setting msg
by hand, and that's the error you're seeing.
@@ -778,6 +778,7 @@ void BedrockServer::runCommand(unique_ptr<BedrockCommand>&& _command, bool isBlo | |||
} | |||
|
|||
int64_t lastConflictPage = 0; | |||
string lastConflictTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this defined so far up?
Details
The normal flow for a write command is to try to commit it. If that fails because of a conflict, we will retry the commits three times before sending them to the blocking queue.
When we have a conflict on a page that belongs to a table, that means that some other command is also changing that same page. We'll store the page that conflicted:
Bedrock/sqlitecluster/SQLite.cpp
Line 286 in 67b55ff
And then we'll wait until that page is not locked anymore before we try to execute the command again:
Bedrock/BedrockServer.cpp
Lines 910 to 913 in 67b55ff
When that happens for any tables in the system, it is not a problem. But when we have a conflict that happened on a journal table, then we're blocking the next commit from happening without reason. The journals are chosen when the commit is being executed, and the chances that we select the same journal on the next commit are low. that considered, we don't need to do a PageLockGuard if the previous conflict was on a journal table.
Fixed Issues
Fixes https://github.com/Expensify/Expensify/issues/432114
Tests
Ran the ConflictSpam test and confirmed that the changed log line is also logging the tables:
then I applied these changes to make the cluster use page conflicts
And confirmed that the logs were successfully doing what we wanted them to do:
Internal Testing Reminder: when changing bedrock, please compile auth against your new changes