10.5 MDEV 29293 teemu #308
base: 10.5
Commits on Apr 2, 2023
MDEV-29293 MariaDB stuck on starting commit state
The problem seems to be a deadlock between KILL command execution and a BF abort issued by an applier, where:
* KILL has locked the victim's LOCK_thd_kill and LOCK_thd_data
* the applier holds the InnoDB-side global lock mutex and the victim's trx mutex
* KILL is calling innobase_kill_query and is blocked by the InnoDB global lock mutex
* the applier is in wsrep_innobase_kill_one_trx and is blocked by the victim's LOCK_thd_kill

The fix in this commit removes the TOI replication of the KILL command and makes KILL execution a less intrusive operation. Aborting the victim now happens through wsrep_abort_thd(), which is the same method used for aborting victims of DDL execution. wsrep_thd_abort() will start aborting the victim from inside InnoDB, holding the lock_sys mutex and the victim's trx mutex. The locking protocol is therefore the same as in the regular applier BF aborting procedure. wsrep_abort_thd() will eventually also call THD::awake (as a regular KILL would), and awake is now passed the user-chosen kill signal when a KILL command is executed. Applier BF aborting, on the other hand, will use the KILL_QUERY_HARD signal.

Notable changes in this commit:
* A wsrep client connection's error state could remain sticky after the client connection was closed. The error message would then pop up for the next client session when it issued its first SQL statement. This problem was exposed by the test galera.galera_bf_kill. The fix is to reset the wsrep client error state before a THD is reused for the next connection.
* THD locks are released in wsrep_abort_transaction when locking InnoDB mutexes; this guarantees the same locking order as with applier BF aborting.
* BF aborting of an idle victim of KILL QUERY (and lower signals) is handled with the background rollbacker. Kill signals higher than KILL_CONNECTION, on the other hand, now skip the background rollbacker treatment, because KILL_CONNECTION wakes up the victim so early that victim execution may interfere with rollbacker execution.
* wsrep-lib now uses a new branch, KILL_command, which changes server_service::background_rollback() to return true/false depending on whether the background rollback was started.
* The victim THD's error code is no longer overwritten with a deadlock error if the abort was due to a manual KILL; this preserves the native error code for KILL victims.
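The lock-order inversion described in this commit message, and the idea of making the KILL path follow the applier's order, can be illustrated with a minimal standalone sketch. This is not the server code: `Victim`, `thd_lock`, `innodb_lock`, `kill_command`, `applier_bf_abort` and `run_concurrently` are hypothetical stand-ins, and two `std::mutex` objects are assumed to be enough to represent the THD-side locks (LOCK_thd_kill, LOCK_thd_data) and the InnoDB-side mutexes (lock_sys mutex, trx mutex).

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a victim THD and its transaction.
struct Victim {
  std::mutex thd_lock;     // represents LOCK_thd_kill + LOCK_thd_data
  std::mutex innodb_lock;  // represents lock_sys mutex + trx mutex
  int aborts = 0;          // modified only with both locks held
};

// Applier BF abort: InnoDB-side mutexes first, then the THD locks.
void applier_bf_abort(Victim& v) {
  std::lock_guard<std::mutex> innodb(v.innodb_lock);
  std::lock_guard<std::mutex> thd(v.thd_lock);
  ++v.aborts;
}

// KILL path in the spirit of the fix: the victim is looked up with THD locks
// held, but they are released before the InnoDB-side mutexes are taken, so
// both paths end up acquiring in the same order.
void kill_command(Victim& v) {
  std::unique_lock<std::mutex> thd(v.thd_lock);       // victim lookup
  thd.unlock();                                       // release THD locks
  std::lock_guard<std::mutex> innodb(v.innodb_lock);  // InnoDB side first
  thd.lock();                                         // then THD locks again
  ++v.aborts;
}

// Runs both paths against the same victim and returns the abort count.
int run_concurrently(int rounds) {
  assert(rounds >= 0);
  Victim v;
  std::thread killer([&v, rounds] {
    for (int i = 0; i < rounds; ++i) kill_command(v);
  });
  std::thread applier([&v, rounds] {
    for (int i = 0; i < rounds; ++i) applier_bf_abort(v);
  });
  killer.join();
  applier.join();
  return v.aborts;
}
```

If `kill_command` instead kept `thd_lock` held while waiting for `innodb_lock`, the two threads could block each other exactly as in the four-step scenario listed in the commit message.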
Commit efb06ef
Commits on Apr 3, 2023
Commit 09e3f40
Commits on Apr 11, 2023
Commit 0b23494
Commits on Apr 12, 2023
Reorganize locking/unlocking to happen in the same scope
In order to make things more manageable, changed the code so that the locking and unlocking happens in the same visible scope.
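One common way to keep locking and unlocking in the same visible scope in C++ is a scoped guard, sketched below. This is an assumption for illustration only: the commit may simply pair explicit lock and unlock calls within one function, and `Trx` and `abort_victim` are hypothetical names, not server symbols.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a transaction object guarded by its own mutex.
struct Trx {
  explicit Trx(unsigned long trx_id) : id(trx_id) {}
  unsigned long id;
  std::mutex mutex;
  bool aborted = false;
};

// The mutex is taken and released inside this one function, so a reader can
// see both ends of the critical section without following call chains.
bool abort_victim(Trx& trx) {
  assert(trx.id > 0);
  std::lock_guard<std::mutex> guard(trx.mutex);  // locked here
  if (trx.aborted) {
    return false;  // already aborted; guard unlocks on this path too
  }
  trx.aborted = true;
  return true;     // guard unlocks when the scope ends
}
```

The design point is that no caller has to remember that a callee returns with a mutex held or released, which is the kind of hidden contract that makes lock-order bugs hard to spot.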
Commit 4beec47
Temp unlock trx mutex in wsrep_innobase_kill_one_trx()
This is to allow the mutex locking order LOCK_thd_data -> trx mutex, which is needed to avoid a race in wsrep_abort_transaction(). The assumption is that lock_sys.mutex is enough to prevent the victim from changing its state.
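The temporary unlock can be pictured with the following sketch, which assumes the caller enters with lock_sys.mutex and the trx mutex already held. All names (`Victim`, `kill_one_trx`, `bf_abort`, the `state` field) are hypothetical stand-ins rather than the actual InnoDB code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the three mutexes named in the commit message.
struct Victim {
  std::mutex lock_sys_mutex;  // represents lock_sys.mutex
  std::mutex thd_data;        // represents LOCK_thd_data
  std::mutex trx_mutex;       // represents the victim's trx mutex
  int state = 0;              // 0 = running, 1 = marked for abort
};

// Called with lock_sys_mutex and trx_mutex held. Drops the trx mutex briefly
// so that LOCK_thd_data can be taken before it, then returns with the trx
// mutex held again. lock_sys_mutex stays held throughout, which is what is
// assumed to keep the victim's state stable during the gap.
void kill_one_trx(Victim& v, std::unique_lock<std::mutex>& trx_lock) {
  assert(trx_lock.owns_lock());
  trx_lock.unlock();                              // temporary unlock
  {
    std::lock_guard<std::mutex> thd(v.thd_data);  // LOCK_thd_data first
    trx_lock.lock();                              // then the trx mutex
    v.state = 1;                                  // mark the victim
  }                                               // LOCK_thd_data released
}

// Caller side: establishes the entry condition and invokes the abort.
int bf_abort(Victim& v) {
  std::lock_guard<std::mutex> sys(v.lock_sys_mutex);
  std::unique_lock<std::mutex> trx(v.trx_mutex);
  kill_one_trx(v, trx);
  assert(trx.owns_lock());  // same locks held on return as on entry
  return v.state;
}
```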
Commit bec7cea
Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK
Some codepaths require finer-grained locking, and unlocking LOCK_thd_kill from wsrep_thd_UNLOCK() might cause an unpleasant surprise. Added explicit calls to wsrep_thd_kill_LOCK/UNLOCK where needed.
Commit 0347afe
Restore assertions in wsrep_thd_bf_abort()
Locking order for BF codepaths is now:
* LOCK_thd_kill (can be omitted if a call to awake_no_mutex is not needed)
* lock_sys.mutex
* LOCK_thd_data
* trx mutex
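A documented lock order like this one can be checked mechanically. The sketch below is a hypothetical helper, not part of the server: `kBfLockOrder` copies the order stated in this commit message, and `follows_order` reports whether a recorded acquisition sequence respects it, allowing locks to be skipped but never taken out of order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Lock order as stated in this commit message (names used as plain labels).
const std::vector<std::string> kBfLockOrder = {
    "LOCK_thd_kill", "lock_sys.mutex", "LOCK_thd_data", "trx mutex"};

// Returns true if `acquired` lists locks in an order compatible with `order`.
// Locks may be omitted, but each one must appear later in `order` than the
// lock acquired before it. Unknown names are rejected.
bool follows_order(const std::vector<std::string>& order,
                   const std::vector<std::string>& acquired) {
  assert(!order.empty());
  auto pos = order.begin();
  for (const auto& name : acquired) {
    pos = std::find(pos, order.end(), name);
    if (pos == order.end()) {
      return false;  // out of order, repeated, or not in the protocol
    }
    ++pos;           // the next lock must come strictly after this one
  }
  return true;
}
```

For example, a path that skips LOCK_thd_kill and takes lock_sys.mutex, LOCK_thd_data and the trx mutex passes, while taking the trx mutex before LOCK_thd_data does not.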
Commit 710521f
Commits on Apr 13, 2023
lock_sys.mutex LOCK_thd_kill LOCK_thd_data trx.mutex
Commit 8d6ceaf
Deal with the sad fact that wsrep_abort_thd() and ha_abort_transaction() return without thd mutexes held
Add SR worker THDs to the server_threads list so they can be found via find_thread_by_id() for BF aborting.
Commit 7700544
Commit f850240
Commit 9e729cf
Commit af2e9c4
Commit aae411f