10.5 MDEV 29293 teemu #308

temeo · 2023-04-12T05:42:02Z

MDEV-29293 MariaDB stuck on starting commit state
new checkout of wsrep-lib/KILL_command branch
Simplifying the patch in kill_one_thread()
Reorganize locking/unlocking to happen in the same scope

The problem seems to be a deadlock between KILL command execution and BF abort issued by an applier, where: * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data * applier has innodb side global lock mutex and victim trx mutex * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex * applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill The fix in this commit, removes the TOI replication of KILL command, and makes KILL execution less intrusive operation. Aborting of the victim happens now by using wsrep_abort_thd(), which is the same method as used for aborting victims of DDL execution. wsrep_thd_abort(), will start the victim aborting from inside innodb, holding the lock_sys mutex and victim trx mutex. Therefore the locking protocol is same as used in regular applier BF aborting procedure. wsrep_abort_thd will eventually call also THD::awake (as regular KILL would), and now awake is passed the user chosen kill signal, in case of KILL command execution. Applier BF aborting, otoh, will use KILL_QUERY_HARD signal. Notable changes in this commit: * wsrep client connections's error state may remain sticky after client connection is closed. This error message will then pop up for the next client session issuing first SQL statement. This problem raised with test galera.galera_bf_kill The fix is to reset wsrep client error state, before a THD is reused for next connetion * Releasing THD locks, in wsrep_abort_transaction, when locking innodb mutexes, this guarantees same locking order as with applier BF aborting * Handling BF aborting of idle victim of KILL QUERY (and lower signals) with background rollbacker. Kill signals higher than KILL_CONNECTION, otoh, will now skip background rollbacker treatment. This is because KILL_CONNECTION will wake up the victim so early, that victim execution may interfere with the rollbacker execution. * wsrep-lib is now using new branch: KILL_command, which has changed server_service::background_rollback() to return true/false depending on if the background rollbacking was started or not. * Avoiding to overwrite victim THD's error code to deadlock error, if aborting was due to manual KILL, this preserves the native error code for KILL victims

In order to make things more manageable, changed the code so that the locking and unlocking happens in the same visible scope.

This is to allow mutex locking order LOCK_thd_data -> trx mutex, which is needed to avoid a race in wsrep_abort_transaction(). The assumption is that lock_sys.mutex is enough to prevent the victim to change its state.

janlindstrom · 2023-04-12T07:31:26Z

storage/innobase/handler/ha_innodb.cc


+  /* Temporarily release victim_trx mutex to allow locking order
+     LOCK_thd_data -> trx_mutex_exit() needed in wsrep_abort_transaction() */
+  trx_mutex_exit(victim_trx);
  DEBUG_SYNC(bf_thd, "wsrep_before_BF_victim_lock");
  wsrep_thd_LOCK(thd);


I do not think it is safe to use same victim_trx pointer after you have released trx mutex, you might need to get it from thd again after you have done wsrep_thd_LOCK and if it is not there bail out.

janlindstrom · 2023-04-12T07:33:05Z

storage/innobase/handler/ha_innodb.cc

-
-  /* Unlock temporarily to grab mutexes in the right order. */
+  /* Unlock temporarily to grab mutexes in the right order.
+     Note that victim thd is protected with THD::LOCK_thd_kill here. */
  wsrep_thd_UNLOCK(victim_thd);
  lock_mutex_enter();


This is also problematic as we have released victim thread THD mutexes it can be disconnected or destroyed.

This is now fixed below by find_thread_by_id() and checking that the victim transaction ID is still the same.

Some codepaths require more fine grained locking, and unlocking LOCK_thd_kill from wsrep_thd_UNLOCK() might cause unpleasant surprise. Added explicit calls to wsrep_thd_kill_LOCK/UNLOCK where needed.

Locking order for BF codepaths is now LOCK_thd_kill (can be omitted if call to awake_no_mutex is not needed) lock_sys.mutex LOCK_thd_data trx mutex

lock_sys.mutex LOCK_thd_kill LOCK_thd_data trx.mutex

ha_abort_transaction() return without thd mutexes held. Add SR worker THDs to server_threads list so they can be found via find_thread_by_id() for BF aborting.

sjaakola and others added 5 commits April 2, 2023 17:56

new checkout of wsrep-lib/KILL_command branch

09e3f40

Simplifying the patch in kill_one_thread()

0b23494

Reorganize locking/unlocking to happen in the same scope

4beec47

In order to make things more manageable, changed the code so that the locking and unlocking happens in the same visible scope.

Temp unlock trx mutex in wsrep_innobase_kill_one_trx()

bec7cea

This is to allow mutex locking order LOCK_thd_data -> trx mutex, which is needed to avoid a race in wsrep_abort_transaction(). The assumption is that lock_sys.mutex is enough to prevent the victim to change its state.

janlindstrom reviewed Apr 12, 2023

View reviewed changes

temeo added 3 commits April 12, 2023 11:49

Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK

0347afe

Some codepaths require more fine grained locking, and unlocking LOCK_thd_kill from wsrep_thd_UNLOCK() might cause unpleasant surprise. Added explicit calls to wsrep_thd_kill_LOCK/UNLOCK where needed.

Restore assertions in wsrep_thd_bf_abort()

710521f

Locking order for BF codepaths is now LOCK_thd_kill (can be omitted if call to awake_no_mutex is not needed) lock_sys.mutex LOCK_thd_data trx mutex

Fixed locking order to be

8d6ceaf

lock_sys.mutex LOCK_thd_kill LOCK_thd_data trx.mutex

temeo force-pushed the 10.5-MDEV-29293-teemu branch from aa08a43 to 8d6ceaf Compare April 13, 2023 04:51

temeo added 5 commits April 13, 2023 09:03

Deal with the sad fact that wsrep_abort_thd() and

7700544

ha_abort_transaction() return without thd mutexes held. Add SR worker THDs to server_threads list so they can be found via find_thread_by_id() for BF aborting.

Fix kill_one_thread()

f850240

Documented wsrep_abort_thd()

9e729cf

Disable galera_bf_abort_group_commit

af2e9c4

Fix abort replicated

aae411f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

10.5 MDEV 29293 teemu #308

10.5 MDEV 29293 teemu #308

temeo commented Apr 12, 2023

janlindstrom Apr 12, 2023

janlindstrom Apr 12, 2023

temeo Apr 13, 2023 •

edited

Loading

10.5 MDEV 29293 teemu #308

Are you sure you want to change the base?

10.5 MDEV 29293 teemu #308

Conversation

temeo commented Apr 12, 2023

janlindstrom Apr 12, 2023

Choose a reason for hiding this comment

janlindstrom Apr 12, 2023

Choose a reason for hiding this comment

temeo Apr 13, 2023 • edited Loading

Choose a reason for hiding this comment

temeo Apr 13, 2023 •

edited

Loading