Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-29293 Refactor MDL BF abort logic #314

Open
wants to merge 20 commits into
base: 10.5
Choose a base branch
from

Conversation

temeo
Copy link

@temeo temeo commented Apr 13, 2023

In this commit, the BF abort logic is refactored to eliminate
the need to call back to server from InnoDB wsrep_abort_transaction().
This is to simplify the MDL BF abort codepath so that the direction
is always only server -> InnoDB. The main change is in wsrep_abort_thd(),
which first does the BF abort through wsrep-lib, and then continues to
ha_abort_transaction() if the wsrep-lib BF abort was successful and no
background rollback was started for victim.

KILL command handling for wsrep threads was changed so that the
thread to be killed is flagged first with wsrep_abort_by_kill to
avoid calling storage engine ha_kill_query() from awake_no_mutex().
KILL for wsrep thread is executed by awake_no_mutex() combined
with call to ha_abort_transaction().

The locking order for wsrep BF aborts is now:

lock_sys.mutex
trx.mutex
LOCK_thd_kill
LOCK_thd_data

With this, BF aborts which originate from InnoDB lock manager
don't have to release lock_sys.mutex and trx.mutex in between,
which reduces chance for race conditions.

In order to make the above locking order work for
wsrep_abort_transaction(), the function records InnoDB transaction
ID and releases victim LOCK_thd_kill and LOCK_thd_data. InnoDB
lock_sys.mutex is then grabbed and the victim InnoDB trx object
is looked up by InnoDB transaction ID.

Locking/unlocking LOCK_thd_kill and LOCK_thd_data was reorganized
so that it happens in the visible scope as much as possible.

Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
fine grained locking for SR BF abort which may require locking
of victim LOCK_thd_kill. Added explicit call for
wsrep_thd_kill_LOCK/UNLOCK where appropriate.

Changes to MTR tests:

  • Disable galera_bf_abort_group_commit. This test is going to
    be removed (MDEV-30855).
  • Make galera_var_retry_autocommit result more readable by echoing
    cases and expectations into result. Only one expected result for
    reap to verify that server returns expected status for query.
  • Record galera_gcache_recover_manytrx as result file was incomplete.
    Trivial change.
  • Make galera_create_table_as_select more deterministic:
    Wait until CTAS execution has reached MDL wait for multi-master
    conflict case. Expected error from multi-master conflict is
    ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
    wsrep transaction when it is waiting for MDL, query gets interrupted
    instead of BF aborted. This should be addressed in separate task.
  • Test galera_bf_abort_registering to check that registering trx gets
    BF aborted through MDL.

Co-authored-by: Jan Lindström [email protected]

The problem seems to be a deadlock between KILL command execution
and BF abort issued by an applier, where:
* KILL has locked victim's LOCK_thd_kill and LOCK_thd_data
* applier has innodb side global lock mutex and victim trx mutex
* KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex
* applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill

The fix in this commit, removes the TOI replication of KILL command,
and makes KILL execution less intrusive operation.
Aborting of the victim happens now by using wsrep_abort_thd(),
which is the same method as used for aborting victims of DDL execution.
wsrep_thd_abort(), will start the victim aborting from inside innodb,
holding the lock_sys mutex and victim trx mutex.
Therefore the locking protocol is same as used in regular applier BF aborting procedure.
wsrep_abort_thd will eventually call also THD::awake (as regular KILL would),
and now awake is passed the user chosen kill signal, in case of KILL command execution.
Applier BF aborting, otoh, will use KILL_QUERY_HARD signal.

Notable changes in this commit:
* wsrep client connections's error state may remain sticky after client connection is closed.
This error message will then pop up for the next client session issuing first SQL statement.
This problem raised with test galera.galera_bf_kill
The fix is to reset wsrep client error state, before a THD is reused for next connetion

* Releasing THD locks, in wsrep_abort_transaction, when locking innodb mutexes,
this guarantees same locking order as with applier BF aborting

* Handling BF aborting of idle victim of KILL QUERY (and lower signals) with
background rollbacker.
Kill signals higher than KILL_CONNECTION, otoh, will now skip background rollbacker treatment.
This is because KILL_CONNECTION will wake up the victim so early, that victim execution may
interfere with the rollbacker execution.

* wsrep-lib is now using new branch: KILL_command, which has changed server_service::background_rollback()
to return true/false depending on if the background rollbacking was started or not.

* Avoiding to overwrite victim THD's error code to deadlock error,
if aborting was due to manual KILL, this preserves the native error code for KILL victims
@temeo temeo marked this pull request as draft April 13, 2023 17:23
@temeo
Copy link
Author

temeo commented Apr 14, 2023

test_pr

3 similar comments
@temeo
Copy link
Author

temeo commented Apr 14, 2023

test_pr

@temeo
Copy link
Author

temeo commented Apr 14, 2023

test_pr

@temeo
Copy link
Author

temeo commented Apr 14, 2023

test_pr

@@ -121,7 +125,7 @@ CREATE TABLE t1 AS SELECT * FROM t2;
UNLOCK TABLES;

--connection node_1
--error ER_TABLE_EXISTS_ERROR,ER_LOCK_DEADLOCK
--error ER_TABLE_EXISTS_ERROR,ER_QUERY_INTERRUPTED
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ER_LOCK_DEADLOCK -> ER_QUERY_INTERRUPTED change comes from changed BF abort path for operations which don't have open wsrep transaction. This should be fixed for CTAS specifically by opening a wsrep transaction for CTAS operation before MDL wait.

@temeo
Copy link
Author

temeo commented Apr 14, 2023

test_pr

@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch 2 times, most recently from f31f824 to 69067e0 Compare April 14, 2023 11:29
@@ -29,14 +29,12 @@ extern "C" my_bool wsrep_on(const THD *thd)

extern "C" void wsrep_thd_LOCK(const THD *thd)
{
mysql_mutex_lock(&thd->LOCK_thd_kill);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting LOCK_thd_kill and LOCK_thd_data to separate calls to allow BF aborting of SR transaction from InnoDB lock manager. BF aborting SR transaction grabs LOCK_thd_kill in THD::reset_globals() which causes
LOCK_thd_kill to be locked twice if wsrep_thd_LOCK() locks it too.

@@ -146,6 +146,7 @@ void Wsrep_server_service::release_high_priority_service(wsrep::high_priority_se
void Wsrep_server_service::background_rollback(wsrep::client_state& client_state)
{
Wsrep_client_state& cs= static_cast<Wsrep_client_state&>(client_state);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: No need for special handling for KILL command here, KILL does not go through wsrep-lib in this PR.

@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch from 69067e0 to d4d2163 Compare April 14, 2023 12:13
@temeo temeo changed the title 10.5 MDEV 29293 refact mdl bf abort MDEV-29293 Refactor MDL BF abort logic Apr 14, 2023
In this commit, the BF abort logic is refactored to eliminate
the need to call back to server from InnoDB wsrep_abort_transaction().
This is to simplify the MDL BF abort codepath so that the direction
is always only server -> InnoDB. The main change is in wsrep_abort_thd(),
which first does the BF abort through wsrep-lib, and then continues to
ha_abort_transaction() if the wsrep-lib BF abort was successful and no
background rollback was started for victim.

KILL command handling for wsrep threads was changed so that the
thread to be killed is flagged first with wsrep_abort_by_kill to
avoid calling storage engine ha_kill_query() from awake_no_mutex().
KILL for wsrep thread is executed by awake_no_mutex() combined
with call to ha_abort_transaction().

The locking order for wsrep BF aborts is now:

  lock_sys.mutex
  trx.mutex
  LOCK_thd_kill
  LOCK_thd_data

With this, BF aborts which originate from InnoDB lock manager
don't have to release lock_sys.mutex and trx.mutex in between,
which reduces chance for race conditions.

In order to make the above locking order work for
wsrep_abort_transaction(), the function records InnoDB transaction
ID and releases victim LOCK_thd_kill and LOCK_thd_data. InnoDB
lock_sys.mutex is then grabbed and the victim InnoDB trx object
is looked up by InnoDB transaction ID.

Locking/unlocking LOCK_thd_kill and LOCK_thd_data was reorganized
so that it happens in the visible scope as much as possible.

Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
fine grained locking for SR BF abort which may require locking
of victim LOCK_thd_kill. Added explicit call for
wsrep_thd_kill_LOCK/UNLOCK where appropriate.

Wsrep-lib was updated to version which allows external
locking for BF abort calls.

Changes to MTR tests:
- Disable galera_bf_abort_group_commit. This test is going to
  be removed (MDEV-30855).
- Make galera_var_retry_autocommit result more readable by echoing
  cases and expectations into result. Only one expected result for
  reap to verify that server returns expected status for query.
- Record galera_gcache_recover_manytrx as result file was incomplete.
  Trivial change.
- Make galera_create_table_as_select more deterministic:
  Wait until CTAS execution has reached MDL wait for multi-master
  conflict case. Expected error from multi-master conflict is
  ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
  wsrep transaction when it is waiting for MDL, query gets interrupted
  instead of BF aborted. This should be addressed in separate task.
- Test galera_bf_abort_registering to check that registering trx gets
  BF aborted through MDL.

Co-authored-by: Jan Lindström <[email protected]>
@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch from d4d2163 to 50ac97f Compare April 14, 2023 12:15
@temeo temeo marked this pull request as ready for review April 14, 2023 12:18
@temeo temeo self-assigned this Apr 14, 2023
@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch from bf8f134 to 9628c07 Compare April 15, 2023 11:33
When wsrep transaction has reached committing state,
it must become immune for KILL. Accompanied MTR test
can be used to reproduce assertion failure if the
transaction is not immune.

Implemented additional protection in wsrep_kill_thd()
against KILL in committing states.
@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch from 8248f73 to fb990d4 Compare April 15, 2023 11:41
If wsrep thread is on commit codepath when KILL happens,
the kill_signal is saved and restored in wsrep_after_commit()
so that it will be processed after commit is over.
@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch 2 times, most recently from bc4bc85 to ad8d7b1 Compare April 15, 2023 15:16
This changes the locking order to

lock_sys.mutex
LOCK_thd_kill
LOCK_thd_data
trx.mutex

Order LOCK_thd_data -> trx.mutex is required by group commit.

In order to avoid trx going out of scope, grap reference
to victim before releasing the mutex.
@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch from ad8d7b1 to f6d98e7 Compare April 16, 2023 05:21
Postpone kill only if the thread is supposed to commit
the transaction. For transactions which will roll back,
keep the killed state.

Added THD::wsrep_killed_state for debugging.
There is a cycle

  trx reference
    -> LOCK_commit_order
    -> LOCK_thd_data
    -> trx reference

which may prevent the victim transaction committing
while BF aborter is trying to lock LOCK_thd_data.

Break this cycle by using wsrep_thd_TRYLOCK() and
checking if the victim is already committed in
memory if the lock cannot be taken. Locking will
be retried until the lock gets taken or the
victim is found committed in memory.
@temeo temeo force-pushed the 10.5-MDEV-29293-refact-mdl-bf-abort branch from 6ecf3fe to b1b4f11 Compare April 16, 2023 07:42
The manipulation of the variable can be done solely on
server side. Moreover, it is now debug only variable and
could be excluded from optimized builds.

Also fix THD::reset_killed() to check if wsrep is
enabled for THD.
This is to ensure that killed state will be restored
before statement execution is over.
In rare case of false negative, checking the counter would cause
test failure-
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants