Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-29293 Refactor MDL BF abort logic #314

Open
wants to merge 20 commits into
base: 10.5
Choose a base branch
from

Commits on Apr 2, 2023

  1. MDEV-29293 MariaDB stuck on starting commit state

    The problem seems to be a deadlock between KILL command execution
    and BF abort issued by an applier, where:
    * KILL has locked victim's LOCK_thd_kill and LOCK_thd_data
    * applier has innodb side global lock mutex and victim trx mutex
    * KILL is calling innobase_kill_query, and is blocked by innodb global lock mutex
    * applier is in wsrep_innobase_kill_one_trx and is blocked by victim's LOCK_thd_kill
    
    The fix in this commit, removes the TOI replication of KILL command,
    and makes KILL execution less intrusive operation.
    Aborting of the victim happens now by using wsrep_abort_thd(),
    which is the same method as used for aborting victims of DDL execution.
    wsrep_thd_abort(), will start the victim aborting from inside innodb,
    holding the lock_sys mutex and victim trx mutex.
    Therefore the locking protocol is same as used in regular applier BF aborting procedure.
    wsrep_abort_thd will eventually call also THD::awake (as regular KILL would),
    and now awake is passed the user chosen kill signal, in case of KILL command execution.
    Applier BF aborting, otoh, will use KILL_QUERY_HARD signal.
    
    Notable changes in this commit:
    * wsrep client connections's error state may remain sticky after client connection is closed.
    This error message will then pop up for the next client session issuing first SQL statement.
    This problem raised with test galera.galera_bf_kill
    The fix is to reset wsrep client error state, before a THD is reused for next connetion
    
    * Releasing THD locks, in wsrep_abort_transaction, when locking innodb mutexes,
    this guarantees same locking order as with applier BF aborting
    
    * Handling BF aborting of idle victim of KILL QUERY (and lower signals) with
    background rollbacker.
    Kill signals higher than KILL_CONNECTION, otoh, will now skip background rollbacker treatment.
    This is because KILL_CONNECTION will wake up the victim so early, that victim execution may
    interfere with the rollbacker execution.
    
    * wsrep-lib is now using new branch: KILL_command, which has changed server_service::background_rollback()
    to return true/false depending on if the background rollbacking was started or not.
    
    * Avoiding to overwrite victim THD's error code to deadlock error,
    if aborting was due to manual KILL, this preserves the native error code for KILL victims
    sjaakola committed Apr 2, 2023
    Configuration menu
    Copy the full SHA
    efb06ef View commit details
    Browse the repository at this point in the history

Commits on Apr 3, 2023

  1. Configuration menu
    Copy the full SHA
    09e3f40 View commit details
    Browse the repository at this point in the history

Commits on Apr 11, 2023

  1. Configuration menu
    Copy the full SHA
    0b23494 View commit details
    Browse the repository at this point in the history

Commits on Apr 14, 2023

  1. MDEV-29293 Refactor MDL BF abort logic

    In this commit, the BF abort logic is refactored to eliminate
    the need to call back to server from InnoDB wsrep_abort_transaction().
    This is to simplify the MDL BF abort codepath so that the direction
    is always only server -> InnoDB. The main change is in wsrep_abort_thd(),
    which first does the BF abort through wsrep-lib, and then continues to
    ha_abort_transaction() if the wsrep-lib BF abort was successful and no
    background rollback was started for victim.
    
    KILL command handling for wsrep threads was changed so that the
    thread to be killed is flagged first with wsrep_abort_by_kill to
    avoid calling storage engine ha_kill_query() from awake_no_mutex().
    KILL for wsrep thread is executed by awake_no_mutex() combined
    with call to ha_abort_transaction().
    
    The locking order for wsrep BF aborts is now:
    
      lock_sys.mutex
      trx.mutex
      LOCK_thd_kill
      LOCK_thd_data
    
    With this, BF aborts which originate from InnoDB lock manager
    don't have to release lock_sys.mutex and trx.mutex in between,
    which reduces chance for race conditions.
    
    In order to make the above locking order work for
    wsrep_abort_transaction(), the function records InnoDB transaction
    ID and releases victim LOCK_thd_kill and LOCK_thd_data. InnoDB
    lock_sys.mutex is then grabbed and the victim InnoDB trx object
    is looked up by InnoDB transaction ID.
    
    Locking/unlocking LOCK_thd_kill and LOCK_thd_data was reorganized
    so that it happens in the visible scope as much as possible.
    
    Remove LOCK_thd_kill from wsrep_thd_LOCK/UNLOCK to allow more
    fine grained locking for SR BF abort which may require locking
    of victim LOCK_thd_kill. Added explicit call for
    wsrep_thd_kill_LOCK/UNLOCK where appropriate.
    
    Wsrep-lib was updated to version which allows external
    locking for BF abort calls.
    
    Changes to MTR tests:
    - Disable galera_bf_abort_group_commit. This test is going to
      be removed (MDEV-30855).
    - Make galera_var_retry_autocommit result more readable by echoing
      cases and expectations into result. Only one expected result for
      reap to verify that server returns expected status for query.
    - Record galera_gcache_recover_manytrx as result file was incomplete.
      Trivial change.
    - Make galera_create_table_as_select more deterministic:
      Wait until CTAS execution has reached MDL wait for multi-master
      conflict case. Expected error from multi-master conflict is
      ER_QUERY_INTERRUPTED. This is because CTAS does not yet have open
      wsrep transaction when it is waiting for MDL, query gets interrupted
      instead of BF aborted. This should be addressed in separate task.
    - Test galera_bf_abort_registering to check that registering trx gets
      BF aborted through MDL.
    
    Co-authored-by: Jan Lindström <[email protected]>
    temeo and janlindstrom committed Apr 14, 2023
    Configuration menu
    Copy the full SHA
    50ac97f View commit details
    Browse the repository at this point in the history

Commits on Apr 15, 2023

  1. Configuration menu
    Copy the full SHA
    ffc7daf View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    28b2d12 View commit details
    Browse the repository at this point in the history
  3. Disallow KILL for wsrep transactions in committing state

    When wsrep transaction has reached committing state,
    it must become immune for KILL. Accompanied MTR test
    can be used to reproduce assertion failure if the
    transaction is not immune.
    
    Implemented additional protection in wsrep_kill_thd()
    against KILL in committing states.
    temeo committed Apr 15, 2023
    Configuration menu
    Copy the full SHA
    fb990d4 View commit details
    Browse the repository at this point in the history
  4. Add forgotten result file

    temeo committed Apr 15, 2023
    Configuration menu
    Copy the full SHA
    6b6bde3 View commit details
    Browse the repository at this point in the history
  5. Process KILL asynchronously for committing thds

    If wsrep thread is on commit codepath when KILL happens,
    the kill_signal is saved and restored in wsrep_after_commit()
    so that it will be processed after commit is over.
    temeo committed Apr 15, 2023
    Configuration menu
    Copy the full SHA
    31aa218 View commit details
    Browse the repository at this point in the history

Commits on Apr 16, 2023

  1. Release trx.mutex for wsrep_thd_bf_abort()

    This changes the locking order to
    
    lock_sys.mutex
    LOCK_thd_kill
    LOCK_thd_data
    trx.mutex
    
    Order LOCK_thd_data -> trx.mutex is required by group commit.
    
    In order to avoid trx going out of scope, grap reference
    to victim before releasing the mutex.
    temeo committed Apr 16, 2023
    Configuration menu
    Copy the full SHA
    f6d98e7 View commit details
    Browse the repository at this point in the history
  2. Decide postponing kill over commit based on wsrep trx state

    Postpone kill only if the thread is supposed to commit
    the transaction. For transactions which will roll back,
    keep the killed state.
    
    Added THD::wsrep_killed_state for debugging.
    temeo committed Apr 16, 2023
    Configuration menu
    Copy the full SHA
    095a528 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    6e60023 View commit details
    Browse the repository at this point in the history
  4. Break the cycle with BF abort and group commit

    There is a cycle
    
      trx reference
        -> LOCK_commit_order
        -> LOCK_thd_data
        -> trx reference
    
    which may prevent the victim transaction committing
    while BF aborter is trying to lock LOCK_thd_data.
    
    Break this cycle by using wsrep_thd_TRYLOCK() and
    checking if the victim is already committed in
    memory if the lock cannot be taken. Locking will
    be retried until the lock gets taken or the
    victim is found committed in memory.
    temeo committed Apr 16, 2023
    Configuration menu
    Copy the full SHA
    b1b4f11 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    2ad2ed1 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    f8acc3b View commit details
    Browse the repository at this point in the history

Commits on Apr 17, 2023

  1. Removed wsrep_thd_set_wsrep_aborter() from service_wsrep.h

    The manipulation of the variable can be done solely on
    server side. Moreover, it is now debug only variable and
    could be excluded from optimized builds.
    
    Also fix THD::reset_killed() to check if wsrep is
    enabled for THD.
    temeo committed Apr 17, 2023
    Configuration menu
    Copy the full SHA
    845ba0e View commit details
    Browse the repository at this point in the history
  2. Move restoring kill state to wsrep_after_statement()

    This is to ensure that killed state will be restored
    before statement execution is over.
    temeo committed Apr 17, 2023
    Configuration menu
    Copy the full SHA
    1f016b8 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    6947855 View commit details
    Browse the repository at this point in the history
  4. Remove check for binlog group commits from galera_kill_group_commit

    In rare case of false negative, checking the counter would cause
    test failure-
    temeo committed Apr 17, 2023
    Configuration menu
    Copy the full SHA
    580ba32 View commit details
    Browse the repository at this point in the history
  5. Remove redundant change

    temeo committed Apr 17, 2023
    Configuration menu
    Copy the full SHA
    7d1aa2a View commit details
    Browse the repository at this point in the history