
gh-117657: Quiet more TSAN warnings due to incorrect modeling of compare/exchange #117830

Merged: 1 commit into python:main on Apr 15, 2024

Conversation

@mpage (Contributor) commented Apr 13, 2024

TSAN reports (example) data races between the non-atomic loads of `tstate->state` in `detach_thread()` and `tstate_set_detached()`:

cpython/Python/pystate.c

Lines 2067 to 2080 in eca5362

static void
detach_thread(PyThreadState *tstate, int detached_state)
{
    // XXX assert(tstate_is_alive(tstate) && tstate_is_bound(tstate));
    assert(tstate->state == _Py_THREAD_ATTACHED);
    assert(tstate == current_fast_get());
    if (tstate->critical_section != 0) {
        _PyCriticalSection_SuspendAll(tstate);
    }
#ifdef Py_GIL_DISABLED
    _Py_qsbr_detach(((_PyThreadStateImpl *)tstate)->qsbr);
#endif
    tstate_deactivate(tstate);
    tstate_set_detached(tstate, detached_state);

cpython/Python/pystate.c

Lines 2003 to 2008 in eca5362

static void
tstate_set_detached(PyThreadState *tstate, int detached_state)
{
    assert(tstate->state == _Py_THREAD_ATTACHED);
#ifdef Py_GIL_DISABLED
    _Py_atomic_store_int(&tstate->state, detached_state);

and the atomic compare/exchange of `tstate->state` in `park_detached_threads()`:

cpython/Python/pystate.c

Lines 2159 to 2170 in eca5362

static bool
park_detached_threads(struct _stoptheworld_state *stw)
{
    int num_parked = 0;
    PyInterpreterState *i;
    PyThreadState *t;
    _Py_FOR_EACH_THREAD(stw, i, t) {
        int state = _Py_atomic_load_int_relaxed(&t->state);
        if (state == _Py_THREAD_DETACHED) {
            // Atomically transition to "suspended" if in "detached" state.
            if (_Py_atomic_compare_exchange_int(&t->state,
                                                &state, _Py_THREAD_SUSPENDED)) {

I believe this is due to a bug in how TSAN models compare/exchange (it treats the operation solely as a store, rather than as a load/store). Note that this appears to be a different issue from what we're seeing in #117828; the compare/exchange is succeeding in this case.

According to the standard,

If one evaluation modifies a memory location, and the other reads or modifies the same memory location, and if at least one of the evaluations is not an atomic operation, the behavior of the program is undefined (the program has a data race) unless there exists a happens-before relationship between these two evaluations.

However, according to the standard, A strongly happens-before B if any of the following are true:

  1. A is sequenced-before B.
  2. A synchronizes-with B.
  3. A strongly happens-before X, and X strongly happens-before B.

Using this, we can establish a happens-before relationship between each non-atomic load and the compare/exchange:

  1. Each non-atomic load is sequenced-before the seq-cst store to `tstate->state` that happens in `tstate_set_detached()`. Therefore, each non-atomic load happens-before the seq-cst store.
  2. The seq-cst store to `tstate->state` in `tstate_set_detached()` synchronizes-with the compare/exchange on `tstate->state` in `park_detached_threads()`. Thus the seq-cst store happens-before the compare/exchange.
  3. (1) and (2) establish a happens-before relationship between each non-atomic load and the compare/exchange on `tstate->state`; therefore, they should not be flagged as data races. A reduced sketch of this pattern follows.
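
To make this concrete, here is a reduced, self-contained sketch of the pattern (an illustration only: it uses GCC/Clang __atomic builtins and pthreads rather than CPython's internal atomics, and the thread functions detacher()/parker() and the state values are invented). By the reasoning above, the plain load in detacher() happens-before the successful compare/exchange in parker(), so this program is race-free; a TSAN build that mismodels compare/exchange would nonetheless report a race on state.

#include <pthread.h>

static int state;  /* 0 = attached, 1 = detached, 2 = suspended (made up) */

static void *detacher(void *arg)
{
    (void)arg;
    int s = state;                                  /* non-atomic load */
    (void)s;
    __atomic_store_n(&state, 1, __ATOMIC_SEQ_CST);  /* seq-cst store */
    return NULL;
}

static void *parker(void *arg)
{
    (void)arg;
    for (;;) {
        int s = __atomic_load_n(&state, __ATOMIC_RELAXED);
        if (s == 1) {
            /* On success, this CAS synchronizes-with the seq-cst store
             * in detacher(), so the non-atomic load there happens-before
             * this operation. */
            if (__atomic_compare_exchange_n(&state, &s, 2, /*weak=*/0,
                                            __ATOMIC_SEQ_CST,
                                            __ATOMIC_SEQ_CST)) {
                return NULL;
            }
        }
    }
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, detacher, NULL);
    pthread_create(&t2, NULL, parker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}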

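The corresponding fix pattern (a sketch of the kind of change involved, not necessarily this PR's exact diff) is to route the non-atomic reads of `tstate->state` through a relaxed atomic load. The ordering argument is unaffected, since the happens-before edge still comes from the seq-cst store and the compare/exchange, but TSAN now sees an atomic access instead of a plain one. Applied to detacher() from the sketch above:

static void *detacher(void *arg)
{
    (void)arg;
    /* Before: int s = state;  -- the plain load that TSAN flags. */
    int s = __atomic_load_n(&state, __ATOMIC_RELAXED);  /* quiet for TSAN */
    (void)s;
    __atomic_store_n(&state, 1, __ATOMIC_SEQ_CST);
    return NULL;
}
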
@colesbury merged commit 0d52383 into python:main on Apr 15, 2024
38 checks passed
mpage added a commit to mpage/cpython that referenced this pull request Apr 16, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
mpage added a commit to mpage/cpython that referenced this pull request Apr 22, 2024
…tate`

These should be the last instances of TSAN warning about data races when accessing
`tstate->state` non-atomically.

TSAN reports a race between accessing `tstate->state` in `_PyThreadState_Suspend()`
and the compare/exchange in `park_detached_threads`. This is a false positive due
to TSAN modeling failed compare/exchange operations as writes.

TSAN reports a race between accessing `tstate->state` in `_PySemaphore_Wait()`
and the compare/exchange in `park_detached_threads`. This is the same issue that
we saw in pythongh-117830.
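
To illustrate the failed-CAS false positive (again a made-up sketch, not CPython code): a compare/exchange that fails only performs a load, so it cannot conflict with another load, yet TSAN instruments every compare/exchange as a write and reports a race anyway.

#include <pthread.h>

static int state;  /* stays 0; never written after initialization */

static void *reader(void *arg)
{
    (void)arg;
    int s = state;  /* plain load; with no writer, this cannot race */
    (void)s;
    return NULL;
}

static void *failing_cas(void *arg)
{
    (void)arg;
    int expected = 1;  /* state is 0, so the exchange never happens */
    /* The failed CAS merely loads `state` into `expected`; TSAN models
     * it as a write and flags a race with the plain load in reader(). */
    __atomic_compare_exchange_n(&state, &expected, 2, /*weak=*/0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, reader, NULL);
    pthread_create(&t2, NULL, failing_cas, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}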