Simple racing class attribute read-write crashes on free-threaded builds #118362

Fidget-Spinner · 2024-04-28T12:30:59Z

Crash report

What happened?

The following segfaults:

from multiprocessing.dummy import Pool


NTHREADS = 6

class A:
    attr = 1

BOTTOM = 0
TOP = 1000

ITERS = 100
def read(id0):
    for _ in range(ITERS):
        for _ in range(BOTTOM, TOP):
            A.attr
            # print(A.attr)

def write(id0):
    for _ in range(ITERS):
        for _ in range(BOTTOM, TOP):
            # Make _PyType_Lookup cache hot first
            A.attr
            A.attr
            x = A.attr
            x += 1
            A.attr = x
            # print(f"WRITTEN {x}\n")


def main():    
    with Pool(NTHREADS) as pool:    
        pool.apply_async(read, (1,))
        pool.apply_async(write, (1,))
        pool.close()
        pool.join()

main()
print("done")

The first issue is that we have assert(_PyType_HasFeature(type, Py_TPFLAGS_VALID_VERSION_TAG)); in typeobject.c, which may not be true as another thread can modify the attribute in the meantime. We should remove that assertion.

The next issue, which I have not yet been able to solve, is that the attribute evaporates midway in _Py_type_getattro_impl. This is likely caused by a decref of another thread. So far I've tried changing _PyType_Lookup and find_name_in_mro to return strong references, but they don't seem to solve it for some reason.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Output from running 'python -VV' on the command line:

No response

Linked PRs

The text was updated successfully, but these errors were encountered:

Fidget-Spinner · 2024-04-28T12:31:23Z

I'm not assigning myself because I'm still unable to figure this out at the moment. So anyone's free to take it up.

colesbury · 2024-04-30T14:33:23Z

The modification to the type's dictionary happens before type_modified_unlocked, which invalidates the cache. So the old attribute may be destroyed before the cache entry is invalidated.

There's a somewaht similar issue with the GIL because the destructors can call arbitrary code before the cache is invalidated, but it's much harder to trigger a crash because the old value finalizer is called before the memory is freed.

cc @DinoV

cpython/Objects/typeobject.c

Lines 5364 to 5379 in 8b56d82

    
               BEGIN_TYPE_LOCK() 
        
               res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL); 
        
               if (res == 0) { 
        
                   /* Clear the VALID_VERSION flag of 'type' and all its 
        
                      subclasses.  This could possibly be unified with the 
        
                      update_subclasses() recursion in update_slot(), but carefully: 
        
                      they each have their own conditions on which to stop 
        
                      recursing into subclasses. */ 
        
                   type_modified_unlocked(type); 
        
                   if (is_dunder_name(name)) { 
        
                       res = update_slot(type, name); 
        
                   } 
        
                   assert(_PyType_CheckConsistency(type)); 
        
               } 
        
               END_TYPE_LOCK()

colesbury · 2024-04-30T14:42:35Z

A starting point would be to call type_modified_unlocked(type) to invalidate the cache before _PyObject_GenericSetAttrWithDict. However, _PyObject_GenericSetAttrWithDict is complex and may suspend the type lock critical section, which might allow another thread to re-cache the stale value before the attribute is updated. I think we might want to also disable cache updates for the duration of the update, e.g.:

For example, something like the following pseudo-code:

BEGIN_TYPE_LOCK();
PyInterpreterState_Get()->types.cache_enabled = false;
type_modified_unlocked(type);
res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL);
if (res == 0) {
    ...
}
PyInterpreterState_Get()->types.cache_enabled = true;
END_TYPE_LOCK();

The caching logic in _PyType_Lookup would only modify the cache if it's not disabled (i.e., not in the middle of an attribute update).

DinoV · 2024-04-30T17:10:02Z

type_modified_unlocked(type);
res = _PyObject_GenericSetAttrWithDict((PyObject *)type, name, value, NULL);

I don't think we can mark the type as being modified before doing the set, as it delivers an event that may care about seeing the new value. There's also other potential issues around re-entrancy in the set, e.g. if there's a meta-type with a descriptor that then does a lookup against the type it'll have a valid version tag re-assigned before the mutation happens.

We could clear this cache entry first if it's the type and still use the flag to not update it. Another possibility would be that we lookup the current attribute (via the cache or MRO) and incref it and then decref it after the set. I'm kind of leaning towards the later just because the type cache is so hot and mutation of types is so rare.

DinoV · 2024-04-30T17:29:45Z

I think there's an additional issue here though which is that _PyType_Lookup is just not thread safe because it's returning a borrowed reference.

colesbury · 2024-04-30T17:37:44Z

it delivers an event that may care about seeing the new value...

Ok, we'd need to split up type_modified_unlocked and only do the clearing of the version tag first

if there's a meta-type with a descriptor that then does a lookup against the type it'll have a valid version tag re-assigned before the mutation happens...

I'm not sure I understand if there's some issue specific to meta-types here. I'm suggesting explicitly disabling all type cache updates for essentially the duration we hold the type lock. BEGIN/END_TYPE_LOCK almost does this already (it's a mutex) but not quite because the critical section can be suspended in lots of places in the middle allowing a concurrent type cache update to proceed.

Another possibility would be that we lookup the current attribute (via the cache or MRO) and incref it and then decref it after the set

This doesn't sound robust to me. One of the big problems is that the type lock is a critical section that doesn't really exclude concurrent updates because there are many opportunities for it to be suspended.

I think there's an additional issue here though which is that _PyType_Lookup is just not thread safe because it's returning a borrowed reference.

Yeah that's an additional issue.

DinoV · 2024-04-30T18:32:49Z

it delivers an event that may care about seeing the new value...

Ok, we'd need to split up type_modified_unlocked and only do the clearing of the version tag first

if there's a meta-type with a descriptor that then does a lookup against the type it'll have a valid version tag re-assigned before the mutation happens...

I'm not sure I understand if there's some issue specific to meta-types here. I'm suggesting explicitly disabling all type cache updates for essentially the duration we hold the type lock. BEGIN/END_TYPE_LOCK almost does this already (it's a mutex) but not quite because the critical section can be suspended in lots of places in the middle allowing a concurrent type cache update to proceed.

My comment here was specifically about the pseudo-code moving the type_modified_unlocked(type); call above the set attr. I think the fact that this delivers the type modified event to external listeners alone prevents us from doing it. But the reason I called out meta-types in particular as they're more likely to cause an additional _PyType_Lookup to happen on the type causing a new version to be assigned to it before the set happens. I'm not sure there's a descriptor in the base type which would cause that but maybe there is. But either way we could:

Invalidate the version
Run some other Python code which assigns a version tag (whether through re-entrancy or a race, this can happen outside of the type cache)
Finally do the store into the types dictionary

Not updating the type cache during this process would mean we're not corrupting the type cache with a stale value but a type-watcher would still become out of sync. So at the very least I think the type_modified_unlocked call needs to happen after the set.

Another possibility would be that we lookup the current attribute (via the cache or MRO) and incref it and then decref it after the set

This doesn't sound robust to me. One of the big problems is that the type lock is a critical section that doesn't really exclude concurrent updates because there are many opportunities for it to be suspended.

But if the object is kept alive by type_setattro then those suspensions don't matter? The other threads have no other synchronization against the type so not seeing the write until after the type_modified_unlocked is delivered seems fine.

colesbury · 2024-04-30T18:46:30Z

Ok, I think I understand now. I'd revise my suggestion so that type_setattro does:

Acquire the type lock (critical section)
Invalidate the types version cache recursively (i.e., the first part of type_modified_unlocked)
Disable type cache updates
Call _PyObject_GenericSetAttrWithDict
Re-invalidate the version cache and trigger watchers (i.e., type_modified_unlocked)
Enable type cache updates
Release the type lock

The other threads have no other synchronization against the type so not seeing the write until after the type_modified_unlocked is delivered seems fine...

But they'll get stale cache entries -- that seems bad to me. To a certain extent, this already happens in CPython with the GIL, but it's going to be more common and weirder with the GIL disabled.

DinoV · 2024-04-30T18:55:46Z

But they'll get stale cache entries -- that seems bad to me. To a certain extent, this already happens in CPython with the GIL, but it's going to be more common and weirder with the GIL disabled.

It seems like logically though there's no other synchronization happening and so the write isn't actually done until the version is invalidated. I guess it's possible that the re-entrancy could cause a 2nd write to happen that could be observed while still seeing the old value in the cache though breaking sequential consistency (although I guess to your point that's probably possible w/ the GIL already too).

colesbury · 2024-04-30T19:44:58Z

Here's a single-threaded example of seeing a stale cached value (even with the GIL):
https://gist.github.com/colesbury/49ab040d89faab3b0a8bdb9552bfe266

I basically have two concerns:

This is longstanding buggy CPython behavior, but if it becomes more frequently visible in the free-threaded build then that's a real problem.
I'm not sure how we can keep alive the correct object current attribute given that the cache can have stale values and the type lock can be suspended. (I might just be missing something here; it's hard to reason about without seeing actual code.)

DinoV · 2024-04-30T20:11:06Z

Here's a single-threaded example of seeing a stale cached value (even with the GIL): https://gist.github.com/colesbury/49ab040d89faab3b0a8bdb9552bfe266

I basically have two concerns:
1. This is longstanding buggy CPython behavior, but if it becomes more frequently visible in the free-threaded build then that's a real problem.

2. I'm not sure how we can keep alive the _correct_ object current attribute given that the cache can have stale values and the type lock can be suspended. (I might just be missing something here; it's hard to reason about without seeing actual code.)

This is what I was thinking for keeping it alive in type_setattro, we just see if it's currently in the cache and keep that alive, and if not we look it up in the MRO before hand and keep that alive:

    BEGIN_TYPE_LOCK()

#ifdef Py_GIL_DISABLED
    PyObject *old_value = NULL;
    // We could have the old value in the cache and we might destroy it before
    // we mark the type as being modified.  Therefore we keep the object alive
    // between the set and the type modified.
    unsigned int h = MCACHE_HASH_METHOD(type, name);
    struct type_cache *cache = get_type_cache();
    struct type_cache_entry *entry = &cache->hashtable[h];
    while (1) {
        int sequence = _PySeqLock_BeginRead(&entry->sequence);
        if (_Py_atomic_load_uint32_relaxed(&entry->version) == type->tp_version_tag &&
            _Py_atomic_load_ptr_relaxed(&entry->name) == name) {
            old_value = _Py_atomic_load_ptr_relaxed(&entry->value);

            // If the sequence is still valid then we're done
            if (_PySeqLock_EndRead(&entry->sequence, sequence)) {
                Py_XINCREF(old_value);
                break;
            }
        } else {
            int error;
            old_value = find_name_in_mro(type, name, &error);
            if (error) {
                return -1;
            }
            Py_XINCREF(old_value);
            break;
        }
    }
#endif

DinoV · 2024-05-01T02:19:32Z

I think #118454 will fix it.

I went with something closer to what @colesbury described with the multi-phase type version invalidation. But to split the invalidation into 2 phases could potentially leave the types in an inconsistent state, or we'd need to remember what types we actually invalidated.

I've instead made the update of the dict atomic with the invalidation of the type. This isn't very difficult to guarantee because other than the descriptor case the only thing that can cause re-entrancy below the lock is the finalizer on the object. The keys we're using against the dict are all exact unicode, and types only contain exact unicode, so we know the lookups are otherwise safe.

DinoV · 2024-05-01T02:37:06Z

Here's a single-threaded example of seeing a stale cached value (even with the GIL): https://gist.github.com/colesbury/49ab040d89faab3b0a8bdb9552bfe266
I basically have two concerns:
1. This is longstanding buggy CPython behavior, but if it becomes more frequently visible in the free-threaded build then that's a real problem.

2. I'm not sure how we can keep alive the _correct_ object current attribute given that the cache can have stale values and the type lock can be suspended. (I might just be missing something here; it's hard to reason about without seeing actual code.)
This is what I was thinking for keeping it alive in type_setattro, we just see if it's currently in the cache and keep that alive, and if not we look it up in the MRO before hand and keep that alive

And that wouldn't have worked if the value was cached for a subclass but not for the defining class.

…e face of concurrent mutators (#118454) Add _PyType_LookupRef and use incref before setting attribute on type Makes setting an attribute on a class and signaling type modified atomic Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated

* Skip tests when threads aren't available * Use ThreadPoolExecutor

colesbury · 2024-05-07T15:04:50Z

This is fixed now

… in the face of concurrent mutators (python#118454) Add _PyType_LookupRef and use incref before setting attribute on type Makes setting an attribute on a class and signaling type modified atomic Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated

…8666) * Skip tests when threads aren't available * Use ThreadPoolExecutor

Fidget-Spinner added type-crash A hard crash of the interpreter, possibly with a core dump topic-free-threading labels Apr 28, 2024

DinoV self-assigned this Apr 30, 2024

bedevere-app bot mentioned this issue May 1, 2024

gh-118362: Fix thread safety around lookups from the type cache in the face of concurrent mutators #118454

Merged

bedevere-app bot mentioned this issue May 6, 2024

gh-118362: Skip tests when threading isn't available #118666

Merged

DinoV added a commit that referenced this issue May 6, 2024

gh-118362: Skip tests when threading isn't available (#118666)

e272195

* Skip tests when threads aren't available * Use ThreadPoolExecutor

colesbury closed this as completed May 7, 2024

SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024

pythongh-118362: Skip tests when threading isn't available (python#11…

f42c506

…8666) * Skip tests when threads aren't available * Use ThreadPoolExecutor

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simple racing class attribute read-write crashes on free-threaded builds #118362

Simple racing class attribute read-write crashes on free-threaded builds #118362

Fidget-Spinner commented Apr 28, 2024 •

edited by bedevere-app bot

Loading

Fidget-Spinner commented Apr 28, 2024

colesbury commented Apr 30, 2024

colesbury commented Apr 30, 2024

DinoV commented Apr 30, 2024

DinoV commented Apr 30, 2024

colesbury commented Apr 30, 2024

DinoV commented Apr 30, 2024

colesbury commented Apr 30, 2024 •

edited

Loading

DinoV commented Apr 30, 2024

colesbury commented Apr 30, 2024 •

edited

Loading

DinoV commented Apr 30, 2024

DinoV commented May 1, 2024

DinoV commented May 1, 2024

colesbury commented May 7, 2024

Simple racing class attribute read-write crashes on free-threaded builds #118362

Simple racing class attribute read-write crashes on free-threaded builds #118362

Comments

Fidget-Spinner commented Apr 28, 2024 • edited by bedevere-app bot Loading

Crash report

What happened?

CPython versions tested on:

Operating systems tested on:

Output from running 'python -VV' on the command line:

Linked PRs

Fidget-Spinner commented Apr 28, 2024

colesbury commented Apr 30, 2024

colesbury commented Apr 30, 2024

DinoV commented Apr 30, 2024

DinoV commented Apr 30, 2024

colesbury commented Apr 30, 2024

DinoV commented Apr 30, 2024

colesbury commented Apr 30, 2024 • edited Loading

DinoV commented Apr 30, 2024

colesbury commented Apr 30, 2024 • edited Loading

DinoV commented Apr 30, 2024

DinoV commented May 1, 2024

DinoV commented May 1, 2024

colesbury commented May 7, 2024

Fidget-Spinner commented Apr 28, 2024 •

edited by bedevere-app bot

Loading

colesbury commented Apr 30, 2024 •

edited

Loading

colesbury commented Apr 30, 2024 •

edited

Loading