gh-103323: Get the "Current" Thread State from a Thread-Local Variable #103324

Merged

ericsnowcurrently
Member

@ericsnowcurrently ericsnowcurrently commented Apr 6, 2023

We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.
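
As a rough illustration of what smoothing out the compiler differences can involve, here is a minimal sketch. It is not the pyport.h definition: `DEMO_THREAD_LOCAL`, `tl_tstate`, `tl_current`, and the `tl_*` functions are hypothetical names, and the preprocessor checks are an assumption about commonly available spellings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (not the pyport.h definition): choose whichever
   thread-local storage-class spelling this compiler understands. */
#if defined(_MSC_VER)
#  define DEMO_THREAD_LOCAL __declspec(thread)
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define DEMO_THREAD_LOCAL _Thread_local
#elif defined(__GNUC__)  /* also matches clang */
#  define DEMO_THREAD_LOCAL __thread
#else
#  error "no known thread-local storage class for this compiler"
#endif

typedef struct tl_tstate {
    int id;
} tl_tstate;

/* One pointer per thread instead of one field in a process-wide struct. */
static DEMO_THREAD_LOCAL tl_tstate *tl_current = NULL;

static void tl_set_current(tl_tstate *ts) { tl_current = ts; }

/* Unchecked read: returns NULL if this thread has no state bound. */
static tl_tstate *tl_get_current(void) { return tl_current; }

/* Checked read: asserts that a state is bound before returning it. */
static tl_tstate *tl_require_current(void)
{
    tl_tstate *ts = tl_get_current();
    assert(ts != NULL);
    return ts;
}
```

A thread would call `tl_set_current()` when it binds a state and `tl_get_current()` wherever the old global field would have been read.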

Note that we do not provide a fallback for the thread-local variable, neither to the old tstate_current nor to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.
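
For context, a thread-specific-storage fallback of the kind not provided here might look roughly like the following on POSIX. This is only a sketch: it uses `pthread_key_*` directly rather than PyThread_tss_*(), and `tss_tstate`, `tss_set_current`, and `tss_get_current` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct tss_tstate {
    int id;
} tss_tstate;

static pthread_key_t tss_key;
static pthread_once_t tss_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, no matter which thread gets here first. */
static void tss_make_key(void)
{
    int err = pthread_key_create(&tss_key, NULL);
    assert(err == 0);
    (void)err;
}

/* Store the calling thread's state pointer under the shared key. */
static void tss_set_current(tss_tstate *ts)
{
    pthread_once(&tss_key_once, tss_make_key);
    int err = pthread_setspecific(tss_key, ts);
    assert(err == 0);
    (void)err;
}

/* Fetch the calling thread's state pointer (NULL if never set). */
static tss_tstate *tss_get_current(void)
{
    pthread_once(&tss_key_once, tss_make_key);
    return (tss_tstate *)pthread_getspecific(tss_key);
}
```

Every read goes through a library call here, which is one reason a plain thread-local variable is attractive when the compiler supports it.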

Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).

My only remaining uncertainty is with the existing "GIL is held" constraint. With _PyRuntime.tstate_current, it was only guaranteed valid in the thread currently holding the GIL, if any. With this change, it is valid even when the GIL isn't held. I don't see how that would be a problem, but I'm going to double-check anyway.
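
To make the difference concrete, here is a toy model with no real GIL in it. All names (`gil_tstate`, `gil_shared_current`, `gil_local_current`, `gil_worker`, `gil_run_worker`) are hypothetical, and it assumes POSIX threads and C11 `_Thread_local`. It only shows that a process-wide slot reflects whichever thread wrote last, while a thread-local slot is private to each thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct gil_tstate {
    int id;
} gil_tstate;

/* Old style: a single slot shared by every thread in the process. */
static gil_tstate *gil_shared_current;

/* New style: each thread gets its own slot. */
static _Thread_local gil_tstate *gil_local_current;

/* The worker records its own state in both slots, then exits. */
static void *gil_worker(void *arg)
{
    assert(arg != NULL);
    gil_shared_current = arg;
    gil_local_current = arg;
    return NULL;
}

/* Run one worker to completion; returns 0 on success. */
static int gil_run_worker(gil_tstate *worker_ts)
{
    pthread_t thread;
    if (pthread_create(&thread, NULL, gil_worker, worker_ts) != 0) {
        return -1;
    }
    return pthread_join(thread, NULL);
}
```

After `gil_run_worker()` returns, the calling thread finds the worker's state in the shared slot but still finds its own value in the thread-local slot.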

(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)

@ericsnowcurrently
Member Author

ericsnowcurrently commented Apr 7, 2023

Per the benchmarks, this change is a little faster (less than 1%) on Linux/GCC.

@ericsnowcurrently ericsnowcurrently marked this pull request as ready for review April 7, 2023 18:14
@ericsnowcurrently ericsnowcurrently added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 7, 2023
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit feb8ef5 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 7, 2023
@markshannon
Member

It might be worth trying to see what is the performance impact of storing the interpreter state in TLS as well.
No need to do that in this PR, though.

@ericsnowcurrently ericsnowcurrently merged commit f8abfa3 into python:main Apr 24, 2023
@ericsnowcurrently ericsnowcurrently deleted the tstate_current-as-thread_local branch April 24, 2023 17:17
@ericsnowcurrently ericsnowcurrently restored the tstate_current-as-thread_local branch April 25, 2023 15:59
@ericsnowcurrently
Member Author

FTR, on Windows this introduced a ~2% performance regression, and on macOS there's a ~3% regression.

Note that these penalties may be partially mitigated by passing the current thread state as an argument throughout the internal C-API (where currently we only do so in some places). The implementation here is also relatively naïve. There are likely opportunities to improve performance via compiler-specific directives.

@ericsnowcurrently ericsnowcurrently deleted the tstate_current-as-thread_local branch April 25, 2023 20:52
static inline PyThreadState*
_PyRuntimeState_GetThreadState(_PyRuntimeState *Py_UNUSED(runtime))
{
    return _PyThreadState_GET();
Member

This function no longer makes sense: I wrote PR #104171 to remove it.

@vstinner
Member

vstinner commented May 4, 2023

> (While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)

I prepared this change for years in advance:

  • Python 3.8:

    • I modified the PyThreadState_GET() macro to make it an alias of the PyThreadState_Get() function, so it's always a function call rather than a macro, which hides implementation details.
    • I added _PyThreadState_GET() static inline function to the internal C API to get more freedom on its implementation.
    • I modified many internal C functions to pass the tstate variable explicitly (see "Pass the Python thread state explicitly"). For example, I added the _PyErr_Occurred(tstate) function, which replaces PyErr_Occurred() (which takes no argument). The idea was that, in the future, calling _PyThreadState_GET() might become slower if the value is retrieved from a thread-local storage (TLS) variable, which is what this PR does.
    • I modified many functions to pass the "state" more explicitly: runtime, tstate and/or interp. Example: _PySys_Create(runtime, interp, &sysmod) call in Python/pylifecycle.c.
  • Python 3.9:

    • I converted _PyThreadState_GET() macro to a static inline function.
    • I added _PyInterpreterState_GET() static inline function. Maybe the implementation will change in the future to be more efficient (ex: thread local storage?).
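
The benefit of passing tstate explicitly can be sketched as follows. This is a hypothetical stand-in, not CPython code: `arg_tstate`, `arg_get_current`, `arg_err_occurred`, and the lookup counter `arg_tls_reads` are invented for the illustration, and C11 `_Thread_local` is assumed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct arg_tstate {
    const char *current_exception;  /* NULL when no error is set */
} arg_tstate;

static _Thread_local arg_tstate *arg_current;

/* Demo-only counter of thread-local lookups (single-threaded use). */
static int arg_tls_reads;

static arg_tstate *arg_get_current(void)
{
    arg_tls_reads++;
    return arg_current;
}

/* Implicit style: every call looks the thread state up again. */
static int arg_err_occurred_implicit(void)
{
    return arg_get_current()->current_exception != NULL;
}

/* Explicit style: the caller supplies the state it already has. */
static int arg_err_occurred(arg_tstate *tstate)
{
    assert(tstate != NULL);
    return tstate->current_exception != NULL;
}

/* n checks cost n lookups. */
static int arg_count_errors_implicit(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        count += arg_err_occurred_implicit();
    }
    return count;
}

/* n checks cost a single lookup, done once up front. */
static int arg_count_errors_explicit(int n)
{
    arg_tstate *tstate = arg_get_current();
    int count = 0;
    for (int i = 0; i < n; i++) {
        count += arg_err_occurred(tstate);
    }
    return count;
}
```

Both loops compute the same result; only the number of thread-local lookups differs, which is where any per-lookup cost would be saved.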

@vstinner
Member

vstinner commented May 4, 2023

> My only remaining uncertainty is with the existing "GIL is held" constraint. With _PyRuntime.tstate_current, it was only guaranteed valid in the thread currently holding the GIL, if any. With this change, it is valid even when the GIL isn't held. I don't see how that would be a problem, but I'm going to double-check anyway.

In the Python 3.9 and 3.10 era, I moved multiple global states to the "interpreter state" (interp): https://pythondev.readthedocs.io/subinterpreters.html#done These changes caused various crashes in third-party C extensions which use the C API with the GIL released (!), for example by calling PyLong_FromLong(1) with the GIL released. This was always illegal and invalid according to the C API documentation. But you know, there are always bugs in the wild. All affected C extensions have been fixed in the meantime. Also, some states were made global again (small integer singletons), and immortal objects also made the situation different.

@vstinner
Member

vstinner commented May 4, 2023

> FTR, on Windows this introduced a ~2% performance regression, and on macOS there's a ~3% regression.

It might be interesting to check the hot code calling _PyThreadState_GET() and see if tstate could be passed to only call _PyThreadState_GET() once. I'm not sure if it's worth it. Also, in stdlib C extensions, I would prefer to use the internal C API less rather than more :-)

@@ -663,6 +663,27 @@ extern char * _getpty(int *, int, mode_t, int);
# define WITH_THREAD
#endif

#ifdef WITH_THREAD
Member

This test is useless. This macro is now always defined. It's only kept for backward compatibility.

Member Author

Doesn't it affect WASM builds?

Member

See the code 3 lines above:

#ifndef WITH_THREAD
#  define WITH_THREAD
#endif

#if defined(HAVE_THREAD_LOCAL) && !defined(Py_BUILD_CORE_MODULE)
extern _Py_thread_local PyThreadState *_Py_tss_tstate;
#endif
PyAPI_DATA(PyThreadState *) _PyThreadState_GetCurrent(void);
Member

What is the use case for this new _PyThreadState_GetCurrent() function? There is already PyThreadState_Get(). How is it different?

The API to get the current thread state is already complicated and has a complicated history: https://pythondev.readthedocs.io/pystate.html

Member Author

It's there because I couldn't find a way to mix PyAPI_DATA with _Py_thread_local. Looks like that's the same issue you ran into in 2020.
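
The general pattern of hiding a thread-local variable behind an ordinary exported function can be sketched like this. The names (`acc_tstate`, `acc_tss_tstate`, `acc_get_current`, `acc_bind`) are hypothetical, C11 `_Thread_local` is assumed, and the export decoration that a real shared library would need is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct acc_tstate {
    int id;
} acc_tstate;

/* The thread-local itself stays private to this translation unit,
   so it never needs a DLL/shared-library data export. */
static _Thread_local acc_tstate *acc_tss_tstate;

/* Exported as a plain function: callers in other modules get the
   current thread's state without touching the variable directly. */
acc_tstate *acc_get_current(void)
{
    return acc_tss_tstate;
}

/* Bind a state to the calling thread. */
void acc_bind(acc_tstate *tstate)
{
    assert(tstate != NULL);
    acc_tss_tstate = tstate;
}
```

Code that can see the variable reads it directly, while everything else pays for one function call.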

@vstinner
Member

vstinner commented May 4, 2023

Does this change indirectly fix the PyGILState API for subinterpreters? See: #59956

@vstinner
Member

vstinner commented May 4, 2023

Thanks @ericsnowcurrently for taking care of this very old project!

It seems like the !defined(Py_BUILD_CORE_MODULE) test in pycore_pystate.h avoids the complicated linker issues that I had on Windows and macOS when I tried a similar change in 2020 (PR #23976).
