Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate the _datetime extension module #117398

Closed
Tracked by #103092
neonene opened this issue Mar 31, 2024 · 31 comments
Closed
Tracked by #103092

Isolate the _datetime extension module #117398

neonene opened this issue Mar 31, 2024 · 31 comments
Labels
extension-modules C modules in the Modules dir topic-subinterpreters type-feature A feature request or enhancement

Comments

@neonene
Copy link
Contributor

neonene commented Mar 31, 2024

Feature or enhancement

Proposal:

I hope this issue will complete _datetime isolation.

My concerns (and answers)
  • Py_MOD_PER_INTERPRETER_GIL_SUPPORTED should be applied in sync with _zoninfo?

    • YES: Possible.
  • Can a module state have a C-API structure, keeping a capsule just for comatibility?

    • NO: Possible, but a user should not touch the module state.
  • C-API supports only the main interpreter? Otherwise, PyInterpreterState is acceptable to point each structure?

    • NO: PyDateTimeAPI cannot emit an error. Also, no PyInterpreterState member is accessible from datetime.h. UPDATE: Seems to be possible by using a global function pointer instead of a function.

Specific issue:

Links to previous discussion of this feature:

Linked PRs (closed)

Linked PRs

@neonene neonene added the type-feature A feature request or enhancement label Mar 31, 2024
@neonene
Copy link
Contributor Author

neonene commented Mar 31, 2024

PR #117399 fails except Windows. Is there an easy way to access a PyInterpreterState member? I'll try later allowing C-API only for the main-interpreter with a global variable.

@erlend-aasland
Copy link
Contributor

I briefly discussed the C API capsule with @pganssle on the 2023 (or was it 2022?) language summit. I also asked the Steering Council to comment about backwards compatibility concerns regarding datetime.h (which is not included by Python.h):

Putting it in the PyInterpreterState struct is an option, but I'm not sure how much we want to bloat that struct. We probably want to deprecate the current capsulated API and introduce a new capsulated API.

vstinner added a commit to vstinner/cpython that referenced this issue May 5, 2024
Move types to the datetime_state structure of the _datetime
extension.
vstinner added a commit to vstinner/cpython that referenced this issue May 5, 2024
Move types to the datetime_state structure of the _datetime
extension.
@vstinner
Copy link
Member

vstinner commented May 5, 2024

Converting the _datetime extension to multiphase initialization and/or converting static types to heap types is complicated because:

  • <Include/datetime.h> C API
  • datetime capsule (C API)
  • If the _datetime extension is reloaded, the C API expects to meet the same types. Otherwise, PyDateTime_Check() fails and everything else fails.

It reminds me the very complicated case of the PyAST C API. We managed to convert the _ast extension to heap types and multiphase init by moving state to the interpreter state. Other "trade offs" attemps for the _ast extension ended by introducing crashes which were hard to trigger and hard to fix.

So I propose this plan to isolate the datetime module:

  • Move references to types to datetime_state structure: I wrote a minimum PR gh-117398: Move types to datetime state #118606 for that
  • Convert static types to heap types, but create them only once, and don't delete them (on purpose, for now).
  • Move datetime_state to PyInterpreterState.
  • At the point, discuss how to deal with remaining issues: multiphase init, C API, capsule, etc.

@neonene
Copy link
Contributor Author

neonene commented May 5, 2024

Where can we see your rationale for the inflating PyInterpreterState with PyAST C-API? IIUC, the practice is supposed to be bad: #103092 (comment).

How do you associate the PyAST issue with PyDateTime in terms of the mechanizm and impact?

@encukou, @ericsnowcurrently

@vstinner
Copy link
Member

vstinner commented May 6, 2024

How do you associate the PyAST issue with PyDateTime in terms of the mechanizm and impact?

Both provide a C API and we wanted to isolate their C extension.

Where can we see your rationale for the inflating PyInterpreterState with PyAST C-API?

I will try to dig into the bug tracker later. In short, there were 2-3 crashes related to the isolation of the _ast extension. Crashes related to the C API.

vstinner added a commit to vstinner/cpython that referenced this issue May 8, 2024
Move types to the datetime_state structure of the _datetime
extension.
@neonene
Copy link
Contributor Author

neonene commented May 10, 2024

Python-ast.c (3.10.0 alpha 0: b1cc6ba)

int PyAST_Check(PyObject* obj)
{
    astmodulestate *state = get_global_ast_state();
    if (state == NULL) {
        return -1;
    }
    return PyObject_IsInstance(obj, state->AST_type);
}

Previously, PyAST_Check() imported the _ast module through get_global_ast_state(). The implicit import could get an unsafe pseudo module by replacing __import__() with a lazy import (strongly discouraged since Python 3.3).

datetime.h

#define PyDate_Check(op) PyObject_TypeCheck((op), PyDateTimeAPI->DateType)
#define PyDate_CheckExact(op) Py_IS_TYPE((op), PyDateTimeAPI->DateType)

PyDate_Check() requires the real module to be imported in advance by calling PyCapsule_Import() explicitly.

At least, the regression case in PyAST C-API above cannot be applied to PyDateTime C-API, I think.

vstinner added a commit to vstinner/cpython that referenced this issue May 10, 2024
Move types to the datetime_state structure of the _datetime
extension.
@vstinner
Copy link
Member

The purpose of this issue is to convert the _datetime extension to the multiphase C API. The problem is to make sure that we are dealing with the same types if multiple instances of the extension are created.

Example:

import sys

import _datetime
now = _datetime.datetime.now()
del _datetime
del sys.modules['_datetime']

import _datetime
print(isinstance(now, _datetime.datetime))

In general, we don't require this. For example, it's false with the _random extension:

import sys

import _random
rng = _random.Random()
del _random
del sys.modules['_random']

import _random
print(isinstance(rng, _random.Random))

The problem for _datetime is the C API and the capsule. Currently, the C API is based on PyDateTimeAPI variable (one variable is created every time the datetime.h header is included.) I would prefer PyDateTime_Check() to return true even if there are two instances of the datetime module.

I would care less if the C API would be stateful: pass explicitly a module/state/something. But changing the C API is out of the scope of this issue.

@erlend-aasland
Copy link
Contributor

But changing the C API is out of the scope of this issue.

The SC said that the datetime C API is protected by PEP-387, even though datetime.h is not explicitly included in Python.h. IMO, we should deprecate the current C API and introduce a new one. That is the easiest solution technically, and the easiest to relate to for users. I agree with Victor that it deserves its own issue (and perhaps also a Discourse topic).

vstinner added a commit that referenced this issue May 10, 2024
Move types to the datetime_state structure of the _datetime
extension.
@vstinner
Copy link
Member

I merged my first change "Move types to datetime state".

@neonene: Do you want to propose changes to convert static types to heap types? Maybe start with a PR to convert only 4 types to make the PR short enough so it's easier to review.

Types:

    PyTypeObject *date_type;
    PyTypeObject *datetime_type;
    PyTypeObject *delta_type;
    PyTypeObject *isocalendar_date_type;
    PyTypeObject *time_type;
    PyTypeObject *tzinfo_type;
    PyTypeObject *timezone_type;

@vstinner
Copy link
Member

IMO, we should deprecate the current C API and introduce a new one.

If we manage to isolate the _datetime extension without breaking the C API, I'm not sure that it's needed.

@neonene
Copy link
Contributor Author

neonene commented May 10, 2024

I'll wait python/steering-council#243, which is a necessary procedure.

@ncoghlan
Copy link
Contributor

ncoghlan commented May 12, 2024

A key functional test case in relation to python/steering-council#243 is ensuring that previous references to the capsule API still work after a module reload. Specifically, this sequence:

  1. capsule reference is acquired
  2. module is reloaded
  3. capsule reference is used

This will have historically worked due to the implicit module caching in the single-phase init support for extension modules, but I'm concerned that a full migration to true reload support will break it (and likely crash the interpreter in the process).

I'd also be genuinely surprised if such a test case already exists, hence the green CI on #118337 may not be enough to show that the migration to multi-phase init hasn't broken anything.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 3, 2024
…ypes (pythongh-119929)

We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

The relevant code could use some cleanup but I opted to avoid the significant churn in this change.  I'll tackle that separately.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.
(cherry picked from commit 105f22e)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
ericsnowcurrently added a commit that referenced this issue Jun 3, 2024
…Types (gh-120009)

We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.

(cherry picked from commit 105f22e, AKA gh-119929)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
barneygale pushed a commit to barneygale/cpython that referenced this issue Jun 5, 2024
…ypes (pythongh-119929)

We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

The relevant code could use some cleanup but I opted to avoid the significant churn in this change.  I'll tackle that separately.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.
ericsnowcurrently pushed a commit that referenced this issue Jun 13, 2024
…-119604)

Check if the DateTime C-API type matches the datetime.date type on main and shared/isolated subinterpreters.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 13, 2024
…rs (pythongh-119604)

Check if the DateTime C-API type matches the datetime.date type on main and shared/isolated subinterpreters.
(cherry picked from commit 50a3895)

Co-authored-by: neonene <53406459+neonene@users.noreply.github.com>
ericsnowcurrently pushed a commit that referenced this issue Jun 13, 2024
…ers (gh-120463)

Check if the DateTime C-API type matches the datetime.date type on main and shared/isolated subinterpreters.

(cherry picked from commit 50a3895, AKA gh-119604)

Co-authored-by: neonene <53406459+neonene@users.noreply.github.com>
freakboy3742 added a commit that referenced this issue Jun 15, 2024
@neonene neonene closed this as completed Jun 15, 2024
mrahtz pushed a commit to mrahtz/cpython that referenced this issue Jun 30, 2024
…rs (pythongh-119604)

Check if the DateTime C-API type matches the datetime.date type on main and shared/isolated subinterpreters.
mrahtz pushed a commit to mrahtz/cpython that referenced this issue Jun 30, 2024
freakboy3742 pushed a commit that referenced this issue Jul 10, 2024
…API test (GH-120477) (#121561)

Use the correct binary module loader for iOS.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…hongh-119637)

This is the only static type in the module that we will not keep static.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…ypes (pythongh-119929)

We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

The relevant code could use some cleanup but I opted to avoid the significant churn in this change.  I'll tackle that separately.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…rs (pythongh-119604)

Check if the DateTime C-API type matches the datetime.date type on main and shared/isolated subinterpreters.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
This is minimal support.  Subinterpreters are not supported yet.  That will be addressed in a later change.

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…don't free it (pythonGH-119662)

- While datetime uses global state, only initialize it once.
- While `capi` is static, don't free it (thanks @neonene in https://github.com/python/cpython/pull/119641/files#r1616710048)
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…hongh-119637)

This is the only static type in the module that we will not keep static.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
I was able to make use of the existing datetime_state struct, but there was one tricky thing I had to sort out.  We mostly aren't converting to heap types, so we can't use things like PyType_GetModuleByDef() to look up the module state.  The solution I came up with is somewhat novel, but I consider it straightforward.  Also, it shouldn't have much impact on performance.

In summary, this main changes here are:

* I've added some macros to help hide how various objects relate to module state
* as a solution to the module state lookup problem, I've stored the last loaded module on the current interpreter's internal dict (actually a weakref)
* if the static type method is used after the module has been deleted, it is reloaded
* to avoid extra work when loading the module, we directly copy the objects (new refs only) from the old module state into the new state if the old module hasn't been deleted yet
* during module init we set various objects on the static types' __dict__s; to simplify things, we only do that the first time; once those static types have a separate __dict__ per interpreter, we'll do it every time
* we now clear the module state when the module is destroyed (before, we were leaking everything in _datetime_global_state)
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…ypes (pythongh-119929)

We make use of the same mechanism that we use for the static builtin types.  This required a few tweaks.

The relevant code could use some cleanup but I opted to avoid the significant churn in this change.  I'll tackle that separately.

This change is the final piece needed to make _datetime support multiple interpreters.  I've updated the module slot accordingly.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…rs (pythongh-119604)

Check if the DateTime C-API type matches the datetime.date type on main and shared/isolated subinterpreters.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-subinterpreters type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

6 participants