-
-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-30747: Attempt to fix atomic load/store #2383
Conversation
@Paxxi, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jyasskin, @benjaminp and @akheron to be potential reviewers. |
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. Thanks again to your contribution and we look forward to looking at it! |
It seems you broke the build on Unix:
|
@@ -87,8 +93,9 @@ typedef struct _Py_atomic_int { | |||
|| (ORDER) == __ATOMIC_CONSUME), \ | |||
__atomic_load_n(&(ATOMIC_VAL)->_value, ORDER)) | |||
|
|||
#else | |||
|
|||
/* Only support GCC (for expression statements) and x86 (for simple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps remove this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of updating it to reflect the current state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you forgot to update it :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now
Include/pyatomic.h
Outdated
@@ -230,8 +321,6 @@ _Py_ANNOTATE_MEMORY_ORDER(const volatile void *address, _Py_memory_order order) | |||
#define _Py_atomic_load_explicit(ATOMIC_VAL, ORDER) \ | |||
((ATOMIC_VAL)->_value) | |||
|
|||
#endif /* !gcc x86 */ | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you shouldn't have moved the second #endif
below.
#define _Py_atomic_store_64bit(ATOMIC_VAL, NEW_VAL, ORDER) \ | ||
switch (ORDER) { \ | ||
case _Py_memory_order_acquire: \ | ||
_InterlockedExchange64_HLEAcquire((__int64 volatile*)ATOMIC_VAL, (__int64)NEW_VAL); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a runtime check whether HLE is available or not, or is it a compile-time check? If compile-time, it means we may mistakingly produce builds that won't run on non-HLE processors. @zooba
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the documentation these will use HLEAcquire if the cpu supports it or use a regular _InterlockedExchange64 docs here https://msdn.microsoft.com/en-us/library/1s26w950.aspx
specifically this bit
On Intel platforms that support Hardware Lock Elision (HLE) instructions, the intrinsics with _HLEAcquire and _HLERelease suffixes include a hint to the processor that can accelerate performance by eliminating a lock write step in hardware. If these intrinsics are called on platforms that do not support HLE, the hint is ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that. My question is whether the CPU detection happens at compile time or at run time. I can't find any information online...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no detection happening. Intel seem to have found a way to make it backwards compatible so older cpus will ignore the extra hint.
I tested this quickly and here's the generated assembly for the HLE and non HLE versions
; 12 : auto a = _InterlockedExchange_HLEAcquire(&test, 1);
mov eax, 1
lea ecx, DWORD PTR _test$[ebp]
xacquire xchg DWORD PTR [ecx], eax
mov DWORD PTR _a$[ebp], eax
; 13 : auto b = _InterlockedExchange(&test, 2);
mov eax, 2
lea ecx, DWORD PTR _test$[ebp]
xchg DWORD PTR [ecx], eax
mov DWORD PTR _b$[ebp], eax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, thank you!
Include/pyatomic.h
Outdated
} | ||
|
||
#define _Py_atomic_signal_fence(/*memory_order*/ ORDER) ((void)0) | ||
#define _Py_atomic_thread_fence(/*memory_order*/ ORDER) ((void)0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to copy what the GCC version does for these two functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure yet, I'm thinking that it's not required but I'm looking into it further to be sure.
Include/pyatomic.h
Outdated
_Py_atomic_store_32bit(ATOMIC_VAL._value, NEW_VAL, ORDER) } | ||
|
||
#define _Py_atomic_load_explicit(ATOMIC_VAL, ORDER) \ | ||
((ATOMIC_VAL)->_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for the fence functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on it. Will be updating this PR with the load implementation and the bits for MSVC/ARM as well. Had to spend this week debugging another issue so I'm planning on having everything done around Monday/Tuesday next week.
This should be ready now. Test suite passes for x86/x64, completely untested for ARM as I don't have any suitable environment. I don't really know how to validate that they're correctly atomic. The operations aren't really suitable to implement a ref count or a mutex as a test case so I'm open to ideas of any way to test this. |
I don't think we need to. The potential issues here are extreme edge case race conditions that may happen once a month on heavily-loaded production systems. There's little we can check for in the test suite, IMHO. Longer term, one possibility would be to build Python with thread sanitizer. That's not compatible with MSVC AFAIK, though. |
Include/pyatomic.h
Outdated
} | ||
|
||
#define _Py_atomic_signal_fence(/*memory_order*/ ORDER) ((void)0) | ||
#define _Py_atomic_thread_fence(/*memory_order*/ ORDER) ((void)0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On gcc/x86, _Py_atomic_thread_fence
issues a mfence
assembler instruction. Why doesn't it do the same here? Since it's the same CPU architecture, it should probably issue the same instruction...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And apparently MSVC has a MemoryBarrier
macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These macros aren't used anywhere in the code base except in pyatomic.h for the gcc implementation so I skipped them.
Can implement them using the MemoryBarrier macro if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then it's simpler to remove them, I think.
Include/pyatomic.h
Outdated
#define _Py_atomic_thread_fence(/*memory_order*/ ORDER) ((void)0) | ||
|
||
#define _Py_atomic_store_explicit(ATOMIC_VAL, NEW_VAL, ORDER) \ | ||
if (sizeof(*ATOMIC_VAL._value) == 8) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of pseudo-runtime checks, you may want to use a true compile-time check using e.g. SIZEOF_VOID_P
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps that can help you get rid of some dummy definitions, by the way (for example the 64-bit load/store macros on 32-bit builds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't do it at compile time since the macro is used for both _Py_atomic_address
and _Py_atomic_int
which will be different sizes on 64-bit Windows. int will still be 4 bytes so the runtime check is required to handle it correctly.
|
||
|
||
#if defined(_M_X64) | ||
#define _Py_atomic_store_64bit(ATOMIC_VAL, NEW_VAL, ORDER) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think the load and store macros are ok as defined, but that's because MSDN says "Most of the interlocked functions provide full memory barriers on all Windows platforms". Perhaps you can add a comment to that effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@zooba, do you think you can get some expert at Microsoft to take a look at this? |
Added comment about interlocked functions and removed the fence macros as they're not used. |
Include/pyatomic.h
Outdated
_Py_atomic_load_64bit(ATOMIC_VAL._value, ORDER) : \ | ||
_Py_atomic_load_32bit(ATOMIC_VAL._value, ORDER) \ | ||
) | ||
_WriteBarrier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure about this? According to MSDN, those barrier intrinsics are only defined on x86 and x86-64.
Also, the macro doesn't look like it would expand correctly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that's a mistake, no idea how that snuck in there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@zooba, do you want to review this pre-merge or should it go ahead? |
I think this is ready to merge. Would you like to add a NEWS entry using the "blurb" utility? |
I'm not much of a writer so I'd rather skip writing anything about it. Feel free to do it on my behalf if it's considered newsworthy enough. |
Well, I'm not sure how to edit your branch, perhaps you can try applying the following patch to it:
|
_Py_atomic_* are currently not implemented as atomic operations when building with MSVC. This patch attempts to implement parts of the functionality required.
Thanks! I added it to the commit now. Default settings on Github afaik is that repo owners can push directly to a contributors branch for a PR but I didn't know it needed to be part of the commit. Should be all set now. |
Ok, thank you! |
Apparently MSVC is too stupid to understand that the alternate branch is not taken and emits a warning for it. Warnings added in python#2383
…#3140) * bpo-9566: Silence warnings from pyatomic.h macros Apparently MSVC is too stupid to understand that the alternate branch is not taken and emits a warning for it. Warnings added in python#2383 * bpo-9566: A better fix for the pyatomic.h warning * bpo-9566: Remove a slash
Py_atomic* are currently not implemented as atomic operations
when building with MSVC. This patch attempts to implement parts
of the functionality required.
https://bugs.python.org/issue30747