
bpo-30747: Attempt to fix atomic load/store #2383

Merged · 1 commit · Aug 12, 2017
Conversation

@Paxxi (Contributor) commented Jun 24, 2017

The _Py_atomic_* operations are currently not implemented as atomic
operations when building with MSVC. This patch attempts to implement
part of the required functionality.

https://bugs.python.org/issue30747

@mention-bot

@Paxxi, thanks for your PR! By analyzing the history of the files in this pull request, we identified @jyasskin, @benjaminp and @akheron to be potential reviewers.

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again for your contribution, and we look forward to reviewing it!

@pitrou (Member) commented Jun 29, 2017

It seems you broke the build on Unix:
https://travis-ci.org/python/cpython/jobs/246629729#L1145

Parser/myreadline.c:311:33: error: implicit declaration of function '_Py_atomic_load_relaxed' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    if (_PyOS_ReadlineTState == PyThreadState_GET()) {
                                ^
./Include/pystate.h:252:31: note: expanded from macro 'PyThreadState_GET'
             ((PyThreadState*)_Py_atomic_load_relaxed(&_PyThreadState_Current))
                              ^

@@ -87,8 +93,9 @@ typedef struct _Py_atomic_int {
|| (ORDER) == __ATOMIC_CONSUME), \
__atomic_load_n(&(ATOMIC_VAL)->_value, ORDER))

#else

/* Only support GCC (for expression statements) and x86 (for simple
Member:

Perhaps remove this comment?

Contributor Author:

I was thinking of updating it to reflect the current state.

Member:

Looks like you forgot to update it :-)

Contributor Author:

Fixed now

@@ -230,8 +321,6 @@ _Py_ANNOTATE_MEMORY_ORDER(const volatile void *address, _Py_memory_order order)
#define _Py_atomic_load_explicit(ATOMIC_VAL, ORDER) \
((ATOMIC_VAL)->_value)

#endif /* !gcc x86 */
#endif
Member:

Looks like you shouldn't have moved the second #endif below.

#define _Py_atomic_store_64bit(ATOMIC_VAL, NEW_VAL, ORDER) \
switch (ORDER) { \
case _Py_memory_order_acquire: \
_InterlockedExchange64_HLEAcquire((__int64 volatile*)ATOMIC_VAL, (__int64)NEW_VAL); \
Member:

Is it a runtime check whether HLE is available or not, or is it a compile-time check? If compile-time, it means we may mistakenly produce builds that won't run on non-HLE processors. @zooba

Contributor Author:

According to the documentation, these will use HLEAcquire if the CPU supports it, or fall back to a regular _InterlockedExchange64. Docs here: https://msdn.microsoft.com/en-us/library/1s26w950.aspx

specifically this bit

On Intel platforms that support Hardware Lock Elision (HLE) instructions, the intrinsics with _HLEAcquire and _HLERelease suffixes include a hint to the processor that can accelerate performance by eliminating a lock write step in hardware. If these intrinsics are called on platforms that do not support HLE, the hint is ignored.

Member:

I understand that. My question is whether the CPU detection happens at compile time or at run time. I can't find any information online...

@Paxxi (Contributor Author), Jun 30, 2017:

There's no detection happening. Intel seems to have found a way to make it backwards compatible, so older CPUs will ignore the extra hint.

I tested this quickly, and here's the generated assembly for the HLE and non-HLE versions:

; 12   :   auto a = _InterlockedExchange_HLEAcquire(&test, 1);

mov	 eax, 1
lea	 ecx, DWORD PTR _test$[ebp]
xacquire  xchg	   DWORD PTR [ecx], eax
mov	 DWORD PTR _a$[ebp], eax

; 13   :   auto b = _InterlockedExchange(&test, 2);

mov	 eax, 2
lea	 ecx, DWORD PTR _test$[ebp]
xchg	 DWORD PTR [ecx], eax
mov	 DWORD PTR _b$[ebp], eax

Member:

That's great, thank you!

}

#define _Py_atomic_signal_fence(/*memory_order*/ ORDER) ((void)0)
#define _Py_atomic_thread_fence(/*memory_order*/ ORDER) ((void)0)
Member:

Don't you need to copy what the GCC version does for these two functions?

Contributor Author:

I'm not entirely sure yet. I don't think it's required, but I'm looking into it further to be sure.

_Py_atomic_store_32bit(ATOMIC_VAL._value, NEW_VAL, ORDER) }

#define _Py_atomic_load_explicit(ATOMIC_VAL, ORDER) \
((ATOMIC_VAL)->_value)
Member:

Same comment as for the fence functions.

Contributor Author:

I'm working on it. Will be updating this PR with the load implementation and the bits for MSVC/ARM as well. Had to spend this week debugging another issue so I'm planning on having everything done around Monday/Tuesday next week.

@Paxxi (Contributor Author) commented Jul 5, 2017

This should be ready now. Test suite passes for x86/x64, completely untested for ARM as I don't have any suitable environment.

I don't really know how to validate that they're correctly atomic. The operations aren't really suitable for implementing a ref count or a mutex as a test case, so I'm open to ideas for any way to test this.

@pitrou (Member) commented Jul 8, 2017

> I don't really know how to validate that they're correctly atomic.

I don't think we need to. The potential issues here are extreme edge case race conditions that may happen once a month on heavily-loaded production systems. There's little we can check for in the test suite, IMHO.

Longer term, one possibility would be to build Python with thread sanitizer. That's not compatible with MSVC AFAIK, though.
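For reference, a build-configuration sketch of what that might look like (untested against this branch; the flags are clang/gcc's, and MSVC has no equivalent of -fsanitize=thread):

```shell
# Hypothetical sketch: build CPython with ThreadSanitizer (clang/gcc only).
CC=clang CFLAGS="-fsanitize=thread" LDFLAGS="-fsanitize=thread" ./configure
make
```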

@pitrou (Member) commented Jul 8, 2017

@Paxxi, partially related, but have you seen PR #2417 and do you have any opinion about it?

}

#define _Py_atomic_signal_fence(/*memory_order*/ ORDER) ((void)0)
#define _Py_atomic_thread_fence(/*memory_order*/ ORDER) ((void)0)
Member:

On gcc/x86, _Py_atomic_thread_fence issues an mfence assembler instruction. Why doesn't it do the same here? Since it's the same CPU architecture, it should probably issue the same instruction...

Member:

And apparently MSVC has a MemoryBarrier macro.

Contributor Author:

These macros aren't used anywhere in the code base except in pyatomic.h for the gcc implementation so I skipped them.

Can implement them using the MemoryBarrier macro if required.

Member:

I see. Then it's simpler to remove them, I think.

#define _Py_atomic_thread_fence(/*memory_order*/ ORDER) ((void)0)

#define _Py_atomic_store_explicit(ATOMIC_VAL, NEW_VAL, ORDER) \
if (sizeof(*ATOMIC_VAL._value) == 8) { \
Member:

Instead of pseudo-runtime checks, you may want to use a true compile-time check using e.g. SIZEOF_VOID_P.

Member:

Perhaps that can help you get rid of some dummy definitions, by the way (for example the 64-bit load/store macros on 32-bit builds).

Contributor Author:

Can't do it at compile time, since the macro is used for both _Py_atomic_address and _Py_atomic_int, which will be different sizes on 64-bit Windows. int will still be 4 bytes, so the runtime check is required to handle it correctly.



#if defined(_M_X64)
#define _Py_atomic_store_64bit(ATOMIC_VAL, NEW_VAL, ORDER) \
Member:

Ok, I think the load and store macros are ok as defined, but that's because MSDN says "Most of the interlocked functions provide full memory barriers on all Windows platforms". Perhaps you can add a comment to that effect?

Contributor Author:

Will do.

@pitrou (Member) commented Jul 9, 2017

@zooba, do you think you can get some expert at Microsoft to take a look at this?
Otherwise, while I can't vouch for the correctness, this can't be worse than the status quo, as long as it compiles fine :-)

@Paxxi (Contributor Author) commented Jul 13, 2017

Added comment about interlocked functions and removed the fence macros as they're not used.

_Py_atomic_load_64bit(ATOMIC_VAL._value, ORDER) : \
_Py_atomic_load_32bit(ATOMIC_VAL._value, ORDER) \
)
_WriteBarrier
Member:

Are you sure about this? According to MSDN, those barrier intrinsics are only defined on x86 and x86-64.

Also, the macro doesn't look like it would expand correctly...

Contributor Author:

No, that's a mistake; no idea how that snuck in there.

Contributor Author:

Fixed

@pitrou (Member) commented Jul 17, 2017

@zooba, do you want to review this pre-merge or should it go ahead?

@pitrou (Member) commented Aug 6, 2017

I think this is ready to merge. Would you like to add a NEWS entry using the "blurb" utility?

@Paxxi (Contributor Author) commented Aug 7, 2017

I'm not much of a writer so I'd rather skip writing anything about it. Feel free to do it on my behalf if it's considered newsworthy enough.

@pitrou (Member) commented Aug 8, 2017

Well, I'm not sure how to edit your branch, perhaps you can try applying the following patch to it:

$ git diff --cached 
diff --git a/Misc/NEWS.d/next/Core and Builtins/2017-08-08-12-00-29.bpo-30747.g2kZRT.rst b/Misc/NEWS.d/next/Core and Builtins/2017-08-08-12-00-29.bpo-30747.g2kZRT.rst
new file mode 100644
index 0000000000..04a726a7e6
--- /dev/null
+++ b/Misc/NEWS.d/next/Core and Builtins/2017-08-08-12-00-29.bpo-30747.g2kZRT.rst       
@@ -0,0 +1,2 @@
+Add a non-dummy implementation of _Py_atomic_store and _Py_atomic_load on
+MSVC.

@Paxxi (Contributor Author) commented Aug 8, 2017

Thanks! I added it to the commit now.

The default setting on GitHub, AFAIK, is that repo owners can push directly to a contributor's branch for a PR, but I didn't know it needed to be part of the commit.

Should be all set now.

@pitrou (Member) commented Aug 12, 2017

Ok, thank you!

@pitrou pitrou merged commit e664d7f into python:master Aug 12, 2017
segevfiner added a commit to segevfiner/cpython that referenced this pull request Aug 18, 2017
Apparently MSVC is too stupid to understand that the alternate branch is
not taken and emits a warning for it.

Warnings added in python#2383
pitrou pushed a commit that referenced this pull request Aug 20, 2017
* bpo-9566: Silence warnings from pyatomic.h macros

Apparently MSVC is too stupid to understand that the alternate branch is
not taken and emits a warning for it.

Warnings added in #2383

* bpo-9566: A better fix for the pyatomic.h warning

* bpo-9566: Remove a slash
GadgetSteve pushed a commit to GadgetSteve/cpython that referenced this pull request Sep 10, 2017
…#3140)

* bpo-9566: Silence warnings from pyatomic.h macros

Apparently MSVC is too stupid to understand that the alternate branch is
not taken and emits a warning for it.

Warnings added in python#2383

* bpo-9566: A better fix for the pyatomic.h warning

* bpo-9566: Remove a slash