
bpo-37802: Slightly improve performance of PyLong_FromSize_t() #15192

Merged: 1 commit into python:master on Sep 12, 2019

Conversation

@sir-sigurd (Contributor) commented Aug 9, 2019

@mdickinson (Member) left a comment:

LGTM

@gnprice (Contributor) commented Aug 10, 2019

BTW, I think the behavior of CHECK_SMALL_INT with its implicit return (and similarly the new CHECK_SMALL_UINT) is pretty surprising when reading the code. I wrote up a patch the other day that replaces it with an explicit return; I'll take this as a prompt to go make an issue for it and send a PR.

That's certainly not a reason not to merge this, though. I'll happily rebase my changes on top of this once it's in.

(Posted; see https://bugs.python.org/issue37812 and #15203 .)
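
For readers following along, this is roughly the shape of the macro being discussed (a simplified sketch, not a verbatim copy of Objects/longobject.c): the return statement buried inside the expansion is what lets a bare `CHECK_SMALL_INT(ival);` at a call site silently exit the enclosing function.

```c
/* Simplified sketch of the pattern under discussion; the names follow
   Objects/longobject.c, but details may differ from the exact source.
   Note the `return` hidden inside the expansion: the macro can make the
   *calling* function return a cached small int. */
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS) { \
        return get_small_int((sdigit)(ival)); \
    } while (0)
```

The explicit-return alternative mentioned above would spell the condition and the return out at each call site (for example, `if (is_small_int(ival)) { return get_small_int((sdigit)ival); }`), so the early exit is visible where it actually happens.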

@gnprice (Contributor) left a comment:

This looks good to me... except it's not ambitious enough! 🙂 Details below.

(Happily the more ambitious version is only a line or two more.)

@sir-sigurd (Contributor, Author) commented:

@mdickinson is there anything that needs to be done to get this merged?

@aeros added the performance (Performance or resource usage) label on Aug 25, 2019
@sir-sigurd (Contributor, Author) commented:
I decided to deduplicate PyLong_FromUnsigned* functions in this PR, to apply this optimization to all of them at once.
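
To make "deduplicate" concrete, here is an illustrative sketch of the approach (not the exact diff in this PR; the thread below refers to the shared helper as long_from_uint, and the precise signature and internals in the PR may differ). The three public unsigned constructors funnel into one routine, so the small-int fast path and the digit-writing loop live in a single place:

```c
/* Illustrative sketch only -- not the code that was merged.  One shared
   helper builds the int object; the public functions become thin wrappers. */
static PyObject *
long_from_uint(unsigned long long ival)
{
    PyLongObject *v;
    unsigned long long t;
    int ndigits = 0;

    CHECK_SMALL_UINT(ival);     /* returns a cached small int when it applies */

    /* Count the PyLong_SHIFT-bit digits needed, then fill them in. */
    t = ival;
    while (t) {
        ++ndigits;
        t >>= PyLong_SHIFT;
    }
    v = _PyLong_New(ndigits);
    if (v != NULL) {
        digit *p = v->ob_digit;
        while (ival) {
            *p++ = (digit)(ival & PyLong_MASK);
            ival >>= PyLong_SHIFT;
        }
    }
    return (PyObject *)v;
}

PyObject *
PyLong_FromUnsignedLong(unsigned long ival)
{
    return long_from_uint(ival);
}

PyObject *
PyLong_FromUnsignedLongLong(unsigned long long ival)
{
    return long_from_uint(ival);
}

PyObject *
PyLong_FromSize_t(size_t ival)
{
    return long_from_uint(ival);
}
```

The catch debated later in the thread is the fixed unsigned long long parameter: on 32-bit targets, narrower arguments have to be widened before the loop runs, which is the inline-function-vs-macro question discussed below.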

@gnprice (Contributor) left a comment:

Neat! This simplifies the code and it's faster (based on the measurements you posted in the issue thread) -- that's a nice combination. 😄

I'd be glad to see the signed versions get a similar treatment (as a separate PR). It might not deliver a performance win at the same level, since your hypothesis is that the speedup here comes from cutting out the sign logic... but it should be performance-neutral at worst, and it's plausible it could be an improvement by generating less code and so being friendlier to the cache. It could also speed things up simply as a natural consequence of deduplicating the code, by creating a single place to apply all the optimizations we currently have scattered across the individual functions, plus any new ones you think of while staring at it.

(And even if it does turn out only performance-neutral: I think opinions will vary, but for code deduplication at this scale I'd certainly be in favor.)

@@ -410,6 +414,30 @@ PyLong_FromUnsignedLong(unsigned long ival)
return (PyObject *)v;
}

/* Same as above, but for unsigned long. */
@gnprice (Contributor) commented:

Nit: better to avoid "Same as above". Partly because the "above" is already a few dozen lines away and so potentially a bit unclear... and more so because this kind of reference from a distance is highly vulnerable to getting broken by future changes to the code 🙂 .

If the other thing's description were long, a good solution would be to refer to it by name: "Like long_from_uint, but ...".

In this case the other description is only a line long in itself, so it's simplest just to copy the common part: "Create a new int object from a C unsigned long".

(Maybe even skip the word "C"? I think that really makes sense only for PyLong_FromLong, where there's a Python type that (used to, and in some ways still) has the same name as the C type.)

@sir-sigurd (Contributor, Author) commented:

Might make sense to simply remove these comments, because it's quite obvious that they do the same thing as long_from_uint().

@gnprice (Contributor) commented:

That's true, but then you have to go look up what long_from_uint does. :-)

These functions are part of the API exposed outside this module, so in general they should have a doc-comment to explain what their intended meaning is. In this case the meaning is pretty simple, and almost self-explanatory from the name... but that just means that a very short doc-comment is enough. Which leaves little to be gained from skipping it.

@sir-sigurd (Contributor, Author) commented:

Non-existing comments never lie and don't require updates 🙂 .

@aeros (Contributor) commented Sep 9, 2019

@sir-sigurd:

> Non-existing comments never lie and don't require updates 🙂.

Between keeping the comment as is and removing it, I would lean toward removing it. But I think it would be useful to readers to include @gnprice's suggestion (removing the "C" part as well):

Suggested change:
- /* Same as above, but for unsigned long. */
+ /* Create a new int object from an unsigned long. */

I think it's fairly common for developers to look at the implementation for a single function in isolation rather than reading the entire file at once, so it's not always optimal to rely on comments in earlier sections for context. To some degree, each function should aim to explain itself.

@gnprice (Contributor) commented Aug 29, 2019 via email

@sir-sigurd (Contributor, Author) commented:

The current approach using an inline function has the same problems as described in https://bugs.python.org/issue38015. I'll rework it once #15718 is merged.

@gnprice (Contributor) commented Sep 9, 2019

> The current approach using an inline function has the same problems as described in https://bugs.python.org/issue38015. I'll rework it once #15718 is merged.

Hmm, you measured this as being a significant speedup in your microbenchmark, right? That hardly seems like a problem 😉 even if there's a way to make the microbenchmark go even faster still.

I like this PR as it is 🙂 and I'd be glad to see it merged. It actually simplifies the code at the same time as optimizing it, so there's no need to weigh one value against another.

As I commented over on #15718, turning get_small_int into a macro makes the code inside it quite a bit more complex to read. For this one... is the idea that you would write long_from_uint as a macro? That's a 20-line function, so that sounds like it would probably be worse :-/. I think that'd be much too big a cost to pay in the code for the payoff of this microbenchmark speedup.

If that is your plan, then it's also basically additive upon the change in this PR. So I think it'd still be good to merge this PR, even if a subsequent PR goes on to turn the function into a macro.

@sir-sigurd (Contributor, Author) commented:

> Hmm, you measured this as being a significant speedup in your microbenchmark, right? That hardly seems like a problem 😉 even if there's a way to make the microbenchmark go even faster still.

It induces the same problems as your is_small_int() function on 32-bit platforms.

@gnprice (Contributor) commented Sep 9, 2019

> It induces the same problems as your is_small_int() function on 32-bit platforms.

Right, but that "problem" is that a few extra instructions are emitted, right? And so the microbenchmark is a few percent slower than it otherwise would be.

If the microbenchmark is nevertheless faster than before, then this is still an optimization.

@sir-sigurd (Contributor, Author) commented Sep 9, 2019

> It induces the same problems as your is_small_int() function on 32-bit platforms.
>
> Right, but that "problem" is that a few extra instructions are emitted, right? And so the microbenchmark is a few percent slower than it otherwise would be.
>
> If the microbenchmark is nevertheless faster than before, then this is still an optimization.

https://bugs.python.org/msg351052

I leave it to you to benchmark it on a 32-bit platform.

@gnprice (Contributor) commented Sep 9, 2019

> I leave it to you to benchmark it on a 32-bit platform.

Hmm -- do you mean you haven't tried it on a 32-bit platform? Because you said that it had a problem there, I assumed that meant you'd seen something empirically.

If you have, I think it would be helpful to say what results you got.

@sir-sigurd (Contributor, Author) commented:

I mean I checked how it compiles on godbolt.org, and that was sufficient for me not to benchmark it.

@gnprice (Contributor) commented Sep 9, 2019

> I mean I checked how it compiles on godbolt.org, and that was sufficient for me not to benchmark it.

Cool, got it.

Well -- as I said upthread, if this PR were to end up making something as big as long_from_uint into a macro, I think that would be much more of a cost than the micro-optimization is worth.

If you can find a way to get all the desired speedups without that kind of complexity in the code, that'd be great.

As I see it, I would also be glad to see a version of this PR merged that did something like:

  • a nice simplification to the code (like your current version here)
  • ~10% speedup in microbenchmark on x86_64 (like you quoted for this version on the issue thread)
  • ~5% slowdown in microbenchmark on x86_32 (like you found for get_small_ints as function vs. macro)

because that's good for the source code and no worse than a wash on performance.

(Note also that anyone who's working to squeeze out the last drop of performance in running their code is much more likely to be using a 64-bit platform already.)

@aeros (Contributor) commented Sep 9, 2019

@gnprice:

> Note also that anyone who's working to squeeze out the last drop of performance in running their code is much more likely to be using a 64-bit platform already.

Agreed; I don't think anyone using a 32-bit platform is going to be overly concerned about a 5% drop in performance. A more significant performance loss might be an issue, but not one that small. IMO, an equivalent or larger performance gain for 64-bit outweighs an equal or smaller loss for 32-bit (within reason).

@sir-sigurd (Contributor, Author) commented Sep 9, 2019

> If you can find a way to get all the desired speedups without that kind of complexity in the code, that'd be great.

I agree it'd be great to write some generic function, but C doesn't allow writing type-generic functions, and using an inline function here introduces unwanted type casts that make the emitted code less efficient in unpredictable ways. Such an inline function creates a false impression that it works like a type-generic function (as was the case with is_small_int()), but it doesn't.

There are platforms besides x86, and I don't want to spend time assessing how an inline function degrades performance on each of them. Do you?

(Update:)

> ~5% slowdown in microbenchmark on x86_32 (like you found for get_small_ints as function vs. macro)

Where did you get this ~5%? Did you run a benchmark?

@sir-sigurd (Contributor, Author) commented:

Here's an inline function vs. macro demo: https://godbolt.org/z/K5tbF2.
On x86-32:

  • inline function version: 53 instructions
  • macro version: 38 instructions
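
For readers who don't want to open the godbolt link, here is a minimal self-contained sketch of the same idea (it is not the code behind that link): the inline function pins its parameter type to unsigned long long, so on a 32-bit target a size_t argument is widened and the loop is typically compiled as 64-bit arithmetic, while the macro keeps whatever type the caller passed in.

```c
#include <stddef.h>

/* Inline-function version: the operand type is fixed at
   unsigned long long, so a 32-bit size_t argument is widened first and
   the shift/mask loop is usually compiled as 64-bit arithmetic on
   x86-32 (a sufficiently clever compiler could still narrow it, which
   is part of the disagreement in this thread). */
static inline int
count_digits_func(unsigned long long v)
{
    int n = 0;
    while (v) {
        n++;
        v >>= 15;   /* stand-in for PyLong_SHIFT */
    }
    return n;
}

/* Macro version: `v` keeps the caller's type, so with a 32-bit size_t
   the same loop stays 32-bit. */
#define COUNT_DIGITS_MACRO(v, n)    \
    do {                            \
        (n) = 0;                    \
        while (v) {                 \
            (n)++;                  \
            (v) >>= 15;             \
        }                           \
    } while (0)

int digits_via_func(size_t x)  { return count_digits_func(x); }

int digits_via_macro(size_t x) { int n; COUNT_DIGITS_MACRO(x, n); return n; }
```

Whether that difference matters in practice is exactly the 32-bit cost/benefit question being argued in this thread.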

@vstinner (Member) commented Sep 9, 2019

> On x86-32

Do you mean x86 (32 bits) or x32 (64-bit integers, but 32-bit pointers)?

@sir-sigurd (Contributor, Author) commented:

@vstinner x86 (32 bits).

@gnprice (Contributor) commented Sep 10, 2019

> ~5% slowdown in microbenchmark on x86_32 (like you found for get_small_ints as function vs. macro)
>
> Where did you get this ~5%? Did you run a benchmark?

(That's from here:
https://bugs.python.org/issue38015#msg351255

Just as you quoted me saying: it's what you found for get_small_ints as a function vs. as a macro.)

@gnprice (Contributor) commented Sep 10, 2019

> There are platforms besides x86, and I don't want to spend time assessing how an inline function degrades performance on each of them. Do you?

Consider this the other way around: I don't think it's a good use of time to investigate what small, unnecessary bits of work each compiler on each platform emits, or to go to great lengths in our code to coax them into doing a bit better.

(Remember that there's absolutely nothing in the language that means a compiler can't emit the exact same code in the static-function case as you're seeing in the macro case -- it's purely a matter of how smart the compiler gets. See e.g. this demo: #15216 (comment) )

That can be well worth it when the payoff is large -- when there's an opportunity for a significant win on ordinary Python code. (Or as a good proxy for that: a large win on code that shows up in profiles of ordinary Python code.) But many-line macros and other code complications have a real cost, and the payoff has to be worth the cost.

Compilers will emit less-than-optimal code. That's a fact of life, no matter what we do to our code. We have to keep going anyway and write the code in a way that makes sense to each other as humans.

When the compiler emits code that's much slower than it could be, in a place where that significantly matters, that's one thing... but if we insisted on squeezing the last cycle out of every function, we'd never get anything else done.

@sir-sigurd (Contributor, Author) commented:

> ~5% slowdown in microbenchmark on x86_32 (like you found for get_small_ints as function vs. macro)
>
> Where did you get this ~5%? Did you run a benchmark?
>
> (That's from here:
> https://bugs.python.org/issue38015#msg351255
>
> Just as you quoted me saying: it's what you found for get_small_ints as a function vs. as a macro.)

That 5% difference is the cost of 2 extra instructions on x86_64, and if you check the demo, you can see that in this case the difference is more substantial (53 vs. 38 instructions, and some of them are in a loop).

@sir-sigurd (Contributor, Author) commented Sep 10, 2019

> But many-line macros and other code complications have a real cost, and the payoff has to be worth the cost.

This code is trivial, and making it a macro doesn't make it less trivial. It makes it less "beautiful", but this code has been modified once in the last 10 years, so I doubt there are many people interested in reading it.

> (Remember that there's absolutely nothing in the language that means a compiler can't emit the exact same code in the static-function case as you're seeing in the macro case -- it's purely a matter of how smart the compiler gets. See e.g. this demo: #15216 (comment) )

My PR provides a speed-up by eliminating these lines:

    if (ival < PyLong_BASE)
        return PyLong_FromLong(ival);

I guess the person who wrote these lines many years ago was thinking the same way as you. Years have passed, but compilers are still not that smart.
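
For context, the eliminated branch sat in a function shaped roughly like this (a simplified reconstruction of the pre-patch PyLong_FromSize_t, not an exact copy; see the file history for the real code). Small values took a detour through the signed constructor, which repeats sign handling that an unsigned input never needs, and the digit loop below is essentially the same one duplicated in each PyLong_FromUnsigned* function, which is the duplication this PR collapses:

```c
/* Roughly the pre-patch shape (simplified reconstruction). */
PyObject *
PyLong_FromSize_t(size_t ival)
{
    PyLongObject *v;
    size_t t;
    int ndigits = 0;

    if (ival < PyLong_BASE) {
        /* Detour through the signed constructor, which re-checks the
           sign and the small-int range even though ival is unsigned;
           removing this branch is where the measured speed-up comes
           from. */
        return PyLong_FromLong((long)ival);
    }
    /* Otherwise count PyLong_SHIFT-bit digits and fill them in. */
    t = ival;
    while (t) {
        ++ndigits;
        t >>= PyLong_SHIFT;
    }
    v = _PyLong_New(ndigits);
    if (v != NULL) {
        digit *p = v->ob_digit;
        while (ival) {
            *p++ = (digit)(ival & PyLong_MASK);
            ival >>= PyLong_SHIFT;
        }
    }
    return (PyObject *)v;
}
```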

@vstinner (Member) left a comment:

LGTM, but I have one last request.

@gpshead merged commit c6734ee into python:master on Sep 12, 2019.
@sir-sigurd deleted the long-from-sizet-2 branch on September 12, 2019 at 15:06.
@vstinner (Member) commented:

Thanks @sir-sigurd, that's a nice micro-optimization ;-)

Labels: performance (Performance or resource usage)

8 participants