Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. #15825

gnprice · 2019-09-10T07:35:15Z

This is the sort of goto that requires the reader to stare hard at
the code to unpick what it's doing.

On doing so, the answer is... not very much!

It jumps from the bottom of the loop to almost the top; the effect
is to bypass the loop condition s < end and also the
if-condition *s != '\\', acting as if both are true.
We've just decremented s, after incrementing it in the switch
condition. So it has the same value as when s == end failed.
Before that was another increment... and before that we had
s < end. So s < end true, then increment, then s == end
false... that means s < end is still true.
Also this means s points to the same character as it did for the
switch condition. And there was a case '\\', which we didn't
hit -- so *s != '\\' is also true.
That means this has no effect on the behavior! The most it might do
is an optimization -- we get to skip those two checks, because (as
just proven above) we know they're true.
But gosh, this is the invalid escape sequence path. This does not
seem like the kind of code path that calls for extreme optimization
tricks.

So, take the goto and the label out.

Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code. Or perhaps it won't! That'll be OK.

But then, crucially, if some future edit to this loop causes the
reasoning above to stop holding true... the compiler will adjust
this jump accordingly. One of us fallible humans might not.

This is the sort of `goto` that requires the reader to stare hard at the code to unpick what it's doing. On doing so, the answer is... not very much! * It jumps from the bottom of the loop to almost the top; the effect is to bypass the loop condition `s < end` and also the `if`-condition `*s != '\\'`, acting as if both are true. * We've just decremented `s`, after incrementing it in the `switch` condition. So it has the same value as when `s == end` failed. Before that was another increment... and before that we had `s < end`. So `s < end` true, then increment, then `s == end` false... that means `s < end` is still true. * Also this means `s` points to the same character as it did for the `switch` condition. And there was a `case '\\'`, which we didn't hit -- so `*s != '\\'` is also true. * That means this has no effect on the behavior! The most it might do is an optimization -- we get to skip those two checks, because (as just proven above) we know they're true. * But gosh, this is the *invalid escape sequence* path. This does not seem like the kind of code path that calls for extreme optimization tricks. So, take the `goto` and the label out. Perhaps the compiler will notice the exact same facts we showed above, and generate identical code. Or perhaps it won't! That'll be OK. But then, crucially, if some future edit to this loop causes the reasoning above to *stop* holding true... the compiler will adjust this jump accordingly. One of us fallible humans might not.

aeros · 2019-09-10T08:14:02Z

Out of curiosity, how did you manage to find this? I can follow the logic provided, but I definitely would not have noticed this myself.

asvetlov · 2019-09-10T08:48:38Z

We usually reject pull requests that improve code style/readability without fixing actual bug or improving something.
I have a feeling that the PR falls into this category.

benjaminp · 2019-09-10T08:50:49Z

Yeah, gotos between loops is always tricky. Glad this one could simply be dropped.

(My understanding is that there is a sequel that does more.)

gnprice · 2019-09-12T06:28:13Z

Thanks @benjaminp for the merge!

(My understanding is that there is a sequel that does more.)

Indeed -- just sent #16013 , which cuts a bunch of tricky code here that hasn't been used since early Python 3 development in 2007. @asvetlov , I hope you'll agree in the case of that PR that it improves something, even if the something is in reading the code and not running it.

(I would say the same is true of this one, but the improvement is certainly much smaller than #16013 .)

gnprice · 2019-09-12T06:55:06Z

Out of curiosity, how did you manage to find this? I can follow the logic provided, but I definitely would not have noticed this myself.

Good question!

The general answer is that I was reading a lot of code in the codebase. I came across this function and was curious what it did, so I read through it and looked at its callers. This goto label, there in the middle of a loop, stuck out at me.

Very concretely, IIRC I came across this function from exploring where Py_ISXDIGIT was used. (Partly just to find out what it meant! The name isn't as transparent as its neighbors'.) That in turn was part of studying the use of all the similar character-type macros in Include/pyctype.h, which was part of exploring how we handle character properties more generally; which I'd started doing as I worked on a couple of bugs I'd run into in that area, and continued doing after fixing those initial bugs.

…nGH-15825) This is the sort of `goto` that requires the reader to stare hard at the code to unpick what it's doing. On doing so, the answer is... not very much! * It jumps from the bottom of the loop to almost the top; the effect is to bypass the loop condition `s < end` and also the `if`-condition `*s != '\\'`, acting as if both are true. * We've just decremented `s`, after incrementing it in the `switch` condition. So it has the same value as when `s == end` failed. Before that was another increment... and before that we had `s < end`. So `s < end` true, then increment, then `s == end` false... that means `s < end` is still true. * Also this means `s` points to the same character as it did for the `switch` condition. And there was a `case '\\'`, which we didn't hit -- so `*s != '\\'` is also true. * That means this has no effect on the behavior! The most it might do is an optimization -- we get to skip those two checks, because (as just proven above) we know they're true. * But gosh, this is the *invalid escape sequence* path. This does not seem like the kind of code path that calls for extreme optimization tricks. So, take the `goto` and the label out. Perhaps the compiler will notice the exact same facts we showed above, and generate identical code. Or perhaps it won't! That'll be OK. But then, crucially, if some future edit to this loop causes the reasoning above to *stop* holding true... the compiler will adjust this jump accordingly. One of us fallible humans might not.

the-knights-who-say-ni added the CLA signed label Sep 10, 2019

bedevere-bot added the awaiting review label Sep 10, 2019

aeros added skip issue skip news type-feature A feature request or enhancement labels Sep 10, 2019

benjaminp merged commit 0711642 into python:master Sep 10, 2019

bedevere-bot removed the awaiting review label Sep 10, 2019

gnprice deleted the pr-decodeescape-goto branch September 12, 2019 04:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. #15825

Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. #15825

gnprice commented Sep 10, 2019

aeros commented Sep 10, 2019

asvetlov commented Sep 10, 2019

benjaminp commented Sep 10, 2019

gnprice commented Sep 12, 2019

gnprice commented Sep 12, 2019

Cut tricky goto that isn't needed, in _PyBytes_DecodeEscape. #15825

Cut tricky goto that isn't needed, in _PyBytes_DecodeEscape. #15825

Conversation

gnprice commented Sep 10, 2019

aeros commented Sep 10, 2019

asvetlov commented Sep 10, 2019

benjaminp commented Sep 10, 2019

gnprice commented Sep 12, 2019

gnprice commented Sep 12, 2019

Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. #15825

Cut tricky `goto` that isn't needed, in _PyBytes_DecodeEscape. #15825