Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut tricky goto that isn't needed, in _PyBytes_DecodeEscape. #15825

Merged
merged 1 commit into from
Sep 10, 2019

Conversation

gnprice
Copy link
Contributor

@gnprice gnprice commented Sep 10, 2019

This is the sort of goto that requires the reader to stare hard at
the code to unpick what it's doing.

On doing so, the answer is... not very much!

  • It jumps from the bottom of the loop to almost the top; the effect
    is to bypass the loop condition s < end and also the
    if-condition *s != '\\', acting as if both are true.

  • We've just decremented s, after incrementing it in the switch
    condition. So it has the same value as when s == end failed.
    Before that was another increment... and before that we had
    s < end. So s < end true, then increment, then s == end
    false... that means s < end is still true.

  • Also this means s points to the same character as it did for the
    switch condition. And there was a case '\\', which we didn't
    hit -- so *s != '\\' is also true.

  • That means this has no effect on the behavior! The most it might do
    is an optimization -- we get to skip those two checks, because (as
    just proven above) we know they're true.

  • But gosh, this is the invalid escape sequence path. This does not
    seem like the kind of code path that calls for extreme optimization
    tricks.

So, take the goto and the label out.

Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code. Or perhaps it won't! That'll be OK.

But then, crucially, if some future edit to this loop causes the
reasoning above to stop holding true... the compiler will adjust
this jump accordingly. One of us fallible humans might not.

This is the sort of `goto` that requires the reader to stare hard at
the code to unpick what it's doing.

On doing so, the answer is... not very much!

* It jumps from the bottom of the loop to almost the top; the effect
  is to bypass the loop condition `s < end` and also the
  `if`-condition `*s != '\\'`, acting as if both are true.

* We've just decremented `s`, after incrementing it in the `switch`
  condition.  So it has the same value as when `s == end` failed.
  Before that was another increment... and before that we had
  `s < end`.  So `s < end` true, then increment, then `s == end`
  false... that means `s < end` is still true.

* Also this means `s` points to the same character as it did for the
  `switch` condition.  And there was a `case '\\'`, which we didn't
  hit -- so `*s != '\\'` is also true.

* That means this has no effect on the behavior!  The most it might do
  is an optimization -- we get to skip those two checks, because (as
  just proven above) we know they're true.

* But gosh, this is the *invalid escape sequence* path.  This does not
  seem like the kind of code path that calls for extreme optimization
  tricks.

So, take the `goto` and the label out.

Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code.  Or perhaps it won't!  That'll be OK.

But then, crucially, if some future edit to this loop causes the
reasoning above to *stop* holding true... the compiler will adjust
this jump accordingly.  One of us fallible humans might not.
@aeros
Copy link
Contributor

aeros commented Sep 10, 2019

Out of curiosity, how did you manage to find this? I can follow the logic provided, but I definitely would not have noticed this myself.

@asvetlov
Copy link
Contributor

We usually reject pull requests that improve code style/readability without fixing actual bug or improving something.
I have a feeling that the PR falls into this category.

@benjaminp
Copy link
Contributor

Yeah, gotos between loops is always tricky. Glad this one could simply be dropped.

(My understanding is that there is a sequel that does more.)

@benjaminp benjaminp merged commit 0711642 into python:master Sep 10, 2019
@gnprice gnprice deleted the pr-decodeescape-goto branch September 12, 2019 04:33
@gnprice
Copy link
Contributor Author

gnprice commented Sep 12, 2019

Thanks @benjaminp for the merge!

(My understanding is that there is a sequel that does more.)

Indeed -- just sent #16013 , which cuts a bunch of tricky code here that hasn't been used since early Python 3 development in 2007. @asvetlov , I hope you'll agree in the case of that PR that it improves something, even if the something is in reading the code and not running it.

(I would say the same is true of this one, but the improvement is certainly much smaller than #16013 .)

@gnprice
Copy link
Contributor Author

gnprice commented Sep 12, 2019

Out of curiosity, how did you manage to find this? I can follow the logic provided, but I definitely would not have noticed this myself.

Good question!

The general answer is that I was reading a lot of code in the codebase. I came across this function and was curious what it did, so I read through it and looked at its callers. This goto label, there in the middle of a loop, stuck out at me.

Very concretely, IIRC I came across this function from exploring where Py_ISXDIGIT was used. (Partly just to find out what it meant! The name isn't as transparent as its neighbors'.) That in turn was part of studying the use of all the similar character-type macros in Include/pyctype.h, which was part of exploring how we handle character properties more generally; which I'd started doing as I worked on a couple of bugs I'd run into in that area, and continued doing after fixing those initial bugs.

websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020
…nGH-15825)

This is the sort of `goto` that requires the reader to stare hard at
the code to unpick what it's doing.

On doing so, the answer is... not very much!

* It jumps from the bottom of the loop to almost the top; the effect
  is to bypass the loop condition `s < end` and also the
  `if`-condition `*s != '\\'`, acting as if both are true.

* We've just decremented `s`, after incrementing it in the `switch`
  condition.  So it has the same value as when `s == end` failed.
  Before that was another increment... and before that we had
  `s < end`.  So `s < end` true, then increment, then `s == end`
  false... that means `s < end` is still true.

* Also this means `s` points to the same character as it did for the
  `switch` condition.  And there was a `case '\\'`, which we didn't
  hit -- so `*s != '\\'` is also true.

* That means this has no effect on the behavior!  The most it might do
  is an optimization -- we get to skip those two checks, because (as
  just proven above) we know they're true.

* But gosh, this is the *invalid escape sequence* path.  This does not
  seem like the kind of code path that calls for extreme optimization
  tricks.

So, take the `goto` and the label out.

Perhaps the compiler will notice the exact same facts we showed above,
and generate identical code.  Or perhaps it won't!  That'll be OK.

But then, crucially, if some future edit to this loop causes the
reasoning above to *stop* holding true... the compiler will adjust
this jump accordingly.  One of us fallible humans might not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip issue skip news type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants