Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-120196: Reuse find_max_char for bytes objects #120497

Merged
merged 2 commits into from
Jun 17, 2024

Conversation

rhpvorderman
Copy link
Contributor

@rhpvorderman rhpvorderman commented Jun 14, 2024

Simple duplicate code removal and reuse of other function.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many "comparison of distinct pointer types lacks a cast" compiler warnings. Can you have a look? (Look at the PR to see the warnings.)

@rhpvorderman
Copy link
Contributor Author

Ah, shoot. This has to do with the current implementation, and I was working in a different implementation in #120196 . Is it OK to wait for that one to be merged/rejected? Otherwise there will be some annoying merge conflicts that will take extra time.

@vstinner
Copy link
Member

It's up to you. This change looks simpler to be merged than the other one which looks controversial.

@rhpvorderman
Copy link
Contributor Author

Done. It was an easy change and it makes that the compile guards above are indeed applicable because there is no mixing of signed and unsigned pointer types.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@pitrou @serhiy-storchaka: Would you mind to review this change?

@serhiy-storchaka
Copy link
Member

Currently bytes.isascii() returns as soon as it finds the first non-ASCII byte. What with the new implementation?

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Jun 14, 2024

Same. Or technically 2 clock cycles later as it will evaluate whether the return value is larger than 127, but if it is properly inlined it should be 0 clock cycles.

EDIT: Sorry, I should have left out the mumbo jumbo. Find_max_char also exits early when it finds a non-ascii character.

@vstinner
Copy link
Member

What with the new implementation?

It's about reusing existing code, IMO it's a good thing.

@serhiy-storchaka
Copy link
Member

Could you please provide microbenchmark results for short and long, ASCII and non-ASCII (with non-ASCII bytes at the begin and at the end) byte strings?

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Jun 14, 2024

There is no need for benchmarking. It is literally the same code. This is removing a copy paste with a function call. See below.

This is the old duplicated code

PyObject*
_Py_bytes_isascii(const char *cptr, Py_ssize_t len)
{
    const char *p = cptr;
    const char *end = p + len;

    while (p < end) {
        /* Fast path, see in STRINGLIB(utf8_decode) in stringlib/codecs.h
           for an explanation. */
        if (_Py_IS_ALIGNED(p, ALIGNOF_SIZE_T)) {
            /* Help allocation */
            const char *_p = p;
            while (_p + SIZEOF_SIZE_T <= end) {
                size_t value = *(const size_t *) _p;
                if (value & ASCII_CHAR_MASK) {
                    Py_RETURN_FALSE;
                }
                _p += SIZEOF_SIZE_T;
            }
            p = _p;
            if (_p == end)
                break;
        }
        if ((unsigned char)*p & 0x80) {
            Py_RETURN_FALSE;
        }
        p++;
    }
    Py_RETURN_TRUE;
}

This is the code that PR is now calling

Py_LOCAL_INLINE(Py_UCS4)
STRINGLIB(find_max_char)(const STRINGLIB_CHAR *begin, const STRINGLIB_CHAR *end)
{
    const unsigned char *p = (const unsigned char *) begin;
    const unsigned char *_end = (const unsigned char *)end;

    while (p < _end) {
        if (_Py_IS_ALIGNED(p, ALIGNOF_SIZE_T)) {
            /* Help register allocation */
            const unsigned char *_p = p;
            while (_p + SIZEOF_SIZE_T <= _end) {
                size_t value = *(const size_t *) _p;
                if (value & UCS1_ASCII_CHAR_MASK)
                    return 255;
                _p += SIZEOF_SIZE_T;
            }
            p = _p;
            if (p == _end)
                break;
        }
        if (*p++ & 0x80)
            return 255;
    }
    return 127;
}

EDIT: And because it is inlined, the compiler can switch the return 255 statements with Py_RETURN_TRUE statements. It should result in the same assembly.

Return True if B is empty or all characters in B are ASCII,\n\
False otherwise.");

// Optimization is copied from ascii_decode in unicodeobject.c
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than being a copy, an inline function call to virtually the same code is now used.

@rhpvorderman
Copy link
Contributor Author

@serhiy-storchaka Sorry, I didn't mean to be so abrasive. Here are the microbenchmarks:

Before:

$ python -m timeit -s 'data=b"longstringallascii" *2000' 'data.isascii()'
100000 loops, best of 5: 2.4 usec per loop
$ python -m timeit -s 'data=b"\xff" + b"longstringnotascii" *2000 ' 'data.isascii()'
20000000 loops, best of 5: 17.2 nsec per loop
$ python -m timeit -s 'data=b"longstringnotascii" *2000 + b"\xff"' 'data.isascii()'
100000 loops, best of 5: 2.41 usec per loop
$ python -m timeit -s 'data=b"shortstringallascii"' 'data.isascii()'
20000000 loops, best of 5: 20.5 nsec per loop
$ python -m timeit -s 'data=b"\xffhortstringnotascii"' 'data.isascii()'
20000000 loops, best of 5: 17.3 nsec per loop
$ python -m timeit -s 'data=b"shortstringnotascii/xff"' 'data.isascii()'
10000000 loops, best of 5: 22.4 nsec per loop

After

$ python -m timeit -s 'data=b"longstringallascii" *2000' 'data.isascii()'
100000 loops, best of 5: 2.4 usec per loop
$ python -m timeit -s 'data=b"\xff" + b"longstringnotascii" *2000 ' 'data.isascii()'
20000000 loops, best of 5: 17.1 nsec per loop
$ python -m timeit -s 'data=b"longstringnotascii" *2000 + b"\xff"' 'data.isascii()'
100000 loops, best of 5: 2.4 usec per loop
$ python -m timeit -s 'data=b"shortstringallascii"' 'data.isascii()'
20000000 loops, best of 5: 20 nsec per loop
$ python -m timeit -s 'data=b"\xffhortstringnotascii"' 'data.isascii()'
20000000 loops, best of 5: 17 nsec per loop
$ python -m timeit -s 'data=b"shortstringnotascii/xff"' 'data.isascii()'
10000000 loops, best of 5: 22.4 nsec per loop

As expected, the results are the same (within run-to-run expected error).

@vstinner
Copy link
Member

As expected, the results are the same (within run-to-run expected error).

Thanks for actually checking.

@vstinner vstinner changed the title Reuse find_max_char for bytes objects gh-120196: Reuse find_max_char for bytes objects Jun 17, 2024
@vstinner vstinner changed the title gh-120196: Reuse find_max_char for bytes objects gh-120196: Reuse find_max_char for bytes objects Jun 17, 2024
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. It is always better to check even if the answer looks obvious. We can miss some details, and the compiler can produce unexpected results.

LGTM.

@vstinner vstinner merged commit 945a89b into python:main Jun 17, 2024
37 checks passed
@vstinner
Copy link
Member

Merged, thank you. Oops, I merged the PR while @serhiy-storchaka was reviewing it :-)

@rhpvorderman
Copy link
Contributor Author

Thank you. It is always better to check even if the answer looks obvious. We can miss some details, and the compiler can produce unexpected results.

I will keep that in mind for next time. Thanks for reviewing, both of you.

@rhpvorderman rhpvorderman deleted the bytesasciiredundant branch June 17, 2024 11:17
mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants