gh-120196: Reuse find_max_char for bytes objects #120497

rhpvorderman · 2024-06-14T12:02:29Z

Simple duplicate code removal and reuse of other function.

Issue: Faster decode and find_max_char implementations. #120196

vstinner

There are many "comparison of distinct pointer types lacks a cast" compiler warnings. Can you have a look? (Look at the PR to see the warnings.)

rhpvorderman · 2024-06-14T12:11:46Z

Ah, shoot. This has to do with the current implementation, and I was working in a different implementation in #120196 . Is it OK to wait for that one to be merged/rejected? Otherwise there will be some annoying merge conflicts that will take extra time.

vstinner · 2024-06-14T12:17:18Z

It's up to you. This change looks simpler to be merged than the other one which looks controversial.

rhpvorderman · 2024-06-14T12:22:29Z

Done. It was an easy change and it makes that the compile guards above are indeed applicable because there is no mixing of signed and unsigned pointer types.

vstinner

LGTM.

@pitrou @serhiy-storchaka: Would you mind to review this change?

serhiy-storchaka · 2024-06-14T12:32:06Z

Currently bytes.isascii() returns as soon as it finds the first non-ASCII byte. What with the new implementation?

rhpvorderman · 2024-06-14T12:34:14Z

Same. Or technically 2 clock cycles later as it will evaluate whether the return value is larger than 127, but if it is properly inlined it should be 0 clock cycles.

EDIT: Sorry, I should have left out the mumbo jumbo. Find_max_char also exits early when it finds a non-ascii character.

vstinner · 2024-06-14T12:38:03Z

What with the new implementation?

It's about reusing existing code, IMO it's a good thing.

serhiy-storchaka · 2024-06-14T12:39:33Z

Could you please provide microbenchmark results for short and long, ASCII and non-ASCII (with non-ASCII bytes at the begin and at the end) byte strings?

rhpvorderman · 2024-06-14T12:50:55Z

There is no need for benchmarking. It is literally the same code. This is removing a copy paste with a function call. See below.

This is the old duplicated code

PyObject*
_Py_bytes_isascii(const char *cptr, Py_ssize_t len)
{
    const char *p = cptr;
    const char *end = p + len;

    while (p < end) {
        /* Fast path, see in STRINGLIB(utf8_decode) in stringlib/codecs.h
           for an explanation. */
        if (_Py_IS_ALIGNED(p, ALIGNOF_SIZE_T)) {
            /* Help allocation */
            const char *_p = p;
            while (_p + SIZEOF_SIZE_T <= end) {
                size_t value = *(const size_t *) _p;
                if (value & ASCII_CHAR_MASK) {
                    Py_RETURN_FALSE;
                }
                _p += SIZEOF_SIZE_T;
            }
            p = _p;
            if (_p == end)
                break;
        }
        if ((unsigned char)*p & 0x80) {
            Py_RETURN_FALSE;
        }
        p++;
    }
    Py_RETURN_TRUE;
}

This is the code that PR is now calling

Py_LOCAL_INLINE(Py_UCS4)
STRINGLIB(find_max_char)(const STRINGLIB_CHAR *begin, const STRINGLIB_CHAR *end)
{
    const unsigned char *p = (const unsigned char *) begin;
    const unsigned char *_end = (const unsigned char *)end;

    while (p < _end) {
        if (_Py_IS_ALIGNED(p, ALIGNOF_SIZE_T)) {
            /* Help register allocation */
            const unsigned char *_p = p;
            while (_p + SIZEOF_SIZE_T <= _end) {
                size_t value = *(const size_t *) _p;
                if (value & UCS1_ASCII_CHAR_MASK)
                    return 255;
                _p += SIZEOF_SIZE_T;
            }
            p = _p;
            if (p == _end)
                break;
        }
        if (*p++ & 0x80)
            return 255;
    }
    return 127;
}

EDIT: And because it is inlined, the compiler can switch the return 255 statements with Py_RETURN_TRUE statements. It should result in the same assembly.

rhpvorderman · 2024-06-14T13:24:44Z

Objects/bytes_methods.c

-Return True if B is empty or all characters in B are ASCII,\n\
-False otherwise.");
-
-// Optimization is copied from ascii_decode in unicodeobject.c


Rather than being a copy, an inline function call to virtually the same code is now used.

rhpvorderman · 2024-06-17T10:05:47Z

@serhiy-storchaka Sorry, I didn't mean to be so abrasive. Here are the microbenchmarks:

Before:

$ python -m timeit -s 'data=b"longstringallascii" *2000' 'data.isascii()'
100000 loops, best of 5: 2.4 usec per loop
$ python -m timeit -s 'data=b"\xff" + b"longstringnotascii" *2000 ' 'data.isascii()'
20000000 loops, best of 5: 17.2 nsec per loop
$ python -m timeit -s 'data=b"longstringnotascii" *2000 + b"\xff"' 'data.isascii()'
100000 loops, best of 5: 2.41 usec per loop
$ python -m timeit -s 'data=b"shortstringallascii"' 'data.isascii()'
20000000 loops, best of 5: 20.5 nsec per loop
$ python -m timeit -s 'data=b"\xffhortstringnotascii"' 'data.isascii()'
20000000 loops, best of 5: 17.3 nsec per loop
$ python -m timeit -s 'data=b"shortstringnotascii/xff"' 'data.isascii()'
10000000 loops, best of 5: 22.4 nsec per loop

After

$ python -m timeit -s 'data=b"longstringallascii" *2000' 'data.isascii()'
100000 loops, best of 5: 2.4 usec per loop
$ python -m timeit -s 'data=b"\xff" + b"longstringnotascii" *2000 ' 'data.isascii()'
20000000 loops, best of 5: 17.1 nsec per loop
$ python -m timeit -s 'data=b"longstringnotascii" *2000 + b"\xff"' 'data.isascii()'
100000 loops, best of 5: 2.4 usec per loop
$ python -m timeit -s 'data=b"shortstringallascii"' 'data.isascii()'
20000000 loops, best of 5: 20 nsec per loop
$ python -m timeit -s 'data=b"\xffhortstringnotascii"' 'data.isascii()'
20000000 loops, best of 5: 17 nsec per loop
$ python -m timeit -s 'data=b"shortstringnotascii/xff"' 'data.isascii()'
10000000 loops, best of 5: 22.4 nsec per loop

As expected, the results are the same (within run-to-run expected error).

vstinner · 2024-06-17T10:18:28Z

As expected, the results are the same (within run-to-run expected error).

Thanks for actually checking.

serhiy-storchaka

Thank you. It is always better to check even if the answer looks obvious. We can miss some details, and the compiler can produce unexpected results.

LGTM.

vstinner · 2024-06-17T10:22:24Z

Merged, thank you. Oops, I merged the PR while @serhiy-storchaka was reviewing it :-)

rhpvorderman · 2024-06-17T11:17:19Z

Thank you. It is always better to check even if the answer looks obvious. We can miss some details, and the compiler can produce unexpected results.

I will keep that in mind for next time. Thanks for reviewing, both of you.

Reuse find_max_char for bytes objects

59bbd60

bedevere-app bot added the awaiting review label Jun 14, 2024

rhpvorderman mentioned this pull request Jun 14, 2024

gh-120196: Faster ascii_decode and find_max_char implementations #120212

Closed

vstinner reviewed Jun 14, 2024

View reviewed changes

Fix compiler warnings

5852ce3

vstinner approved these changes Jun 14, 2024

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Jun 14, 2024

rhpvorderman commented Jun 14, 2024

View reviewed changes

vstinner changed the title ~~Reuse find_max_char for bytes objects~~ gh-120196: Reuse find_max_char for bytes objects Jun 17, 2024

bedevere-app bot mentioned this pull request Jun 17, 2024

Faster decode and find_max_char implementations. #120196

Closed

vstinner changed the title ~~gh-120196: Reuse find_max_char for bytes objects~~ gh-120196: Reuse find_max_char for bytes objects Jun 17, 2024

vstinner added the skip news label Jun 17, 2024

serhiy-storchaka approved these changes Jun 17, 2024

View reviewed changes

vstinner merged commit 945a89b into python:main Jun 17, 2024
37 checks passed

bedevere-app bot removed the awaiting merge label Jun 17, 2024

rhpvorderman deleted the bytesasciiredundant branch June 17, 2024 11:17

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024

pythongh-120196: Reuse find_max_char() for bytes objects (python#120497)

b93d655

noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024

pythongh-120196: Reuse find_max_char() for bytes objects (python#120497)

54f2eaa

estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024

pythongh-120196: Reuse find_max_char() for bytes objects (python#120497)

9d3b846

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-120196: Reuse find_max_char for bytes objects #120497

gh-120196: Reuse find_max_char for bytes objects #120497

rhpvorderman commented Jun 14, 2024 •

edited by bedevere-app bot

Loading

vstinner left a comment

rhpvorderman commented Jun 14, 2024

vstinner commented Jun 14, 2024

rhpvorderman commented Jun 14, 2024

vstinner left a comment

serhiy-storchaka commented Jun 14, 2024

rhpvorderman commented Jun 14, 2024 •

edited

Loading

vstinner commented Jun 14, 2024

serhiy-storchaka commented Jun 14, 2024

rhpvorderman commented Jun 14, 2024 •

edited

Loading

rhpvorderman Jun 14, 2024

rhpvorderman commented Jun 17, 2024

vstinner commented Jun 17, 2024

serhiy-storchaka left a comment

vstinner commented Jun 17, 2024

rhpvorderman commented Jun 17, 2024

gh-120196: Reuse find_max_char for bytes objects #120497

gh-120196: Reuse find_max_char for bytes objects #120497

Conversation

rhpvorderman commented Jun 14, 2024 • edited by bedevere-app bot Loading

vstinner left a comment

Choose a reason for hiding this comment

rhpvorderman commented Jun 14, 2024

vstinner commented Jun 14, 2024

rhpvorderman commented Jun 14, 2024

vstinner left a comment

Choose a reason for hiding this comment

serhiy-storchaka commented Jun 14, 2024

rhpvorderman commented Jun 14, 2024 • edited Loading

vstinner commented Jun 14, 2024

serhiy-storchaka commented Jun 14, 2024

rhpvorderman commented Jun 14, 2024 • edited Loading

rhpvorderman Jun 14, 2024

Choose a reason for hiding this comment

rhpvorderman commented Jun 17, 2024

vstinner commented Jun 17, 2024

serhiy-storchaka left a comment

Choose a reason for hiding this comment

vstinner commented Jun 17, 2024

rhpvorderman commented Jun 17, 2024

rhpvorderman commented Jun 14, 2024 •

edited by bedevere-app bot

Loading

rhpvorderman commented Jun 14, 2024 •

edited

Loading

rhpvorderman commented Jun 14, 2024 •

edited

Loading