bpo-32492: 1.6x speed up in namedtuple attribute access using C fast-path #10495

pablogsal · 2018-11-13T00:59:53Z

Timing benchmarks

Attribute Access

import perf

runner = perf.Runner()
runner.timeit("a.x",
              stmt="a.x",
              setup="import collections;A=collections.namedtuple('A','x')")

./python -m perf compare_to old.json new.json -v
Mean +- std dev: [old] 280 ns +- 3 ns -> [new] 111 ns +- 1 ns: 2.52x faster

Apparently, there is a regression in the current master. This is the comparison against 3.7:

Mean +- std dev: [old] 176 ns +- 2 ns -> [new] 110 ns +- 2 ns: 1.61x faster (-38%)
Significant (t=177.69)

Creation

(Just to check that creation is not slower)

import perf

runner = perf.Runner()
runner.timeit("collections.namedtuple('A','x')",
              stmt="collections.namedtuple('A','x')",
              setup="import collections")

Mean +- std dev: [old_creation] 209 us +- 3 us -> [new_creation] 207 us +- 4 us: 1.01x faster

import perf

runner = perf.Runner()
runner.timeit("A(2324)",
              stmt="A(2324)",
              setup="import collections;A=collections.namedtuple('A','x')")

Mean +- std dev: [old_creation_obj] 1.41 us +- 0.03 us -> [new_creation_obj] 1.41 us +- 0.02 us: 1.00x faster (-0%)

Cache efficiency

Baseline

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
"""


 Performance counter stats for './python -c
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
' (200 runs):

         1,469,290      cache-references:u                                            ( +-  0.26% )
            20,240      cache-misses:u            #    1.378 % of all cache refs      ( +-  8.58% )
       146,812,273      cycles:u                                                      ( +-  0.24% )
       201,131,089      instructions:u            #    1.37  insn per cycle           ( +-  0.01% )
        40,257,360      branches:u                                                    ( +-  0.01% )
             1,175      faults:u                                                      ( +-  0.01% )
                 0      migrations:u

          0.050526 +- 0.000281 seconds time elapsed  ( +-  0.56% )

Patched

❯ perf stat -r 200 -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./python -c "
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
"""                                

 Performance counter stats for './python -c
import collections
A = collections.namedtuple('A','x');a = A(42)
for _ in range(100):
    some_var = a.x
' (200 runs):

         1,471,736      cache-references:u                                            ( +-  0.11% )
             7,196      cache-misses:u            #    0.489 % of all cache refs      ( +-  6.94% )
       145,004,120      cycles:u                                                      ( +-  0.07% )
       201,182,075      instructions:u            #    1.39  insn per cycle           ( +-  0.01% )
        40,219,107      branches:u                                                    ( +-  0.01% )
             1,174      faults:u                                                      ( +-  0.01% )
                 0      migrations:u

          0.048499 +- 0.000222 seconds time elapsed  ( +-  0.44% )

https://bugs.python.org/issue32492

pablogsal · 2018-11-13T02:00:26Z

Modules/_collectionsmodule.c

+        Py_INCREF(self);
+        return self;
+    }
+    result = PyTuple_GetItem(obj, ((_tuplegetterobject*)self)->index);


We could add a PyTuple_Check here, but as this class is private to the collections module, we can assume that the contract is that it has to be used in a tuple-like class. Note that this still raises SystemError: Objects/tupleobject.c:152: bad argument to internal function if it is used incorrectly.

That said, I am happy to add the check if people think otherwise.

If this can be triggered from Python, we should avoid SystemError and crashes.

Made a PyTuple_Check that throws a TypeError in 21be735.

serhiy-storchaka · 2018-11-13T06:54:04Z

Include/descrobject.h

+    PyObject *prop_del;
+    PyObject *prop_doc;
+    int getter_doc;
+} propertyobject;


This should not be in the limited API. And why is it added here?

This is because the new descriptor needs to inherit from PyProperty_Type, and the object type (propertyobject) was not exposed in the header file, making it impossible to inherit from it. I placed the object definition (propertyobject) together with the type definition (PyProperty_Type).

Where is the best place to put the definition?

serhiy-storchaka · 2018-11-13T07:03:07Z

Lib/collections/__init__.py

-            cache[index] = itemgetter_object, doc
-        class_namespace[name] = property(itemgetter_object, doc=doc)
+            tuplegetter_object = _tuplegetter(index, doc=doc)
+            cache[index] = tuplegetter_object, doc


__doc__ is a writable attribute, and descriptors in different namedtuples can have different docs. This is why itemgetter objects were cached instead of property objects.

A = namedtuple('A', 'x y')
B = namedtuple('B', 'x y')
A.x.__doc__ = 'foo'
B.x.__doc__ = 'bar'
assert A.x.__doc__ == 'foo'
assert B.x.__doc__ == 'bar'

I am caching only the docstring in 21be735

Please add a test based on the above example.
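A minimal sketch of what such a test could look like, assuming a plain unittest.TestCase. The class name FieldDocTest and the helper run_field_doc_test are hypothetical and are not taken from the PR's test suite.

```python
import unittest
from collections import namedtuple


class FieldDocTest(unittest.TestCase):  # hypothetical name, for illustration only
    def test_field_docs_are_independent(self):
        A = namedtuple('A', 'x y')
        B = namedtuple('B', 'x y')
        # Same field index in two different namedtuple types.
        A.x.__doc__ = 'foo'
        B.x.__doc__ = 'bar'
        # Each type must keep its own docstring for the field.
        self.assertEqual(A.x.__doc__, 'foo')
        self.assertEqual(B.x.__doc__, 'bar')
        # Untouched fields keep the default docstring.
        self.assertEqual(A.y.__doc__, 'Alias for field number 1')


def run_field_doc_test():
    """Run the test case above and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FieldDocTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The test fails if a descriptor object is shared between A and B, because the second assignment would then overwrite the first docstring.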

serhiy-storchaka · 2018-11-13T07:06:28Z

Modules/_collectionsmodule.c

+        Py_INCREF(self);
+        return self;
+    }
+    result = PyTuple_GetItem(obj, ((_tuplegetterobject*)self)->index);


If this can be triggered from Python, we should avoid SystemError and crashes.

serhiy-storchaka · 2018-11-13T07:13:35Z

Misc/NEWS.d/next/Core and Builtins/2018-11-13-01-03-10.bpo-32492.voIdcp.rst

@@ -0,0 +1,2 @@
+Speed up :class:`namedtuple` attribute access by 2.5x using a C fast-path


Make comparison not with the current master, but with 3.7. There is a regression in master.

Mean +- std dev: [old] 176 ns +- 2 ns -> [new] 110 ns +- 2 ns: 1.61x faster (-38%)
Significant (t=177.69)

Therefore the number 1.6x should be used in the documentation.

vstinner · 2018-11-13T10:42:50Z

Sorry, I didn't check your implementation, but did you consider to reuse existing structseq type to implement namedtuple? https://bugs.python.org/issue28638#msg298499 Last time I ran a microbenchmark, structseq was 1.9x faster than namedtuple to get an attribute by name.

In the meantime, I removed the property_descr_get() micro-optimization because it wasn't correct and caused 3 different crashes: bpo-30156, commit e972c13. So I guess that structseq is now even faster than namedtuple to get an attribute :-)

pablogsal · 2018-11-13T11:04:13Z

Sorry, I didn't check your implementation, but did you consider to reuse existing structseq type to implement namedtuple? https://bugs.python.org/issue28638#msg298499 Last time I ran a microbenchmark, structseq was 1.9x faster than namedtuple to get an attribute by name.

Hummm I did not consider this, but that will involve more significant and fundamental changes than this Pull Request. Also, apparently there is this issue that Josh Rosenberg ran into when implementing the idea. I am happy to give it a go if people agree that it is a good idea :) But I think we can start with this Pull Request, as it is simpler and gives some immediate speedup.

serhiy-storchaka · 2018-11-14T22:28:23Z

You do not need a subclass of property. You need just a descriptor.

Look also at __slots__ implementation.

pablogsal · 2018-11-14T23:42:25Z

@serhiy-storchaka Thanks! I will take a look into that. Independently, if we don't move the property object to the header file, it is not possible to subclass property in C. What do you think we should do about that?

serhiy-storchaka · 2018-11-15T08:46:54Z

Misc/NEWS.d/next/Core and Builtins/2018-11-13-01-03-10.bpo-32492.voIdcp.rst

@@ -0,0 +1,2 @@
+Speed up :class:`namedtuple` attribute access by 2.5x using a C fast-path


Therefore the number 1.6x should be used in the documentation.

serhiy-storchaka · 2018-11-15T08:51:06Z

Lib/collections/__init__.py

@@ -454,12 +459,13 @@ def __getnewargs__(self):
    cache = _nt_itemgetters
    for index, name in enumerate(field_names):
        try:
-            itemgetter_object, doc = cache[index]
+            doc = cache[index]


Please measure the effect of caching docstrings. Adding this cache for docstrings sped up namedtuple type creation by 10% in the former implementation, but removing the cache for itemgetters should reduce the benefit. If it is too small, it may not be worth using the cache at all.

These are the results for:

from collections import namedtuple; names = ['field%d' % i for i in range(1000)]" -- "namedtuple('A', names)"

❯ ./python -m perf compare_to with_caching.json without_caching.json
Mean +- std dev: [with_caching] 7.88 ms +- 0.12 ms -> [without_caching] 8.24 ms +- 0.04 ms: 1.05x slower (+5%)

It is 5% slower without the cache.

serhiy-storchaka · 2018-11-15T08:57:08Z

Modules/_collectionsmodule.c

+    PyObject_HEAD
+    Py_ssize_t index;
+    PyObject* doc;
+} _tuplegetterobject;


Is it possible to reuse PyMemberDescrObject or PyGetSetDescrObject here, without creating a new type?

Hummm.... I don't see an obvious way to do that. We still need a custom descriptor protocol to access the namedtuple object and a mutable doc field.

serhiy-storchaka · 2018-11-15T09:14:25Z

Modules/_collectionsmodule.c

+        return self;
+    }
+    if (!PyTuple_Check(obj)){
+        PyErr_SetString(PyExc_TypeError, "_tuplegetter must be used with tuples");


It is better to avoid using private class names in error messages.

Common error message looks like:

>>> int.numerator.__get__([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'numerator' for 'int' objects doesn't apply to 'list' object

It needs additional information: the name of the namedtuple type and the name of the field. If it is hard to add this information, the error message can use more general words.

I have changed it to:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor for index '1' for tuple subclasses doesn't apply to 'A' object
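For readers who do not want to read the C code, here is a rough pure-Python sketch of the behaviour discussed in this thread: return the descriptor itself on class access, reject non-tuples with a TypeError, and otherwise index into the tuple. The class name TupleGetterSketch is hypothetical; the real descriptor is implemented in C in Modules/_collectionsmodule.c and may differ in detail.

```python
class TupleGetterSketch:
    """Pure-Python stand-in for the C descriptor (hypothetical name)."""

    def __init__(self, index, doc):
        self.index = index
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        if not isinstance(obj, tuple):
            # Mirrors the wording of the error message quoted above.
            raise TypeError(
                f"descriptor for index '{self.index}' for tuple subclasses "
                f"doesn't apply to '{type(obj).__name__}' object"
            )
        return tuple.__getitem__(obj, self.index)


class Point(tuple):
    x = TupleGetterSketch(0, "Alias for field number 0")
    y = TupleGetterSketch(1, "Alias for field number 1")


p = Point((10, 20))
# p.x is 10 and p.y is 20; Point.x.__get__([]) raises TypeError.
```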

serhiy-storchaka · 2018-11-15T09:24:21Z

Lib/collections/__init__.py

-            cache[index] = itemgetter_object, doc
-        class_namespace[name] = property(itemgetter_object, doc=doc)
+            tuplegetter_object = _tuplegetter(index, doc=doc)
+            cache[index] = tuplegetter_object, doc


Please add a test based on the above example.

methane · 2018-11-15T10:02:28Z

Lib/collections/__init__.py

+            cache[index] = doc
+
+        tuplegetter_object = _tuplegetter(index, doc=doc)
+        class_namespace[name] = tuplegetter_object


Can tuplegetter object be cached?

try:
    tuplegetter_object = cache[index]
except KeyError:
    tuplegetter_object = cache[index] = _tuplegetter(index, doc=f'Alias for field number {index}')

Sadly, no, because the docstrings are mutable. See Serhiy's comment:

#10495 (comment)
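To illustrate why caching the descriptor object itself is a problem, here is a small sketch with a pure-Python descriptor. SharedGetter and cached_getter are hypothetical names made up for this example; only the caching pattern is taken from the snippet above.

```python
class SharedGetter:
    """Hypothetical pure-Python descriptor with a writable docstring."""

    def __init__(self, index, doc):
        self.index = index
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return tuple.__getitem__(obj, self.index)


cache = {}


def cached_getter(index):
    # Same caching pattern as proposed above: one descriptor per index.
    try:
        return cache[index]
    except KeyError:
        getter = cache[index] = SharedGetter(index, f'Alias for field number {index}')
        return getter


class A(tuple):
    x = cached_getter(0)


class B(tuple):
    x = cached_getter(0)


A.x.__doc__ = 'foo'
leaked = B.x.__doc__  # 'foo': the change made through A is visible through B
```

Because A.x and B.x are the same object, a docstring written through one type shows up on the other, which is the behaviour the review comment wants to avoid.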

serhiy-storchaka · 2018-11-26T21:28:48Z

Try to make the constructor arguments positional-only and repeat the benchmarks for creating a namedtuple type. I think this can save several percent of creation time.

pablogsal · 2018-11-27T00:41:19Z

@serhiy-storchaka Here are the results (commit e5bca1d):

import perf

runner = perf.Runner()
runner.timeit("collections.namedtuple('A','x')", 
stmt="collections.namedtuple('A','x')", 
setup="import collections")

❯ ./python -m perf compare_to  ../cpython_baseline/old_creation.json new_creation.json
Mean +- std dev: [old_creation] 107 us +- 3 us -> [new_creation] 103 us +- 2 us: 1.03x faster (-4%)

rhettinger · 2018-12-23T08:07:41Z

This patch looks great. Thanks for the effort to get this done :-)

Before this gets committed, please make a couple of improvements.

1. The _tuplegetter() API needs to more fully emulate property():

>>> set(dir(property)) - set(dir(_tuplegetter))
{'__delete__', 'fdel', 'deleter', '__isabstractmethod__', 'setter', '__set__', 'getter', 'fget', 'fset'}

Part of the reason is that we want tuplegetter() to be a drop-in substitute, supporting whatever interactions users have had with it before now (this is an old API). Another reason is that tuplegetter() needs to be recognized as a data descriptor so that its docstrings show up in the output of help().

Formerly, running >>> help(namedtuple('Point', ['x', 'y'])(10, 20)) would produce:

 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  x
 |      Alias for field number 0
 |  
 |  y
 |      Alias for field number 1

Now we get:

 |  Methods defined here:
 |
 |  x = <_collections._tuplegetter object>
 |  y = <_collections._tuplegetter object>

2. The code in tuplegetterdescr_get can be made tighter by using PyTuple_GET_SIZE() and PyTuple_GET_ITEM() instead of PyTuple_GetItem(). That saves the function call overhead and a redundant duplicate PyTuple_Check (the second check is 100% branch predictable which is good, but still incurs two chained memory accesses).

In running timings, we should not only benchmark the 1.6x to 2.5x improvement, but also compare against regular attribute access to an instance of a class that defines __slots__. Ideally tuplegetter() should be almost as fast as member objects since both do almost exactly the same work (indexing into a tuple should be only slightly slower than into slots).

serhiy-storchaka · 2018-12-23T09:16:56Z

To be recognized as a data descriptor tuplegetter needs to implement __set__.

I do not think that _tuplegetter should fully emulate property. It is enough if it implements the common API of properties and data descriptors.

>>> sorted(set(dir(int.numerator)) - set(dir(_tuplegetter)))
['__delete__', '__name__', '__objclass__', '__qualname__', '__set__']
>>> class A: __slots__ = 'x'
... 
>>> sorted(set(dir(A.x)) - set(dir(_tuplegetter)))
['__delete__', '__name__', '__objclass__', '__qualname__', '__set__']
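A short sketch of the data descriptor distinction, using inspect.isdatadescriptor from the standard library as one way to observe it. The classes GetOnly, GetAndSet and Holder are hypothetical and exist only for this illustration.

```python
import inspect
from collections import namedtuple


class GetOnly:
    """Non-data descriptor: defines __get__ only."""

    def __get__(self, obj, objtype=None):
        return self if obj is None else 42


class GetAndSet(GetOnly):
    """Data descriptor: additionally defines __set__ (read-only here)."""

    def __set__(self, obj, value):
        raise AttributeError("can't set attribute")


class Holder:
    plain = GetOnly()
    data = GetAndSet()


# Look the descriptors up in the class dict to bypass __get__.
plain_is_data = inspect.isdatadescriptor(Holder.__dict__['plain'])  # False
data_is_data = inspect.isdatadescriptor(Holder.__dict__['data'])    # True

# A namedtuple field is expected to be classified as a data descriptor.
field_is_data = inspect.isdatadescriptor(namedtuple('Point', 'x y').x)
```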

pablogsal · 2018-12-28T04:33:12Z

@rhettinger @serhiy-storchaka I am not sure what the best path to follow is. To make this PR simpler, what do you think about just reverting commit 1e14509 so tuplegetter inherits from property, setting all the attributes? This "emulates" property, but I cannot see any obvious downside and it makes the implementation much cleaner and (IMHO) more maintainable. Manually implementing all the property methods here seems to me to raise the maintenance burden a lot, not to mention future divergence from property.

@serhiy-storchaka Although tuplegetter would be fine just implementing the common API of data descriptors, people may be using the old properties in the namedtuple directly, accessing some particular fields that only properties have, so not implementing them may be a regression, right?

serhiy-storchaka · 2018-12-28T06:46:42Z

I do not see the sense in fully emulating a property, and in any case your past versions did not do this.

Attributes setter and deleter are used only for defining the setter and the deleter in the class definition. _tuplegetter will not be used in such a way.

__isabstractmethod__ does not make sense since _tuplegetter is not abstract.

fget, fset and fdel were not provided. In any case the user should not depend on such implementation detail. For getting a getter for the specific attribute they should use operator.attrgetter or trivial lambda.

We are at the pre-alpha stage. If some code is broken by this change, we have enough time to fix it.
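The suggestion above to use operator.attrgetter or a trivial lambda instead of relying on fget can be sketched as follows. The names Point, get_x and get_y are chosen only for this example.

```python
from collections import namedtuple
from operator import attrgetter

Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)

# Getter built with operator.attrgetter; does not depend on how the
# namedtuple field descriptor is implemented.
get_x = attrgetter('x')

# Equivalent trivial lambda.
get_y = lambda obj: obj.y

x_value = get_x(p)  # 10
y_value = get_y(p)  # 20
```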

pablogsal · 2018-12-28T07:15:46Z

@serhiy-storchaka So you propose to implement:

['__delete__', '__name__', '__objclass__', '__qualname__', '__set__']

Is that correct?

serhiy-storchaka · 2018-12-28T07:24:26Z

Try to implement just __set__ and __delete__. If this is not enough for pydoc, implement more.

pablogsal · 2018-12-29T04:16:35Z

After db3ffcd:

>>> from collections import namedtuple
>>> help(namedtuple('Point', ['x', 'y'])(10, 20))

|  Data descriptors defined here:
|
|  x
|      Alias for field number 0
|
|  y
|      Alias for field number 1
|

>>> set(dir(property)) - set(dir(_tuplegetter))
{'fget', 'deleter', '__isabstractmethod__', 'getter', 'setter', 'fdel', 'fset'}

pablogsal · 2018-12-29T04:26:48Z

Benchmark against a class defining __slots__ and against tuples:

import perf

runner = perf.Runner()

runner.timeit("namedtuple",
        stmt="a.x",
        setup="""\
import collections
a = collections.namedtuple('A', ['x'])(3)
""")

runner.timeit("slots",
        stmt="b.x",
        setup="""\
class B:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x
b = B(3)
""")

runner.timeit("tuple",
        stmt="b[0]",
        setup="""\
b = (3,)
""")

Results (no PGO):

./python ../experiment.py
.....................
namedtuple: Mean +- std dev: 34.7 ns +- 0.6 ns
.....................
slots: Mean +- std dev: 38.3 ns +- 1.8 ns
.....................
tuple: Mean +- std dev: 34.6 ns +- 0.2 ns

It turns out that the latest _tuplegetter is 8% faster than __slots__ and basically the same as the tuple.

I ran some experiments regarding the inlining of PyTuple_GetItem, and even without PGO the difference is unnoticeable under -O3 optimization. The x86 for the function call diff is:

        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        ...
        cmpq    %rdx, %rsi
        cmpq    %rax, 32(%rsi)

The two cmpq instructions are smashed by the branch predictor, and the stack allocation (the push, the two movs and what follows) is almost negligible, as the stack of tuplegetterdescr_get is reused. On the other hand, under -O2 and lower this changes and you can notice a small jitter in the benchmarks, so I think it is a good idea to inline the call to PyTuple_GetItem as @rhettinger recommended.
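For a quick local check without the perf module, the namedtuple / __slots__ / tuple comparison above can be roughly approximated with the standard-library timeit. This is only a sketch: the function name measure and its parameters are made up for this example, and timeit numbers are noisier than the perf results reported in this thread.

```python
import timeit
from collections import namedtuple


class B:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


def measure(number=100_000, repeat=5):
    """Return the best observed time per access, in nanoseconds."""
    cases = {
        "namedtuple": ("a.x", {"a": namedtuple('A', ['x'])(3)}),
        "slots": ("b.x", {"b": B(3)}),
        "tuple": ("t[0]", {"t": (3,)}),
    }
    results = {}
    for name, (stmt, namespace) in cases.items():
        # Take the minimum over several repeats to reduce noise.
        best = min(timeit.repeat(stmt, globals=namespace,
                                 number=number, repeat=repeat))
        results[name] = best / number * 1e9
    return results


if __name__ == "__main__":
    for name, ns in measure().items():
        print(f"{name}: {ns:.1f} ns per access")
```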

serhiy-storchaka · 2018-12-29T06:38:39Z

Modules/_collectionsmodule.c

+    if (value == NULL) {
+        return PyObject_DelItem(obj, index);
+    }
+    return PyObject_SetItem(obj, index, value);


Just raise an AttributeError similar to errors for other read-only attributes:

>>> 1 .numerator = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute 'numerator' of 'int' objects is not writable

>>> sys.version_info.major = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute

>>> sched.Event(0, 0, None, (), {}).time = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
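A small sketch of the read-only behaviour requested here, checked from the Python side. The helper names try_assign and try_delete are hypothetical; the exact wording of the error message is deliberately not asserted, since it can vary between versions.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)


def try_assign(obj, name, value):
    """Return the AttributeError raised by the assignment, or None."""
    try:
        setattr(obj, name, value)
    except AttributeError as exc:
        return exc
    return None


def try_delete(obj, name):
    """Return the AttributeError raised by the deletion, or None."""
    try:
        delattr(obj, name)
    except AttributeError as exc:
        return exc
    return None


assign_error = try_assign(p, 'x', 1)
delete_error = try_delete(p, 'x')
# Both operations are rejected and the tuple is left unchanged.
```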

ZackerySpytz · 2018-12-29T10:04:19Z

Modules/_collectionsmodule.c

+    }
+
+    result = PyTuple_GET_ITEM(obj, index);
+    Py_XINCREF(result);


I think Py_INCREF() should be used here, not Py_XINCREF().

Right, now that we inline PyTuple_GetItem, the index cannot be out of bounds after the check and therefore it will always return something. Good catch!

rhettinger · 2018-12-30T09:13:10Z

FWIW, my timings show a significant improvement (more than 2x) and that named tuple attribute access is now on par with access to member objects created by __slots__.

Nice work.

bedevere-bot · 2018-12-30T09:24:06Z

@rhettinger: Please replace # with GH- in the commit message next time. Thanks!

bpo-32492: 2.5x speed up in namedtuple attribute access using C fast …

97c3ee5

…path

pablogsal self-assigned this Nov 13, 2018

pablogsal requested a review from rhettinger November 13, 2018 00:59

the-knights-who-say-ni added the CLA signed label Nov 13, 2018

bedevere-bot added the awaiting merge label Nov 13, 2018

pablogsal added 2 commits November 13, 2018 01:03

Add News entry

5dc78ff

fixup! bpo-32492: 2.5x speed up in namedtuple attribute access using …

631ab2c

…C fast path

pablogsal commented Nov 13, 2018

serhiy-storchaka reviewed Nov 13, 2018

pablogsal changed the title ~~bpo-32492: 2.5x speed up in namedtuple attribute access using C fast-path~~ bpo-32492: 1.6x speed up in namedtuple attribute access using C fast-path Nov 13, 2018

Check for tuple in the __get__ of the new descriptor and don't cache …

21be735

…the descriptor itself

pablogsal force-pushed the bpo32492 branch from 7d1ef82 to 21be735 Compare November 13, 2018 10:53

Don't inherit from property. Implement GC methods to handle __doc__

1e14509

serhiy-storchaka reviewed Nov 15, 2018

methane reviewed Nov 15, 2018

pablogsal added 3 commits November 16, 2018 00:15

Add a test for the docstring substitution in descriptors

f9ca1e4

Update NEWS entry to reflect time against 3.7 branch

a6187b8

Simplify implementation with argument clinic, better error messages, …

7d2dd84

…only __new__

rhettinger self-assigned this Nov 26, 2018

Use positional-only parameters for the __new__

e5bca1d

Use PyTuple_GET_SIZE and PyTuple_GET_ITEM to tighter the implementati…

96eae4c

…on of tuplegetterdescr_get

serhiy-storchaka reviewed Dec 29, 2018

ZackerySpytz reviewed Dec 29, 2018

Implement __set__ to make tuplegetter a data descriptor

9838c39

pablogsal force-pushed the bpo32492 branch from db3ffcd to 9838c39 Compare December 29, 2018 19:28

pablogsal and others added 3 commits December 29, 2018 19:30

Use Py_INCREF now that we inline PyTuple_GetItem

2e350ed

Apply the valid_index() function, saving one test

c9772e8

Move Py_None test out of the critical path.

62cd7fd

rhettinger merged commit 3f5fc70 into python:master Dec 30, 2018

bedevere-bot removed the awaiting merge label Dec 30, 2018

pablogsal deleted the bpo32492 branch December 30, 2018 09:34

mtreinish mentioned this pull request Sep 15, 2021

[WIP] Encapsulate instruction in args in "Instruction" class Qiskit/qiskit#7020

Closed

3 tasks
