Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-29735: Optimize partial_call(): avoid tuple #516

Merged
merged 1 commit into from
Mar 14, 2017
Merged

bpo-29735: Optimize partial_call(): avoid tuple #516

merged 1 commit into from
Mar 14, 2017

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Mar 6, 2017

Avoid temporary tuple to pass positional arguments.

  • 2 args: 1.14x faster (-12%)
  • 6 args: 1.15x faster (-13%)
  • 10 args: 1.17x faster (-14%)

@mention-bot
Copy link

@Haypo, thanks for your PR! By analyzing the history of the files in this pull request, we identified @serhiy-storchaka, @rhettinger, @benjaminp, @abalkin and @1st1 to be potential reviewers.

@vstinner vstinner added the performance Performance or resource usage label Mar 6, 2017
@serhiy-storchaka serhiy-storchaka self-requested a review March 6, 2017 15:06
@vstinner
Copy link
Member Author

New benchmark results: http://bugs.python.org/issue29735#msg289579

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be the optimization needs more test cases for covering all code.

Objects/call.c Outdated
}
else if (PyCFunction_Check(callable)) {
PyMethodDef *method = ((PyCFunctionObject*)callable)->m_ml;
int flags = method->ml_flags & ~(METH_CLASS | METH_STATIC | METH_COEXIST);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think clearing these flags is not needed.

It would be better to use PyCFunction_GET_FLAGS().

In sum this function can be written as:

    return PyFunction_Check(callable) ||
           (PyCFunction_Check(callable) && 
            !(PyCFunction_GET_FLAGS(callable) & METH_VARARGS));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I think clearing these flags is not needed. It would be better to use PyCFunction_GET_FLAGS()."

Right, done. But I prefer explicit if().

@@ -110,6 +111,8 @@ partial_new(PyTypeObject *type, PyObject *args, PyObject *kw)
return NULL;
}

pto->use_fastcall = _PyObject_HasFastCall(func);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pto->fn can be changed by __setstate__().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice catch. Fixed.

stack = &PyTuple_GET_ITEM(args, 0);
nargs = PyTuple_GET_SIZE(args);
argappl = NULL;
Py_INCREF(args);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since concatenating with empty tuple is optimized, perhaps these two special cases no longer needed. -10 lines of code. Could you tests this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested: removing the two special case (size==0) has no impact on performance, so I removed it.

nargs2 = pto_nargs;
}
else {
nargs2 = pto_nargs + nargs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be used for all branches. -2 lines of code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

ret = _PyObject_FastCallDict(pto->fn, stack, nargs2, kwargs2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are keyword arguments _PyObject_FastCallDict allocates new array for arguments. I don't know whether it is worth to optimize this case and unpack keyword arguments here. This could complicate the code too much and I don't know how large is a benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a corner case: dict.update() really expects a dict.

Your proposed optimizations seem complex for a little gain.

nargs * sizeof(PyObject*));
}

if (PyDict_GET_SIZE(pto->kw) == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the code for merging two dicts is duplicated. Could it be shared?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -322,6 +366,7 @@ partial_setstate(partialobject *pto, PyObject *state)
Py_INCREF(dict);

Py_SETREF(pto->fn, fn);
pto->use_fastcall = _PyObject_HasFastCall(fn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it before Py_SETREF(). The partial object should be consistent at the time of destroying old callable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@serhiy-storchaka
Copy link
Member

Please check that new code is covered by tests. Otherwise add new tests.

* Add _PyObject_HasFastCall()
* partial_call() now avoids temporary tuple to pass positional
  arguments if the callable supports the FASTCALL calling convention
  for positional arguments.
* Fix also a performance regression in partial_call() if the callable
  doesn't support FASTCALL.
@vstinner
Copy link
Member Author

Please check that new code is covered by tests. Otherwise add new tests.

All code paths of partial_fastcall() and partial_call_impl() are tested by test_functools.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope it is worth the complication of the code.

@vstinner vstinner merged commit 0f7b0b3 into python:master Mar 14, 2017
@vstinner
Copy link
Member Author

Thank you for your review @serhiy-storchaka !

@vstinner vstinner deleted the optim_partial_call branch March 14, 2017 20:37
jaraco pushed a commit that referenced this pull request Dec 2, 2022
Bumps [gidgethub](https://github.com/brettcannon/gidgethub) from 5.0.1 to 5.1.0.
- [Release notes](https://github.com/brettcannon/gidgethub/releases)
- [Changelog](https://github.com/brettcannon/gidgethub/blob/main/docs/changelog.rst)
- [Commits](gidgethub/gidgethub@5.0.1...v5.1.0)

---
updated-dependencies:
- dependency-name: gidgethub
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants