Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-92301: subprocess: Prefer close_range() to procfs-based fd closing #92303

Merged
merged 2 commits into from
May 5, 2022

Conversation

izbyshev
Copy link
Contributor

@izbyshev izbyshev commented May 4, 2022

close_range() is much faster for large number of file descriptors, e.g.
4 times faster for 1000 descriptors in a Linux 5.16-based environment.

…losing

close_range() is much faster for large number of file descriptors, e.g.
4 times faster for 1000 descriptors in a Linux 5.16-based environment.
@hugovk hugovk added the performance Performance or resource usage label May 4, 2022
@izbyshev
Copy link
Contributor Author

izbyshev commented May 4, 2022

I'd like to explain why I directly use close_range() (and only it) as the first option. Initially I planned to split _Py_closerange() into something like _Py_closerange_fast_async_safe() (to be used as the first option) and _Py_closerange_fallback() (to be used if procfs-based approach failed). But it turned out that there is nothing to use in _Py_closerange_fast_async_safe() except close_range(). While closefrom() seems to be async-signal-safe on FreeBSD, it's useless for subprocess because we always need to skip closing of the error pipe and hence need at least two ranges, which it can't do.

(There is also an even worse, though unlikely situation. _Py_closerange() currently decides to use closefrom() based on an unreliable fd limit, so if our error pipe gets assigned a high fd number, and then another thread lowers the fd limit below it, _Py_closerange(3, pipe_fd - 1) could wrongly assume it's OK to call closefrom(3)).

So instead of introducing a fancy wrapper for a single function (with theoretical future extensibility) I decided to simply use it directly, at the expense of an extra (failing) close_range() call if we ever hit _Py_closerange() fallback. This also confines all async-signal-safely concerns inside _posixsubprocess.

Also note that #84603 added the possibility of fdwalk() being called from _posixsubprocess, which was previously impossible. It seems to be a Solaris-only thing, and I don't know how safe it's to call it between fork()/exec(). But my PR doesn't revert that.

@izbyshev izbyshev marked this pull request as ready for review May 4, 2022 12:31
@izbyshev izbyshev requested a review from gpshead as a code owner May 4, 2022 12:31
@izbyshev
Copy link
Contributor Author

izbyshev commented May 4, 2022

For comparison, here is how an alternative version with split _Py_closerange() looks like: izbyshev@bdc78b4.

Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this PR is good, the open question is around confirming the specific glibc behavior when the kernel doesn't support the system call. I doubt glibc implements its own (or ever would if it does not), its role tends to just be a shim in front of the kernel. It is just worth confirming.

static int
_close_range_closer(int first, int last)
{
return close_range(first, last, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does glibc 2.34 close_range() actually do when the running kernel does not support the close_range system call? does it return -1 with errno=ENOSYS [desirable] or does it implement its own potentially poor unsafe version of close_range? The Linux/glibc documentation fails to be explicit about this. Add a comment describing the findings to the code here for future reference. If it implements its own instead of just surfacing the syscall failure, we'd probably be better off making the syscall directly as we don't want to depend on such a glibc provided fallback when we've got our own tried and true tested efficient one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does glibc 2.34 close_range() actually do when the running kernel does not support the close_range system call? does it return -1 with errno=ENOSYS [desirable] or does it implement its own potentially poor unsafe version of close_range?

I'd checked that before opening the PR. The glibc manual (which is not the same as the kernel man pages project you linked to) explicitly states that close_range() doesn't provide any fallbacks. This is in contrast to closefrom(), which not only provides a procfs-based fallback, but aborts if that fallback fails, and hence would be unsuitable not only for subprocess, but for os.closerange() as well.

Glibc, however, does have a surprise: it contains a close_range() fallback for GNU Hurd, which doesn't look async-signal-safe at all, and also a generic fallback iterating over all fds up to the fd limit (for mostly theoretical non-Linux and non-Hurd builds).

Other Linux C libraries that I've checked (musl, bionic, uclibc-ng) don't currently provide close_range() wrapper, but I think they will follow glibc if/when they provide it.

Note that close_range() is also present on FreeBSD, where it's a system call too.

Overall, I think it makes sense to enable the new code for known-good platforms only. I've added extra ifdefs.

An alternative to platform whitelisting is to use a direct syscall (and assume that whenever we have SYS_close_range, the interface is what we expect).

@gpshead
Copy link
Member

gpshead commented May 4, 2022

I agree with just calling close_range ourselves directly being the right approach as you've done here. Expanding to add more internal _Py_closerange APIs for other scenarios doesn't appear to make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants