Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-90622: Do not spawn ProcessPool workers on demand via fork method. #91598

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Apr 16, 2022

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process.

This is the bugfix only PR suitable for backporting (manual intervention likely still required). See #91587 for the PR fixing the new 3.11 feature that builds on this. I expect to merge that one first and rebase this on top of it.

…spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process.
@gpshead gpshead added type-bug An unexpected behavior, bug, or error needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes labels Apr 16, 2022
@gpshead
Copy link
Member Author

gpshead commented Apr 16, 2022

I'll want to merge #91587 first as it contains some changes that will block this PR from working unless I merged the two into one. This is also blocked on #91600.

@gpshead gpshead requested review from methane and pitrou April 17, 2022 05:55
@pitrou
Copy link
Member

pitrou commented Apr 18, 2022

cc @tomMoral for information and opinions

@tomMoral
Copy link
Contributor

When looking at the original bug report, I don't see why the deadlock would be linked to bad interaction between fork and threads. When I played with loky, spawning children with fork while having python threads have never lead to deadlocks. The fork+thread deadlock were mostly caused by C-level threadpool (such as openblas ones) that were thought to be present in the child process, but the work was never run as the threads were not started in the forked process.

Maybe there is something that I am missing in the context of using tcmalloc that would lead to worst behavior in this context?

That being said, when looking at the commit raised by bisect, I would be prone to suspect that something goes wrong with the feature introduced in #19453 and not with the thread+fork pattern, as the deadlock occurs with a concurrent call to shutdown and adjust_worker/ semaphore.release. I could not reproduce the deadlock on my machine as I don't know how to build with the tcmalloc but I would suggest investigating a bit why the deadlock is happening using the faulthandler to dump the stack trace when the deadlock happens:

from concurrent import futures
import faulthandler
import sys

def work(iteration, item):
  sys.stdout.write(f'working: iteration={iteration}, item={item}\n')
  sys.stdout.flush()

for i in range(0, 10000):
    faulthandler.dump_traceback_later(20, exit=True)
    with futures.ProcessPoolExecutor(max_workers=2) as executor:
        executor.submit(work, i, 1)
        executor.submit(work, i, 2)
    faulthandler.cancel_dump_traceback_later()

This should give a traceback that could help understand if the deadlock is really due to fork+thread or if this is due to bad interaction between the semaphore and the shutdown.

@pitrou
Copy link
Member

pitrou commented Apr 22, 2022

Maybe there is something that I am missing in the context of using tcmalloc that would lead to worst behavior in this context?

tcmalloc probably uses a background thread and doesn't check that the thread disappeared. This is a common problem with C libraries.

@tomMoral
Copy link
Contributor

tcmalloc probably uses a background thread and doesn't check that the thread disappeared. This is a common problem with C libraries.

If this was such a mechanism that is causing the deadlock, this should be deterministic as the C-level threads are never propagated. As the deadlock occurs randomly, I think this should have to do with some concurrency issue.

@pitrou
Copy link
Member

pitrou commented Apr 22, 2022

Ah, then it's possible that the fork happens while a lock is held by another thread in the parent...

@gpshead
Copy link
Member Author

gpshead commented Apr 25, 2022

it's possible that the fork happens while a lock is held by another thread in the parent...

Yep. This is the fundamental reason fork() must never be used in a process with threads running that are not wholly controlled and synchronized by the code doing the fork. Which means "never fork if threads exist" in the context of a library as it isn't in control of the threads. Unfortunately there is no nice API to determine "if threads exist" in a process. One common thing that surfaces this to observers is malloc implementations that use locks as literally everything uses malloc all the time. But that is merely one frequently observed scenario. Any lock or equivalent state anywhere in the program that may have its copy accessed from the forked child process is a potential problem if it was not guaranteed to be in a safe state before the fork via appropriate before/after fork handlers. Threaded libraries can never be assumed to have been designed to do that ...and designing for that can reduce performance.

One reason some people wrongly assume "I use fork() in a threaded program and I never have problems" implies that it works is that glibc's malloc intentionally does not use locks (or if it does, it actively manages them during fork) because they aim for a level of legacy compatibility that includes attempting to support existing programs doing bad things that POSIX specs do not actually allow. glibc has a relatively poor malloc implementation for threaded process performance as a result. Thus tcmalloc, jemalloc, mimalloc, etc. all existing and being adopted by people serious about performance.

So whenever you see threads knowingly existing and fork being used, there is a problem. Regardless of if you can reproduce it yourself.

</preacher mode>

Adds a comment describing why the lack of assertion means we still have
a potential bug.
@gpshead gpshead marked this pull request as ready for review May 6, 2022 06:52
@gpshead gpshead self-assigned this May 6, 2022
@gpshead gpshead added the needs backport to 3.11 only security fixes label May 8, 2022
@gpshead gpshead merged commit ebb37fc into python:main May 8, 2022
@miss-islington
Copy link
Contributor

Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11.
🐍🍒⛏🤖

@gpshead gpshead deleted the disallow-dynamic-process-spawn-with-fork-bugfix-only branch May 8, 2022 16:20
@miss-islington
Copy link
Contributor

Sorry, @gpshead, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker ebb37fc3fdcb03db4e206db017eeef7aaffbae84 3.10

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 8, 2022
…ethod. (pythonGH-91598)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process.
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
@bedevere-bot
Copy link

GH-92495 is a backport of this pull request to the 3.11 branch.

@miss-islington
Copy link
Contributor

Sorry @gpshead, I had trouble checking out the 3.9 backport branch.
Please backport using cherry_picker on command line.
cherry_picker ebb37fc3fdcb03db4e206db017eeef7aaffbae84 3.9

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label May 8, 2022
gpshead added a commit to gpshead/cpython that referenced this pull request May 8, 2022
… fork method. (pythonGH-91598)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process..
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
@bedevere-bot
Copy link

GH-92497 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label May 8, 2022
gpshead added a commit that referenced this pull request May 8, 2022
…method. (GH-91598) (#92497)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process..
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 8, 2022
… fork method. (pythonGH-91598) (pythonGH-92497)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process..
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit b795376)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit that referenced this pull request May 8, 2022
…GH-91598) (#92495)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process.
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit that referenced this pull request May 8, 2022
…method. (GH-91598) (GH-92497) (#92499)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process..
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit b795376)

Co-authored-by: Gregory P. Smith <greg@krypto.org>

Co-authored-by: Gregory P. Smith <greg@krypto.org>
hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
… fork method. (pythonGH-91598) (pythonGH-92497) (python#92499)

Do not spawn ProcessPool workers on demand when they spawn via fork.

This avoids potential deadlocks in the child processes due to forking from
a multithreaded process..
(cherry picked from commit ebb37fc)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
(cherry picked from commit b795376)

Co-authored-by: Gregory P. Smith <greg@krypto.org>

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs backport to 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used
5 participants