Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38091: Import deadlock detection causes deadlock #17518

Merged
merged 7 commits into from
Mar 3, 2020

Conversation

arigo
Copy link
Contributor

@arigo arigo commented Dec 9, 2019

@pitrou
Copy link
Member

pitrou commented Dec 21, 2019

The fix here is more of a hack. A proper fix could be something like the below. At least it fixes the reproducer posted by @rlamy . Is there a real-world reproducer you can test it on?

diff --git a/Lib/importlib/_bootstrap.py b/Lib/importlib/_bootstrap.py
index 8de0e9ee79..7b2bfefdec 100644
--- a/Lib/importlib/_bootstrap.py
+++ b/Lib/importlib/_bootstrap.py
@@ -71,9 +71,20 @@ class _ModuleLock:
             lock = _blocking_on.get(tid)
             if lock is None:
                 return False
-            tid = lock.owner
-            if tid == me:
+            owner_tid = lock.owner
+            if owner_tid == tid:
+                # Lock owner is temporarily blocking on a lock it
+                # already owns.  This means it's not actually blocking.
+                # (this can happen at the beginning of acquire() below
+                #  -- see bpo 38091)
+                return False
+            if owner_tid == me:
+                # Lock owner is blocking on a lock we own, but we're blocking
+                # on the lock it owns => deadlock.
                 return True
+            # Move along the chain and find out whether the owner is blocking
+            # on another lock.
+            tid = owner_tid
 
     def acquire(self):
         """

@arigo
Copy link
Contributor Author

arigo commented Dec 22, 2019

Maybe a safer way to think about it would be to ask ourselves when has_deadlock() is really supposed to return True. I think it means to return True if the chain _blocking_on[_blocking_on[...[tid]...]] eventually reaches the value me, and False in all other cases. You are fixing one such case by detecting when the chain enters a fixpoint, i.e. a loop of length 1. I'm not convinced that longer loops are impossible; maybe we should also detect them and return False in these cases. Something like that (untested code):

     def has_deadlock(self):
         # Deadlock avoidance for concurrent circular imports.
         me = _thread.get_ident()
         tid = self.owner
+        seen = set()
         while True:
             lock = _blocking_on.get(tid)
             if lock is None:
                 return False
             tid = lock.owner
             if tid == me:
                 return True
+            if tid in seen:
+                # bpo 38091: the chain of tid's we encounter here
+                # eventually leads to a fixpoint or a cycle, but
+                # does not reach 'me'.  This means we would not
+                # actually deadlock.  This can happen if other
+                # threads are at the beginning of acquire() below.
+                return False    
+            seen.add(tid)

@pitrou
Copy link
Member

pitrou commented Dec 22, 2019

@arigo Yes, that fix would probably work. Perhaps @rlamy can check :-)

@rlamy
Copy link
Contributor

rlamy commented Dec 23, 2019

I think that this should fix the issue. It caused some intermittent test failures in PyPy, so I guess we should just merge the change there and monitor the CI results.

@pitrou
Copy link
Member

pitrou commented Jan 1, 2020

@rlamy Did you test this fix on PyPy?

Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as rejected until we here back from PyPy that the proposed fix in the PR discussion worked for them.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@arigo
Copy link
Contributor Author

arigo commented Jan 23, 2020

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@brettcannon: please review the changes made to this pull request.

@arigo
Copy link
Contributor Author

arigo commented Jan 23, 2020

I have updated the PR to include the test from https://bugs.python.org/issue38091 , which fails (deadlocks) without the fix and passes with the fix.

@arigo arigo reopened this Jan 23, 2020
Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importlib.h wasn't regenerated and included in this PR. I believe that can be done with make regen-all (or something like that).

@bedevere-bot
Copy link

Thanks for making the requested changes!

@brettcannon: please review the changes made to this pull request.

@codecov
Copy link

codecov bot commented Feb 15, 2020

Codecov Report

Merging #17518 into master will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #17518     +/-   ##
=========================================
  Coverage   82.11%   82.12%             
=========================================
  Files        1955     1954      -1     
  Lines      588906   584027   -4879     
  Branches    44429    44458     +29     
=========================================
- Hits       483572   479607   -3965     
+ Misses      95679    94777    -902     
+ Partials     9655     9643     -12     
Impacted Files Coverage Δ
Lib/distutils/tests/test_bdist_rpm.py 30.00% <0.00%> (-65.00%) ⬇️
Lib/distutils/command/bdist_rpm.py 7.63% <0.00%> (-56.88%) ⬇️
Lib/test/test_urllib2net.py 76.92% <0.00%> (-13.85%) ⬇️
Lib/test/test_smtpnet.py 78.57% <0.00%> (-7.15%) ⬇️
Lib/ftplib.py 63.85% <0.00%> (-6.06%) ⬇️
Lib/test/test_ftplib.py 87.11% <0.00%> (-4.72%) ⬇️
Tools/scripts/db2pickle.py 17.82% <0.00%> (-3.97%) ⬇️
Tools/scripts/pickle2db.py 16.98% <0.00%> (-3.78%) ⬇️
Lib/test/test_socket.py 71.94% <0.00%> (-3.77%) ⬇️
Lib/distutils/tests/test_bdist_msi.py 56.25% <0.00%> (-3.75%) ⬇️
... and 359 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bec4186...1b2dbe4. Read the comment docs.

@miss-islington
Copy link
Contributor

@arigo: Status check is done, and it's a failure ❌ .

@miss-islington
Copy link
Contributor

@arigo: Status check is done, and it's a success ✅ .

@miss-islington miss-islington merged commit 6daa37f into python:master Mar 3, 2020
@miss-islington
Copy link
Contributor

Thanks @arigo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @arigo, I could not cleanly backport this to 3.8 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 6daa37fd42c5d5300172728e8b4de74fe0b319fc 3.8

@miss-islington miss-islington self-assigned this Mar 3, 2020
@miss-islington
Copy link
Contributor

Sorry @arigo, I had trouble checking out the 3.7 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 6daa37fd42c5d5300172728e8b4de74fe0b319fc 3.7

@brettcannon
Copy link
Member

@arigo any interest in manually backporting this to 3.8 and 3.7? (Chances are it's importlib.h.)

@arigo
Copy link
Contributor Author

arigo commented Mar 5, 2020

My git-fu is very limited (in general and on github). Do you expect two other pull requests for the 3.7 and 3.8 branches?

@brettcannon
Copy link
Member

@arigo yep, but don't worry about it too much. I can try to get around to it eventually myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs backport to 3.8 only security fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants