-
-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-32236: Issue RuntimeWarning if buffering=1 for open() in binary mode #4842
Conversation
If buffering=1 is specified for open() in binary mode, it is silently treated as buffering=-1 (i.e., the default buffer size). Coupled with the fact that line buffering is always supported in Python 2, such behavior caused several issues (e.g., bpo-10344, bpo-21332). Raise ValueError on attempts to use line buffering in binary mode to prevent possible bugs and expose existing buggy code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The subprocess related changes are correct, others should review the other changes. Though at first glance I believe those to also be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a NEWS entry to mention that open() now emits a warning if the file is opened in binary mode with buffering=1. Moreover, the NEWS entry should mention that the change fixes a performance issue in subprocess.Popen if buffering=1: the underlying binary file is no longer opened with inefficient buffering=1, if I understood correctly.
try: | ||
f = self.open(TESTFN, 'wb', s) | ||
f.write(str(s).encode("ascii")) | ||
f.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the point of calling close() twice here? with should be used, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a refactor of an existing test.
except OSError as msg: | ||
self.fail('error setting buffer size %d: %s' % (s, str(msg))) | ||
self.assertEqual(d, s) | ||
for s in (-1, 0, 512): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to check that these calls don't emit RuntimeWarning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the check.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. You need to add a NEWS file using the blurb utility. Also, see comments below.
Lib/_pyio.py
Outdated
@@ -198,6 +198,10 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None, | |||
raise ValueError("binary mode doesn't take an errors argument") | |||
if binary and newline is not None: | |||
raise ValueError("binary mode doesn't take a newline argument") | |||
if binary and buffering == 1: | |||
import warnings | |||
warnings.warn("line buffering is not supported in binary mode", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary obvious why the warning talks about line buffering, so I would be a bit more explicit: "line buffering (buffering = 1) isn't supported in binary mode, file will be fully buffered."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've clarified the message as you suggest.
try: | ||
f = self.open(TESTFN, 'wb', s) | ||
f.write(str(s).encode("ascii")) | ||
f.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a refactor of an existing test.
Lib/test/test_io.py
Outdated
@@ -591,15 +591,23 @@ def test_large_file_ops(self): | |||
with self.open(support.TESTFN, "w+b") as f: | |||
self.large_file_ops(f) | |||
|
|||
def _checked_open(self, path, mode, bufsize, under_check=False): | |||
if not under_check and bufsize == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is under_check
for? It doesn't seem to be ever True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was a conditional context manager, and under_check
is set to True
in the recursive call. But I've removed that function anyway to have only one place with warning check (in test_file.py).
for s in (-1, 0, 512): | ||
self._checkBufferSize(s) | ||
|
||
# test that attempts to use line buffering in binary mode cause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to test this both here and in test_io.py
, IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've preserved only one check here. I think that test_with_open
in test_io.py
wasn't a really good place for that check after all.
@gpshead, @vstinner, @pitrou Thank you for your reviews! I have made the requested changes; please review again. Also, should the docs be changed to reflect that |
@vstinner Regarding the NEWS, there was no performance issue with |
@vstinner Sorry, I misread your comment -- it was about |
My previous understanding of behavior of check_no_resource_warning(), which check_no_warnings() is based on, was incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now way better, but I still have a few comments.
Lib/_pyio.py
Outdated
if binary and buffering == 1: | ||
import warnings | ||
warnings.warn("line buffering (buffering=1) isn't supported in binary " | ||
"mode, file will be fully buffered", RuntimeWarning, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such thing as "fully buffered". I propose to also say "the default size will be used" here.
Extract of the code:
if buffering == 1 or buffering < 0 and raw.isatty():
buffering = -1
line_buffering = True
if buffering < 0:
buffering = DEFAULT_BUFFER_SIZE
buffering=1 becomes buffering=DEFAULT_BUFFER_SIZE if I understand correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
Lib/subprocess.py
Outdated
if self.text_mode: | ||
if bufsize == 1: | ||
line_buffering = True | ||
# line buffering is not supported in binary mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here we are in text mode, no? The comment is misleading, I suggest to remove it.
Or say something like: "# Use the default buffer size for the binary buffered reader/writer".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the binary mode used for the underlying streams, and I didn't realize that it can be misleading, but now I see it. Changed to clearer wording, thanks!
Modules/_io/_iomodule.c
Outdated
if (binary && buffering == 1) { | ||
if (PyErr_WarnEx(PyExc_RuntimeWarning, | ||
"line buffering (buffering=1) isn't supported in " | ||
"binary mode, file will be fully buffered", 1) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto: say something like "the default size will be used".
PEP 7: please add { ... } around the "goto error;". By the way, see https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/ :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm aware of PEP 7 and goto fail
bug, but I usually follow the rule of local consistency, and a similar code above doesn't use brackets. Now I've added brackets here as you suggested.
@@ -0,0 +1,2 @@ | |||
Warn that line buffering is not supported if :func:`open` is called with | |||
binary mode and ``buffering=1``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to be more explicit and list more impact functions: open, io.open, codecs.open. Would it be possible to put functions first in the sentence? Something like: "open(), io.open(), codecs.open() now emit a warning if the file is opened in binary mode with buffering=1." So the reader directly gets the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other functions are impacted too, like pathlib.Path.open()
and tempfile.TemporaryFile()
(and some other classes from tempfile
). Do you think we should find and list all such functions here? Maybe we should just provide a couple of examples: io.open
and the standard library functions relying on it (e.g., codecs.open
, pathlib.Path.open
) now issue a warning if a file is opened in binary mode with buffering=1
.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair! |
I ran the full test suite on this change using "PYTHONWARNINGS=error ./python -W error -m test -j0 -u all,-gui -r" to check if any stdlib module misuse open(). Good news: "Tests result: SUCCESS". |
If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).
Raise ValueError on attempts to use line buffering in binary mode to prevent
possible bugs and expose existing buggy code.
https://bugs.python.org/issue32236