Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-32236: Issue RuntimeWarning if buffering=1 for open() in binary mode #4842

Merged
merged 12 commits into from
Oct 20, 2018

Conversation

izbyshev
Copy link
Contributor

@izbyshev izbyshev commented Dec 13, 2017

If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).

Raise ValueError on attempts to use line buffering in binary mode to prevent
possible bugs and expose existing buggy code.

https://bugs.python.org/issue32236

If buffering=1 is specified for open() in binary mode, it is silently
treated as buffering=-1 (i.e., the default buffer size).
Coupled with the fact that line buffering is always supported in Python 2,
such behavior caused several issues (e.g., bpo-10344, bpo-21332).

Raise ValueError on attempts to use line buffering in binary mode to prevent
possible bugs and expose existing buggy code.
@izbyshev izbyshev changed the title bpo-32236: Don't allow buffering=1 for open() in binary mode bpo-32236: Issue RuntimeWarning if buffering=1 for open() in binary mode Dec 18, 2017
Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subprocess related changes are correct, others should review the other changes. Though at first glance I believe those to also be correct.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a NEWS entry to mention that open() now emits a warning if the file is opened in binary mode with buffering=1. Moreover, the NEWS entry should mention that the change fixes a performance issue in subprocess.Popen if buffering=1: the underlying binary file is no longer opened with inefficient buffering=1, if I understood correctly.

try:
f = self.open(TESTFN, 'wb', s)
f.write(str(s).encode("ascii"))
f.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the point of calling close() twice here? with should be used, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a refactor of an existing test.

except OSError as msg:
self.fail('error setting buffer size %d: %s' % (s, str(msg)))
self.assertEqual(d, s)
for s in (-1, 0, 512):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to check that these calls don't emit RuntimeWarning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the check.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

pitrou
pitrou previously requested changes Sep 11, 2018
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. You need to add a NEWS file using the blurb utility. Also, see comments below.

Lib/_pyio.py Outdated
@@ -198,6 +198,10 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
raise ValueError("binary mode doesn't take an errors argument")
if binary and newline is not None:
raise ValueError("binary mode doesn't take a newline argument")
if binary and buffering == 1:
import warnings
warnings.warn("line buffering is not supported in binary mode",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary obvious why the warning talks about line buffering, so I would be a bit more explicit: "line buffering (buffering = 1) isn't supported in binary mode, file will be fully buffered."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've clarified the message as you suggest.

try:
f = self.open(TESTFN, 'wb', s)
f.write(str(s).encode("ascii"))
f.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a refactor of an existing test.

@@ -591,15 +591,23 @@ def test_large_file_ops(self):
with self.open(support.TESTFN, "w+b") as f:
self.large_file_ops(f)

def _checked_open(self, path, mode, bufsize, under_check=False):
if not under_check and bufsize == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is under_check for? It doesn't seem to be ever True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was a conditional context manager, and under_check is set to True in the recursive call. But I've removed that function anyway to have only one place with warning check (in test_file.py).

for s in (-1, 0, 512):
self._checkBufferSize(s)

# test that attempts to use line buffering in binary mode cause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to test this both here and in test_io.py, IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've preserved only one check here. I think that test_with_open in test_io.py wasn't a really good place for that check after all.

@izbyshev
Copy link
Contributor Author

@gpshead, @vstinner, @pitrou Thank you for your reviews! I have made the requested changes; please review again.

Also, should the docs be changed to reflect that buffering=1 is equivalent to buffering=-1 in binary mode? Given current docs, the effect of buffering=1 is unclear.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@gpshead, @vstinner, @pitrou: please review the changes made to this pull request.

@izbyshev
Copy link
Contributor Author

@vstinner Regarding the NEWS, there was no performance issue with buffering=1 in binary mode because it was treated as buffering=-1. It was rather a potential correctness problem, especially for code that was originally written for Python 2 and that explicitly depends on line buffering.

@izbyshev
Copy link
Contributor Author

@vstinner Sorry, I misread your comment -- it was about subprocess.Popen. Since the underlying code simply calls io.open, which treats buffering=1 as buffering=-1 in binary mode, there is no performance issue here too.

My previous understanding of behavior of check_no_resource_warning(),
which check_no_warnings() is based on, was incorrect.
@izbyshev
Copy link
Contributor Author

@pitrou @vstinner Ping!

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now way better, but I still have a few comments.

Lib/_pyio.py Outdated
if binary and buffering == 1:
import warnings
warnings.warn("line buffering (buffering=1) isn't supported in binary "
"mode, file will be fully buffered", RuntimeWarning, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such thing as "fully buffered". I propose to also say "the default size will be used" here.

Extract of the code:

        if buffering == 1 or buffering < 0 and raw.isatty():
            buffering = -1
            line_buffering = True
        if buffering < 0:
            buffering = DEFAULT_BUFFER_SIZE

buffering=1 becomes buffering=DEFAULT_BUFFER_SIZE if I understand correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

if self.text_mode:
if bufsize == 1:
line_buffering = True
# line buffering is not supported in binary mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But here we are in text mode, no? The comment is misleading, I suggest to remove it.

Or say something like: "# Use the default buffer size for the binary buffered reader/writer".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the binary mode used for the underlying streams, and I didn't realize that it can be misleading, but now I see it. Changed to clearer wording, thanks!

if (binary && buffering == 1) {
if (PyErr_WarnEx(PyExc_RuntimeWarning,
"line buffering (buffering=1) isn't supported in "
"binary mode, file will be fully buffered", 1) < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto: say something like "the default size will be used".

PEP 7: please add { ... } around the "goto error;". By the way, see https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/ :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm aware of PEP 7 and goto fail bug, but I usually follow the rule of local consistency, and a similar code above doesn't use brackets. Now I've added brackets here as you suggested.

@@ -0,0 +1,2 @@
Warn that line buffering is not supported if :func:`open` is called with
binary mode and ``buffering=1``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to be more explicit and list more impact functions: open, io.open, codecs.open. Would it be possible to put functions first in the sentence? Something like: "open(), io.open(), codecs.open() now emit a warning if the file is opened in binary mode with buffering=1." So the reader directly gets the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other functions are impacted too, like pathlib.Path.open() and tempfile.TemporaryFile() (and some other classes from tempfile). Do you think we should find and list all such functions here? Maybe we should just provide a couple of examples: io.open and the standard library functions relying on it (e.g., codecs.open, pathlib.Path.open) now issue a warning if a file is opened in binary mode with buffering=1.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@vstinner vstinner dismissed pitrou’s stale review October 20, 2018 00:15

The PR has been updated

@vstinner
Copy link
Member

I ran the full test suite on this change using "PYTHONWARNINGS=error ./python -W error -m test -j0 -u all,-gui -r" to check if any stdlib module misuse open(). Good news: "Tests result: SUCCESS".

@vstinner
Copy link
Member

Thanks @izbyshev for your tenacity :-) Thanks also @gpshead and @pitrou for your reviews.

@izbyshev izbyshev deleted the bpo-32236 branch November 3, 2018 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants