Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38056: overhaul Error Handlers section in codecs documentation #15732

Merged
merged 20 commits into from May 9, 2022
Merged

bpo-38056: overhaul Error Handlers section in codecs documentation #15732

merged 20 commits into from May 9, 2022

Conversation

ghost
Copy link

@ghost ghost commented Sep 8, 2019

  • replace/backslashreplace/surrogateescape were wrongly described as encoding only, in fact they can also be used in decoding.
  • clarify the description of surrogatepass.
  • add more descriptions to each handler.
  • add two REPL examples.
  • add indexes for Error Handler's name.
  • add default parameter values in codecs.rst
  • improve term "text encoding".

I read the source code, hope there is no mistake and misleading.

The indexes were tested in .chm documentation.

The term "text encoding" appeared 38 times in the entire document, the change in glossary.rst should be correct.

Also verified these terms with Wikipedia and Google:

  1. XML/HTML numeric character reference
  2. Name property from Unicode Character Database

October 2nd effect:

https://bugs.python.org/issue38056

Copy link
Contributor

@aeros aeros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @animalize.

As someone who is not particularly experienced with using the codec module (for my purposes, I've never had much of a need for it), these examples are quite helpful for visually representing the purpose of each of the error handlers.

I'm not entirely certain that we need to provide an example for all four of them, but in general I think that providing an example or two would be very helpful to readers.

@aeros
Copy link
Contributor

aeros commented Sep 9, 2019

If these changes are approved, I think it would be appropriate to backport the changes to the 3.7 and 3.8 docs. Also, I've added a skip news label. You can add one if you'd like to, but it shouldn't be required for adding examples to the documentation.

I'll cc the experts for codecs.

/cc @malemburg @doerwalter

@ghost
Copy link
Author

ghost commented Sep 9, 2019

I'm not entirely certain that we need to provide an example for all four of them, but in general I think that providing an example or two would be very helpful to readers.

Thanks for your review.
My idea is to demonstrate these things:

  • appropriate XML character reference
  • backslashed escape sequences
  • \N{...} escape sequences

So I would like to include at least these three.

I prefered to use this example, it covers a non-BMP character 🏫.

>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='replace')
b'42 ? ? ?'
>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='xmlcharrefreplace')
b'42 µ ♪ 🏫'
>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='backslashreplace')
b'42 \\xb5 \\u266a \\U0001f3eb'
>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='namereplace')
b'42 \\N{MICRO SIGN} \\N{EIGHTH NOTE} \\N{SCHOOL}'

But if paste the statement into the REPL on Windows, the non-BMP character is broken, and get an invalid result \\ufffd. Maybe this will confuse the user, so I abandoned this example.

>>> '42 µ ♪ �'.encode(encoding='ascii', errors='backslashreplace')
b'42 \\xb5 \\u266a \\ufffd'

@ghost ghost changed the title bpo-38056: Add examples for text encoding Error Handlers [WIP] bpo-38056: Add examples for text encoding Error Handlers Sep 11, 2019
@ghost ghost changed the title [WIP] bpo-38056: Add examples for text encoding Error Handlers [WIP] bpo-38056: Overhaul doc of text encoding Error Handlers Sep 11, 2019
@ghost ghost changed the title [WIP] bpo-38056: Overhaul doc of text encoding Error Handlers bpo-38056: Overhaul doc of text encoding Error Handlers Sep 11, 2019
@ghost ghost changed the title bpo-38056: Overhaul doc of text encoding Error Handlers [WIP] bpo-38056: Overhaul doc of text encoding Error Handlers Sep 11, 2019
@aeros
Copy link
Contributor

aeros commented Sep 11, 2019

Not that I'm opposed to the changes, but I think the examples alone would be significantly easier and less controversial to review. Also, it's usually more efficient from a workflow perspective to have more multiple focused PRs than a single large overhaul at once.

@ghost
Copy link
Author

ghost commented Sep 11, 2019

Not that I'm opposed to the changes, but I think the examples alone would be significantly easier and less controversial to review. Also, it's usually more efficient from a workflow perspective to have more multiple focused PRs than a single large overhaul at once.

Ah, just like suddenly getting excited for this PR. 😄

There are obvious errors, for example the backslashreplace error handler works with decoding since Python 3.5, but it is still in "encode only table".

replace error handler also works with both encode and decode codecs.

@ghost
Copy link
Author

ghost commented Sep 11, 2019

If codec experts want to do this, or who thinks he/she can do better than me, free feel to create a new PR.

@aeros
Copy link
Contributor

aeros commented Sep 11, 2019

@animalize:

If codec experts want to do this, or who thinks he/she can do better than me, free feel to create a new PR.

I'm not recommending for anyone else to do the other parts of the current one, you can still potentially apply the all of the changes. I just mean having only the examples in this PR and then splitting the other sections into their own PRs. As the scope of a PR gets larger, the time required to fully review it prior to a merge significantly increases. From my experience, the core developers tend to prefer smaller and more focused PRs when it's possible.

Copy link
Member

@JulienPalard JulienPalard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure reorganizing the table is worth it, what did you try to enhance by replacing the separation "all codecs vs text codecs" by the separation "both encode/decode vs encode only"?

Doc/library/codecs.rst Show resolved Hide resolved
Doc/library/codecs.rst Outdated Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.
@ghost ghost changed the title [WIP] bpo-38056: Overhaul doc of text encoding Error Handlers [WIP] bpo-38056: overhaul Error Handlers section in codecs documentation Sep 14, 2019
@JulienPalard JulienPalard dismissed their stale review September 17, 2019 21:28

Author pushed changes.

Copy link
Member

@JulienPalard JulienPalard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits, but not codec expert here.

Doc/glossary.rst Outdated Show resolved Hide resolved
@ghost
Copy link
Author

ghost commented Sep 18, 2019

LGTM with nits, but not codec expert here.

I am reading the source code these days, hope there will be no mistakes.

This PR bases on #15135.
We need to wait for it to be backported to 3.7/3.8 branches.

Ma Lin and others added 4 commits May 9, 2022 07:47
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@JelleZijlstra JelleZijlstra merged commit 5bc2390 into python:main May 9, 2022
@JelleZijlstra JelleZijlstra added needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels May 9, 2022
@miss-islington
Copy link
Contributor

Thanks @animalize for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @animalize for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @animalize for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-92526 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label May 9, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 9, 2022
…ythonGH-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 9, 2022
…ythonGH-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
@bedevere-bot
Copy link

GH-92527 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label May 9, 2022
@bedevere-bot
Copy link

GH-92528 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label May 9, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 9, 2022
…ythonGH-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington added a commit that referenced this pull request May 9, 2022
…H-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington added a commit that referenced this pull request May 9, 2022
…H-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
miss-islington added a commit that referenced this pull request May 9, 2022
…H-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
@ghost ghost deleted the error_handlers branch May 9, 2022 03:32
@ghost
Copy link
Author

ghost commented May 9, 2022

Thanks for your reviews, which has led to these revisions being improved in every aspects.

I just didn't add more examples, but I think too many examples will disturb the readers.

hello-adam pushed a commit to hello-adam/cpython that referenced this pull request Jun 2, 2022
…ythonGH-15732)

* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding.
* Add more description to each handler.
* Add two REPL examples.
* Add indexes for Error Handler's name.

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 5bc2390)

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.