Add symbol b'\xe2\x80\x93' to punctuation symbols #18244

Closed
wants to merge 1 commit into from


sergei3000

This symbol (U+2013 EN DASH, b'\xe2\x80\x93' in UTF-8) looks very similar to b'-', but it isn't matched when string.punctuation is used as the reference.

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@sergei3000

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@csabella
Contributor

Please open a ticket on bugs.python.org for this issue and add the bpo number to the pull request title. Thank you!

@rpigott
Contributor

rpigott commented Jan 30, 2020

string.punctuation excludes many characters in the Unicode punctuation categories, and I don't think it's meant to be comprehensive. For that reason, adding just U+2013 'EN DASH' makes little sense.

If you need to robustly match all punctuation characters, check each character's Unicode general category with unicodedata.category (the punctuation categories all start with "P"), or use an alternative regex module that supports Unicode property classes.
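A minimal sketch of the unicodedata approach suggested above, using only the standard library. The helper names `is_punctuation` and `strip_punctuation` are hypothetical and chosen for illustration; they are not part of CPython or of this pull request.

```python
import string
import unicodedata


def is_punctuation(ch: str) -> bool:
    """Return True if ch is in a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    # unicodedata.category returns a two-letter general category such as "Pd".
    return unicodedata.category(ch).startswith("P")


def strip_punctuation(text: str) -> str:
    """Drop every character whose general category is a punctuation category."""
    return "".join(ch for ch in text if not is_punctuation(ch))


# The symbol from the pull request title, decoded from its UTF-8 bytes.
en_dash = b"\xe2\x80\x93".decode("utf-8")

print(en_dash in string.punctuation)   # False: string.punctuation is ASCII only
print(is_punctuation(en_dash))         # True: U+2013 is in category Pd
print(strip_punctuation("a" + en_dash + "b-c"))  # abc
```

Note that the two definitions do not fully overlap: some characters in string.punctuation, such as "$" and "+", belong to Unicode symbol categories (Sc, Sm) rather than punctuation categories, so a check like this one would not treat them as punctuation.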

@bsolomon1124
Contributor

Echoing @rpigott, string.punctuation is documented as:

String of ASCII characters which are considered punctuation characters in the C locale

Perhaps there is a third-party module that is more comprehensive and covers the full Unicode table.
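The quoted definition can be checked directly in an interpreter. This is a small illustrative sketch; the variable names are made up for the example.

```python
import string

hyphen = "-"
en_dash = "\u2013"  # the character proposed in this pull request

# Every character in string.punctuation has a code point below 128, i.e. it is ASCII.
is_ascii_only = all(ord(ch) < 128 for ch in string.punctuation)

print(string.punctuation)
print(len(string.punctuation), is_ascii_only)
print(hyphen in string.punctuation, en_dash in string.punctuation)
```

The ASCII hyphen-minus is a member while the en dash is not, which matches the behaviour described in the pull request and follows from the constant being limited to ASCII.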

@csabella
Contributor

Closing based on comments.

@csabella closed this on Mar 11, 2020

6 participants