Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34003: Use dict instead of OrderedDict in csv.DictReader #8014

Merged
merged 5 commits into from
Jan 31, 2019

Conversation

selik
Copy link
Contributor

@selik selik commented Jun 29, 2018

DictReader can now use basic dicts instead of OrderedDict, as of version
3.7's change to specify that dict maintains keys in insertion order.
This will be more efficient and more pleasant to read at the interactive
prompt.

I also changed a list comprehension to a generator expression inside of
a ", ".join() to avoid the unnecessary list construction.

https://bugs.python.org/issue34003

DictReader can now use basic dicts instead of OrderedDict, as of version
3.7's change to specify that dict maintains keys in insertion order.
This will be more efficient and more pleasant to read at the interactive
prompt.

I also changed a list comprehension to a generator expression inside of
a ``", ".join()`` to avoid the unnecessary list construction.
@methane
Copy link
Member

methane commented Jun 30, 2018

I also changed a list comprehension to a generator expression inside of
a ", ".join() to avoid the unnecessary list construction.

FYI, str.join() creates temporary list if input is not list or tuple. You can't avoid list construction.
Changing list comprehension to genexp make it little slower, because of generator overhead.

$ python3 -m timeit '",".join(str(x) for x in range(10))'
100000 loops, best of 5: 3.46 usec per loop

$ python3 -m timeit '",".join([str(x) for x in range(10)])'
100000 loops, best of 5: 3.09 usec per loop

@selik
Copy link
Contributor Author

selik commented Jun 30, 2018

Doh. I should have checked the timings before I assumed. I'll revert that.

str.join creates a list internally if the input is not a list or tuple.
Passing a generator expression only adds overhead.
Copy link
Contributor

@BoboTiG BoboTiG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch :)

@rhettinger rhettinger merged commit 9f3f093 into python:master Jan 31, 2019
@selik selik deleted the fix-issue-34003 branch January 31, 2019 15:30
.. versionchanged:: 3.6
Returned rows are now of type :class:`OrderedDict`.
.. versionchanged:: 3.8
Returned rows are now of type :class:`dict`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entry should have been added without deleting the previous one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this never got fixed...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #20657 fixes this

merwok added a commit that referenced this pull request Jun 5, 2020
miss-islington pushed a commit that referenced this pull request Jun 10, 2020
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 10, 2020
Follow-up to pythonGH-8014
(cherry picked from commit 7aed052)

Co-authored-by: Éric Araujo <merwok@netwok.org>
miss-islington added a commit that referenced this pull request Jun 10, 2020
…H-20771)

Follow-up to GH-8014
(cherry picked from commit 7aed052)


Co-authored-by: Éric Araujo <merwok@netwok.org>

Automerge-Triggered-By: @merwok
miss-islington added a commit that referenced this pull request Jun 10, 2020
…H-20770)

Follow-up to GH-8014
(cherry picked from commit 7aed052)


Co-authored-by: Éric Araujo <merwok@netwok.org>

Automerge-Triggered-By: @merwok
arun-mani-j pushed a commit to arun-mani-j/cpython that referenced this pull request Jul 21, 2020
@nielskrijger
Copy link

It's a bit dissapointing that the return type of a function can suddenly change going from 3.7 to 3.8. I'll spare the details, but my assumptions about Python versioning were clearly wrong if this is allowed.

@merwok
Copy link
Member

merwok commented Aug 23, 2020

@nielskrijger the interface is 99% the same, and this is documented in the What’s New.

Major updates avoid gratuitous breakage, but have enhancements and fixes that require developers to test before updating.

@nielskrijger
Copy link

I assume with Major you mean a Minor (3.7 => 3.8 is not a major version).

The return value changing from OrderedDict to a normal dict is a loss of functionality; a dict and OrderedDict tend to have different characteristics https://stackoverflow.com/questions/50872498/will-ordereddict-become-redundant-in-python-3-7 , any code depending on those characteristics would break. I'm not sure how you calculcated this 99%, but if that means knowingly 1% of apps break because of this change I would have at least pointed it out as a breaking change.

Don't get me wrong, the change in itself makes perfect sense and is for the better. But I would definitely have marked this out as a breaking change.

@selik
Copy link
Contributor Author

selik commented Aug 23, 2020

@nielskrijger The mailing list is probably a better place to discuss the policy. I appreciate that you changed your argument from focusing on a type change to a change in functionality. Not everyone would agree, but much of Python's design cares more about interface than type (as evidenced by "magic" methods).

Calling it a breaking change, are you saying there should have been a deprecation cycle, or something else?

If you have a specific example of someone using OrderedDict.move_to_end in this context, I'd like to see it.

@merwok
Copy link
Member

merwok commented Aug 23, 2020

No, I meant 3.7 to 3.8. It is a major update, or a feature release if you prefer.
3.8.4 to 3.8.5 is a minor update that takes little effort to validate, but changing your runtime to 3.8 or to 3.9 requires more effort.

(We use major in the regular sense of important; for CPython it’s not very useful to refer to X.Y.Z as major.minor.micro, it’s more line (of development) . major (or feature) . micro (or patch))

@nielskrijger
Copy link

If you have a specific example of someone using OrderedDict.move_to_end in this context, I'd like to see it.

I didn't find this out by accident. If an app is primarily about uploading, parsing and generating csvs with fairly advanced customization, I hope it's clear something might break.

Calling it a breaking change, are you saying there should have been a deprecation cycle, or something else?

I could imagine a number of solutions, and in the mailing lists I do see some evidence of these being pointed out:

One of those suggestions was a configuration option; that would have been a quick fix on my end.

But I do feel silly, as this was obviously an argument that was indeed discussed and the risk considered neglible I assume.

No, I meant 3.7 to 3.8. It is a major update, [...] (We use major in the regular sense of important; for CPython it’s not very useful to refer to X.Y.Z as major.minor.micro, it’s more line (of development) . major (or feature) . micro (or patch))

Sorry, I didn't know that. I think I've got semver a bit too stuck in my mind I guess, you certainly helped me to point this out in Python's versioning. Thanks!

@methane
Copy link
Member

methane commented Aug 24, 2020

Note that we changed the returned type from dict to OrderedDict in Python 3.6 without deprecation period. It clearly breaks == behavior so it might be breaking change for very minor use cases.

We regularly change behavior which can be breaking change some people when:

  • Using deprecation period is not simple.
  • Estimated affected users is very small.

Ideally speaking, we should add OrderedDictReader instead of changing DictReader in Python 3.6.
But it is too late. Unnecessary breaking change already happened.

@merwok
Copy link
Member

merwok commented Aug 24, 2020

It clearly breaks == behavior

It doesn’t, comparing an ordered dict to a dict works (looks at the items as a set)

@methane
Copy link
Member

methane commented Aug 24, 2020

It doesn’t, comparing an ordered dict to a dict works (looks at the items as a set)

Consider comparing two csv files. When column order is changed but data is not changed, row1 == row2 in Python 3.5 and Python 3.8+, but row1 != row2 in Python 3.6, and Python 3.7.

It doesn't emit any error so user may not notice their scripts is broken in Python 3.6. It might be worse than missing .move_to_end() in Python 3.8, because user can notice the missing .move_to_end() when they are affected.

@merwok
Copy link
Member

merwok commented Aug 24, 2020

Sorry, I don’t follow what you are saying, and this is the wrong place for the discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants