Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AttrDict to JSON module for use with object_hook #96145

Closed
rhettinger opened this issue Aug 20, 2022 · 14 comments
Closed

Add AttrDict to JSON module for use with object_hook #96145

rhettinger opened this issue Aug 20, 2022 · 14 comments
Labels
release-blocker type-feature A feature request or enhancement

Comments

@rhettinger
Copy link
Contributor

rhettinger commented Aug 20, 2022

AttrDict is used improve the readability of nested data access from JSON files. Compare:

 config.servers.setup.interfaces.mgmt.protocol
 config["servers"]["setup"]["interfaces"]["mgmt"]["protocol"]

Victor's research showed that tooling this is very popular in Python world and the implementation dates back quite far. At first I suggested I suggested extending SimpleNamespace(); instead, Guido recommended putting this is the JSON module which is where it would primarily be used:

with open('kepler.json') as f:
    kepler = json.load(f, object_hook=AttrDict)
print(kepler.orbital_period.neptune)

Linked PRs

@JelleZijlstra
Copy link
Member

I'd like to propose reverting this change and removing AttrDict from 3.12.

  1. The feature didn't receive proper discussion. It was open for only a few days, with no input from others and no review on the PR. There was an old mailing list thread, but that thread didn't end in any consensus.

  2. The implementation has surprising edge cases.

>>> d=json.AttrDict()
>>> d.pop
<built-in method pop of AttrDict object at 0x1046ee640>
>>> d.pop = 3
>>> d.pop
<built-in method pop of AttrDict object at 0x1046ee640>

Some of this is called out in the documentation, but this sample feels weird enough to me that I don't want this behavior in the standard library.

  1. More generally, attribute access and subscripting are different things in Python. They work on different objects and come with different sets of expectations. An object that allows both attribute access and subscripting to do the same thing muddies that difference and makes the language less consistent.

@JelleZijlstra JelleZijlstra reopened this Jun 20, 2023
@carljm
Copy link
Member

carljm commented Jun 20, 2023

@serhiy-storchaka pointed out in the linked mailing list thread that there is an existing way to do this in the standard library:

>>> import json
>>> from types import SimpleNamespace
>>> data = '{"foo": {"bar": "val"}}'
>>> obj = json.loads(data, object_hook=lambda x: SimpleNamespace(**x))
>>> obj.foo.bar
'val'

This doesn't require any new type of object that combines attribute and item access into the same thing; instead, it explicitly declares that you want to represent each JSON object as a SimpleNamespace (which has no methods to conflict with data members) and use attribute access (only).

He also pointed out that making this pattern simpler to use (no lambda) does not require making SimpleNamespace expose the full mapping API (as proposed in https://bugs.python.org/issue40284), it would just require adding support for SimpleNamespace constructor to take a single positional mapping argument, which is a small change.

Not sure why those suggestions were ignored at the time, but they seem preferable to adding a new type that mixes attribute and item access.

@ambv
Copy link
Contributor

ambv commented Jun 20, 2023

Given that the PR was created on a Saturday and merged the following Tuesday in August, which is generally a time of lower activity for us, I'm inclined to postpone this addition until we gather more consensus.

The idea isn't new but there's always some edge cases that made the feature controversial to endorse within the standard library. I concur that it's not obvious agreement was reached on this. +1 to reverting.

@erlend-aasland
Copy link
Contributor

Perhaps it would be better fit as a "recipe" in the JSON documentation, for example using Serhiy's suggestion.

@vstinner
Copy link
Member

vstinner commented Jun 20, 2023

The topic was discussed in 2020, but at the beginning, the idea was to change types.SimpleNamespace API:

In 2020, I listed these examples of existing solutions:

  • attrdict: "mapping objects that allow their elements to be accessed both as keys and as attributes"
  • dictlib: "Dictionary keys as object attributes"
  • objdict: "An ObjDict supports all normal dict operations, but adds support for accessing and setting entries as attributes"
  • mydict: "dict subclass which tries to act like JavaScript objects, so you can use the dot notation (d.foo) to access members of the object"
  • dpath: "accessing and searching dictionaries via /slashed/paths ala xpath". Example: dpath.util.search(x, "a/b/[cd]")
  • glom: more advanced tool

@rhettinger
Copy link
Contributor Author

rhettinger commented Jun 21, 2023

I'll just say that this code or some variations of it are fairly well established in the Python ecosystem. Many of my customers has this in their code bases. The idea has been around since the first edition of the Python Cookbook. It solves are real problem for many users. It is in use in major products such as Cisco's PyATS. A Google search for "AttrDict" turns up many hits — this isn't new tech. The code does exactly what is purports to do and it well tested. It is not broken and will never be any better than it is now. There is really no point in reverting it other than a purity argument against a common practice.

@encukou
Copy link
Member

encukou commented Jun 21, 2023

this code or some variations of it

I would expect a PEP to review these variations and analyze the trade-offs involved.
If there are dozens of project-specific variations of an idea doesn't necessarily mean that one is a good fit for stdlib.
Cf. the identiity function feature request. Why is it not better here to let people write their own wrapper, and decide what pop should be if there's a conflict, or whether they want mapping API at all?

@hwmq
Copy link

hwmq commented Jun 21, 2023

Has there been a proposition before for string-based attribute access syntax? e.g. obj.'attr-with-dashes'.f'{val + 1}'.my_attr For all I know there are intense parsing difficulties or flat out incompatibilites with this syntax.

That combined with the ability to control the behaviour when attributes and subscripts are conflated (essentially which to prioritize), would make this a pretty solid feature maybe?

@CAM-Gerlach
Copy link
Member

CAM-Gerlach commented Jun 21, 2023

Has there been a proposition before for string-based attribute access syntax? e.g. obj.'attr-with-dashes'.f'{val + 1}'.my_attr

This would be a subject for a Ideas thread on the Python Discourse rather than a discussion here, on this issue proposing a different feature (which was previously discussed on the old python-ideas without clear consensus, and should have been discussed further in a similar venue).

To note, though, all that really solves is allowing use of keys that aren't valid Python identifiers (not allowing dotted access to which is a conscious design choice), but not the other issues here, e.g. conflicting with existing dict attributes, and requires new syntax, which entails the highest bar there is for accepting a change to Python.

@rhettinger
Copy link
Contributor Author

The specification is simple. First, AttrDict must be fully substitutable for a regular dict — all the operators and methods should work normally. Second, optional attribute access is allowed with the keys that are valid identifiers, not a keyword, and not a dict method.

AFAICT there aren’t any “corner cases” in that spec. The attribute lookup requirement of being a valid identifier and not a keyword is the same as for dataclasses and Enum. The additional requirement of not being dict method is necessary to meet the first part of the specification — I don’t think there is any “wiggle room” here.

The implementation is dirt simple with only a dozen lines of straight-forward code. There tests verify the specification.

I expect AttrDict will only be used by people who actually need it, for example those of us who work with heavily nested JSON or YAML with a fixed schema. We don’t use it for a “minor convenience”. It is done for readability. Here are real-world examples from PyATS here and here:

testbed.servers.tftp.address
testbed.password.tacoa
device.connects.a.protocol
devices.jarvis.connections.voice.protocol

For some commenters in the thread, this seems to be a new idea. In fact, this has been in use for two decades and has been reinvented many times because of the awkwardness of the dict API for heavily nested data. Pandas does something similar because it is so useful for our end users who have to wrangle complex datasets.

@CAM-Gerlach
Copy link
Member

Pandas does something similar because it is so useful for our end users who have to wrangle complex datasets.

Some perspective on this from the PyData world as a heavy pandas user from the start of my Python career, and more recently advisor, mentor, tutor and instructor of students and researchers using it, as well as a maintainer of a number of scientific packages that rely on it:

Personally, pretty early on I learned to avoid it, never use it in my code nowadays, and strongly advise students to do the same. As the many warnings in the Pandas docs illustrate, it has numerous caveats—most notably, the same two mentioned here: clashes with method names and attribute names, and incompatibility with column names that aren't valid Python identifiers—which can lead to inconsistent usage, extra cognitive load, and even non-obvious bugs (especially for newer users). Furthermore, I've found it significantly harms readability, as it makes it much more difficult to distinguish indexes from methods/attributes (unlike all other indexing methods)—also relevant here. On the whole, the substantial extra long-term cognitive complexity it adds outweighs any minor short-term convenience.

While this is occasionally used, it isn't that common in my own experience; basic getitem access is the most common, especially in student code, while for more complex use cases and by more advanced users and teachers, .loc and less often .iloc are the tools of choice. I typically only see dotted attribute access used by beginners who aren't yet as comfortable with the more capable methods of indexing, or because they cargo-culted other code that used it, and relatively rarely see more experienced users employ it, given its numerous caveats and readability issues.

@rhettinger
Copy link
Contributor Author

This is contrary to my experience and to our shared experience with dataclasses, named tuples, Enum, and SimpleNamespace. There is ample precedent for attribute lookup. With AttrDict, all that is being added beyond SimpleNamespace is dict method support so that we get full substitutability with regular dicts.

AFAICT the only interesting area is the 11 non-dunder dictionary methods. In practice, we rarely encounter those names as keys and when we do it is trivially easy to work around it by falling back to square brackets as shown in the documentation.

Also, it is not necessary for everyone of us to like or prefer every tool offered in the standard library. I personally never use SingleDispatch but acknowledge the some users prefer that solution to their problems. The same thought applies here. People dealing with heavily nested JSON may or may not prefer this option to improve the readability of their code. Because of my involvement with PyATS, I know for certain that this style of access was chosen to improve readability and that there have been no issues with it in practice.

@Yhg1s
Copy link
Member

Yhg1s commented Jun 26, 2023

To me (as Core Dev and RM) there are several substantial problems here.

The AttrDict design is inherently problematic, for starters, by subclassing dict. Subclassing dict is an attractive nuisance, and one that can easily come back to haunt us (it has done so in the past). Hand-waving away the objects around existing methods isn't enough, and doesn't address the fact that it makes it much harder to add methods to dicts later. This is not something you can take a stab at and fix later.

The fact that similar solutions exist outside of the standard library does not mean it should have been in the standard library. Is there something we can do better by having it be part of the standard library? Do the different existing implementations have fundamental problems we can solve? Efficiency we can provide? Pitfalls we can avoid? Show the actual benefit here.

Adding something to the standard library doesn't just mean "we think this may be a useful tool". It is an endorsement of the tool and the technique. It's seen, not unreasonably so, as a signal that the tool is the right thing to use, and doing the thing it does is the right thing to do. Adding AttrDict as they are hamstrings dicts and endorses a notion that I believe is fundamentally problematic, equating dict keys with attribute access. This shouldn't be added to the standard library without some consensus that it's the right thing to do.

Adding a feature that has nothing to do with JSON -- just because it's convenient to use with JSON -- to the json module, while similar features exist elsewhere in the standard library, without even considering consolidating is an embarrassment that flies in the face of all the efforts to improve the maintenance situation. Even if it was put in json because a Core Dev suggested it.

Adding a feature to the standard library (or the language) without review at all is another matter. It's easy to add something, it's much harder to maintain it, maintain the things around it, and justify it for many years to come. The Core Devs are collectively maintaining Python and the standard library, and are collectively responsible. We aren't each individually responsible for each part, but without substantial support from Core Devs it should not be foisted on all. It's clear that there is at least considerable opposition here.

Please, Raymond, if you think this is worthy of inclusion in the standard library, make your case on discuss.python.org and show that there's support for it.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 26, 2023
…ct_hook (pythonGH-96146)" (pythonGH-105948)

This reverts commit 1f0eafa.
(cherry picked from commit d3af83b)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Yhg1s pushed a commit that referenced this issue Jun 26, 2023
…ect_hook (GH-96146)" (GH-105948) (#106117)

Revert "GH-96145: Add AttrDict to JSON module for use with object_hook (GH-96146)" (GH-105948)

This reverts commit 1f0eafa.
(cherry picked from commit d3af83b)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
@rhettinger rhettinger closed this as not planned Won't fix, can't repro, duplicate, stale Jun 27, 2023
@serhiy-storchaka
Copy link
Member

Opened #108191 for adding a support of a positional argument in SimpleNamespace().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests