Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start support for mypy --strict #1515

Merged
merged 28 commits into from
Dec 22, 2021

Conversation

ajnelson-nist
Copy link
Contributor

This patch series attempts to begin a "80/20" strategy suggested on other Issue threads (cited in References) around finding "most commonly used APIs" to address issues introduced with the addition of the py.typed file.

The first patches are the minimal necessary to get mypy --strict to pass on the test program test/test_issue1447.py, which exercises (IIRC, all of) the rdflib functions I've used across several projects. Overall, the patch series adds annotations to methods under rdflib/, annotations to a few existing tests under test/, a new test under test/, and a practice of running the new and just-revised tests to mypy --strict in the various testing infrastructures. Feedback is welcome on some of the type-supporting decisions (especially what I call out as worth squinting eyes) and the sustainability of the testing practice.

I have a few outstanding questions that would be good to answer before merging this PR, and/or give their own Issues if you'd rather merge sooner:

  1. I guessed by name pattern on part of integrating the new test. I assume Issue 1447 is the right reference, though we can start different Issues if it'd be better to focus more tests on specific things. On the other hand, the 1447 test script could also be a "Feature request" zone where others could contribute (even trivial) function calls that meet their own needs for strict type review.
  2. How, and especially where, should mypy --strict be called? Would docker-compose.tests.yml get another command: line to run mypy --strict on just that file? Issue 1513 came up as I was testing things, and my initial two guesses of which mypy call to revise weren't what ended up being revised, whereas my draft of this PR noted the docker script that has its own mypy step, and I missed that Github Actions seems to be what runs the CI now. I'm proceeding in this PR with just modifying Github Actions.
  3. Inlined in the 1447 test program - is there a better rdflib class name or type designation for Nodes that are not Literals, analogous to SHACL's sh:BlankNodeOrIRI? I frequently find the need to define a Union type like I did in the test program.
  4. Inlined in the test program - it's not clear to me why prepareQuery is not exposed to mypy via rdflib/plugins/sparql/__init__py, while the test program itself would run just fine calling rdflib.plugins.sparql.prepareQuery(). Any ideas?
  5. Squinting review request - The hardest part of this patch series to draft, and now probably the one worth the sternest review, was getting Literal.toPython() to work. The first patch makes that the only .toPython method that is typed returning Any, while the rest return str. It's possible to make it return a Union of types from the various mapping dictionaries, but that causes extra work (either assert or typing.cast) for any caller trying to pass mypy --strict. As written in the first patch, the typing and value-testing requires minimal amount of annotation work. Does this practice pass your review, though?

Test status:

pytest, mypy, and black continue to pass as well as they did, as parameterized in .github/workflows/validate.yml and Makefile, before the addition of this patch. For review notes:

  • black currently reports 4 formatting errors in files not touched by this patch series, which I see is || true'd out in the Github Actions CI.
  • test/test_xmlliterals.py::testHTML currently fails in the master branch on my laptop, though, I'm not sure why it's passing on the Github Actions CI.

I'll do catch-up merges if someone lets me know that either is fixed, even a rebase if you prefer seeing PRs with all patches bearing a green checkmark from CI.

References:

Disclaimer:

Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

This patch is believed to be part of what would be necessary to resolve
Issue 1447.

This patch addresses type errors raised by `mypy` pertaining to use of
untyped methods within typed contexts.  These signatures are sufficient
to pass a test application (coming in a following patch) that exercises:

* creation of a graph;
* use of `Graph.add()` to supply triples containing `URIRef`, `BNode`,
  and `Literal` objects;
* use of `Graph.triples()`;
* use of `Graph.query()` with a string argument;
* use of `Graph.query()` with a `Query` argument;
* use of `Graph.query()` to return query results with different
  bound-variable lengths (i.e. different tuple lengths); and
* use of several `.toPython()` methods, including `Literal.toPython()`.

To address issues with the various `Node` subclasses appearing to be
untyped on initialization, this patch introduces `__init__` methods to
satisfy `mypy` needing a type signature.  To support any future
initialization logic, `super()` is used to support the Method Resolution
Order chain.  The alternative approach of trying to assign return types
to `__new__` didn't seem to work for two reasons that would require more
significant design revision:

* `BNode.__new__()` returns an `Identifier`, and `mypy` complained that
  this is an incompatible return type.  Attempting `typing.cast()`
  didn't seem to resolve this problem.
* `Literal.__new__()` was raising this `mypy --strict` error:
  ```
  "Literal" has no attribute "_value"; maybe "value"?
  ```
  I can't think of a practice aside from exposing `_value` as a slot
  that would convince `mypy` that the field exists.

Hence, the classes under `rdflib/term.py` got `__init__()` methods.

References:
* RDFLib#1447
* https://stackoverflow.com/a/33533514 describes PEPs 484 and 563
  relative to this patch.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
…ypes

This test is anchored off of Issue 1447, trying to identify "common"
functions that would be used in a script exercising multiple SPARQL
queries that return Python values.

References:
* RDFLib#1447
* python/typing#911

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
…nsumer

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@aucampia
Copy link
Member

Thanks for this, I will have a look at it tomorrow night.

@ajnelson-nist
Copy link
Contributor Author

ajnelson-nist commented Dec 17, 2021

Well, that's an unfortunate series of X's.

The call to mypy that I added was only supposed to review four of the scripts under test/. On my local machine, those are the only four reviewed. In the Github Action, everything's reviewed.

I determined the difference is in how the package install is run:

  1. python setup.py install triggers the fail mode behavior.
  2. pip install . triggers the fail mode behavior.
  3. pip install --editable . does not trigger the fail mode behavior.
  4. pip install --editable ${some_git_submodule_dir}/rdflib does not trigger the fail mode behavior.
  5. pip install ${some_git_submodule_dir}/rdflib does not trigger the fail mode behavior.

Of course, all of my testing directly with this repository was with the last three modes. (Modes 4 and 5 were how I tested these modifications in other rdflib-using repositories I'd like to eventually test with mypy --strict.)

So, I guess I need to highlight my original question 2 - how and where would it be appropriate---and heck, even functional---for mypy --strict to be called to review test applications?

@ajnelson-nist
Copy link
Contributor Author

Evidence of the pip install -e flag getting this patch series to work is in this CI-debugging PR.

@ajnelson-nist
Copy link
Contributor Author

Also, I just caught that PR 1498 touches on some of the same content within Literal. Thanks to @veyndan for the _Any member definition, it didn't occur to me to try that.

@aucampia , I'd be fine revising this patch series if 1498 is merged first.

@veyndan
Copy link
Contributor

veyndan commented Dec 17, 2021

Thanks to @veyndan for the _Any member definition, it didn't occur to me to try that.

It was actually @aucampia who added that bit (a65e710) 👏

@aucampia
Copy link
Member

I guessed by name pattern on part of integrating the new test. I assume Issue 1447 is the right reference, though we can start different Issues if it'd be better to focus more tests on specific things. On the other hand, the 1447 test script could also be a "Feature request" zone where others could contribute (even trivial) function calls that meet their own needs for strict type review.

I'm not entirely sure if this pattern is considered ideal - I know it has been used but there has also been some resistance to it, and I would say I somewhat agree that it may be better to try and decouple tests from the issue they were in. I think a name like test_strict_typing.py may be more appropriate.

I made a pull request to update the developers guide on where tests should go (#1517) and will run it past others to see what the consensus is on this. I think for now it is fine, but the file name may need to change later.

How, and especially where, should mypy --strict be called? Would docker-compose.tests.yml get another command: line to run mypy --strict on just that file? Issue 1513 came up as I was testing things, and my initial two guesses of which mypy call to revise weren't what ended up being revised, whereas my draft of this PR noted the docker script that has its own mypy step, and I missed that Github Actions seems to be what runs the CI now. I'm proceeding in this PR with just modifying Github Actions.

I think ideally this should be setup in setup.cfg using module specific mypy config: https://mypy.readthedocs.io/en/stable/config_file.html#per-module-and-global-options

So that will entail something like this in setup.cfg:

[mypy]
python_version = 3.7
warn_unused_configs = True
ignore_missing_imports = True
disallow_subclassing_any = False
warn_unreachable = True

[mypy-test.test_issue1447]
strict = True

I have tried it thought and it is not quite doing what I'm expecting, there are some other fallbacks but I will try a couple of things to get it working.

Will respond to other points when I can, hopefully tomorrow.

@aucampia
Copy link
Member

How, and especially where, should mypy --strict be called? Would docker-compose.tests.yml get another command: line to run mypy --strict on just that file? Issue 1513 came up as I was testing things, and my initial two guesses of which mypy call to revise weren't what ended up being revised, whereas my draft of this PR noted the docker script that has its own mypy step, and I missed that Github Actions seems to be what runs the CI now. I'm proceeding in this PR with just modifying Github Actions.

@ajnelson-nist I made a PR against your branch to make the validation pass with the selective files running in strict mode: ajnelson-nist#8

Would recommend a rebase onto master though as there has been a couple of changes related to mypy and typing since your branch.

- Changed all mypy invocation to not include paths and moved the paths
  to the mypy section in setup.cfg.
  This removes redudancy in directory specification and reduces the
  potential that directories are changed in one invocation but not
  others.
- Ignore mypy errors in test directory.
- Enable strict mypy config with in-file config comments in the
  following files:
  - test/test_issue1447.py
  - test/test_issue_git_336.py
  - test/test_literal.py
  - test/test_parse_file_guess_format.py
@aucampia
Copy link
Member

Please merge ajnelson-nist#9 so that you are adding typing to __new__ instead of adding __init__

rdflib/term.py Outdated Show resolved Hide resolved
test/test_issue1447.py Outdated Show resolved Hide resolved
@aucampia
Copy link
Member

4. Inlined in the test program - it's not clear to me why prepareQuery is not exposed to mypy via rdflib/plugins/sparql/__init__py, while the test program itself would run just fine calling rdflib.plugins.sparql.prepareQuery(). Any ideas?

mypy has an option:

  --no-implicit-reexport    Treat imports as private unless aliased (inverse:
                            --implicit-reexport)

This is enabled in --strict mode:

  --strict                  Strict mode; enables the following flags: --warn-
                            unused-configs, --disallow-any-generics,
                            --disallow-subclassing-any, --disallow-untyped-
                            calls, --disallow-untyped-defs, --disallow-
                            incomplete-defs, --check-untyped-defs, --disallow-
                            untyped-decorators, --no-implicit-optional,
                            --warn-redundant-casts, --warn-unused-ignores,
                            --warn-return-any, --no-implicit-reexport,
                            --strict-equality

For more info see https://mypy.readthedocs.io/en/stable/config_file.html#confval-implicit_reexport

@aucampia
Copy link
Member

  • Inlined in the 1447 test program - is there a better rdflib class name or type designation for Nodes that are not Literals, analogous to SHACL's sh:BlankNodeOrIRI? I frequently find the need to define a Union type like I did in the test program.

Maybe:

Subject = typing.Union[rdflib.BNode, rdflib.URIRef]

After all, from https://www.w3.org/TR/rdf11-concepts/#section-triples

An RDF triple consists of three components:

  • the subject, which is an IRI or a blank node
  • the predicate, which is an IRI
  • the object, which is an IRI, a literal or a blank node

@aucampia
Copy link
Member

5. Squinting review request - The hardest part of this patch series to draft, and now probably the one worth the sternest review, was getting Literal.toPython() to work. The first patch makes that the only .toPython method that is typed returning Any, while the rest return str. It's possible to make it return a Union of types from the various mapping dictionaries, but that causes extra work (either assert or typing.cast) for any caller trying to pass mypy --strict. As written in the first patch, the typing and value-testing requires minimal amount of annotation work. Does this practice pass your review, though?

For now typing Literal.toPython() as returning Any is the only option, and maybe the right option also as typing it as more specific types seem to fail. It is anyway not wrong, just over broad at worst, but at best it is maybe correct.

We need to add a lot more typing to help figure out the tightest constraints that make sense, but your changes are a step in the right direction and best to keep them minimal and merge them. I have a bunch more typing changes that are not in PRs yet, and there are a lot more to add, but it is easier if we make them in small batches - it will take some time getting to the point where everything works in strict mode.

@aucampia
Copy link
Member

If you give me access to your branch then I can do some rebases and merge the PRs I made, up to you of course - don't mind either way - but I want to keep this mergeable so we can get it in as soon as we are ready and so it does not hold up other typing changes that will be made.

@aucampia
Copy link
Member

I think in the long run we should just get mypy passing on test directory without strict, and then generally start adding additional errrors/warnings - not something to add here, but just a general view on what to do to get things moving along.

@aucampia aucampia mentioned this pull request Dec 21, 2021
ajnelson-nist and others added 8 commits December 21, 2021 11:01
Fix mypy settings to run select test files in strict mode
Type __new__ instead of adding __init__
This follows through with Iwan's suggestion to type `__new__` instead of
adding `__init__`.  `Node.__init__` becomes a vestigial addition with
his patch.

Reported-by: Iwan Aucamp <aucampia@gmail.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Reported-by: Iwan Aucamp <aucampia@gmail.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Reported-by: Iwan Aucamp <aucampia@gmail.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
test/test_issue1447.py Outdated Show resolved Hide resolved
@aucampia
Copy link
Member

aucampia commented Dec 21, 2021

I'm happy with this change, something which is maybe debatable is on which test files strict mode makes sense. I'm fine with it on the ones you enabled it for but would just want the other maintainers to give some feedback there. (CC: @ashleysommer @nicholascar @white-gecko ).

As I said in a comment before, I want to make mypy run on all test in normal non strict mode, basically all that will be needed for this is removal of ignore_errors = True (this) after fixing the issues raised - but it would be good to have it run on strict mode on at least some tests specifically designed to verifying typing, and for that I think test/test_typing.py is ideal. I will look at that after this PR is merged as I think the setup here provides a good basis for it.

Reported-by: Iwan Aucamp <aucampia@gmail.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
This is just to fit with practice used in other places in this
repository.  This patch has no impact on function.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Contributor Author

I've adjusted a few more style matters, but otherwise think this patch series is ready for review and merge.

@aucampia , I agree with the general direction of eventually removing ignore_errors.

@ajnelson-nist
Copy link
Contributor Author

Also, it seems I'm getting some CI chaos from networking-stack tests. Hopefully that doesn't confuse review.

Copy link
Member

@nicholascar nicholascar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely steps in the right direction! Yes the literal typing is still not great. Last year I added a whole bunch more tests for literal types but not mypy or type hinting. It gets really hard to cater for all XSD datatypes AND Python datatypes AND custom datatypes since there's no guarentee that they all line up.

@nicholascar nicholascar merged commit 535eeae into RDFLib:master Dec 22, 2021
@ajnelson-nist ajnelson-nist deleted the start_updates_for_mypy_strict branch August 4, 2022 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants