Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate dataset errors with module __getattr__ #2673

Merged
merged 108 commits into from
Jun 21, 2023
Merged

Conversation

deepyaman
Copy link
Member

@deepyaman deepyaman commented Jun 12, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Deprecating using DeprecatedClassMeta only deprecates the name when a deprecated object is instantiated. In the case of errors, they are often used without instantiation (e.g. in except blocks), so DeprecatedClassMeta doesn't work.

Development notes

Use __getattr__ approach for deprecation. This is a relatively-newer method (added in 3.7, see https://peps.python.org/pep-0562/).

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

deepyaman and others added 30 commits April 10, 2023 10:30
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
deepyaman added a commit to kedro-org/kedro-viz that referenced this pull request Jun 20, 2023
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @deepyaman! Left a few suggestions for improvement.

tests/io/test_core.py Outdated Show resolved Hide resolved
kedro/io/core.py Outdated Show resolved Hide resolved
kedro/io/__init__.py Show resolved Hide resolved
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Can we do a quick manual test it works for it will works for import and importlib?

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments on styling, I have done manual testing and it's working, but I have some questions.

image

Questions:
We will get different a error when we do from xxx import versus importlib.import_module, is this fine? I think it should be fine, as this behavior is consistent when I test a 3-rd party library like pandas, we didn't change anything, but just want to check to make sure.

ref: the code I used for manual test

# Expected

from kedro.io import DataSetNotFoundError

# Expected
import importlib
io = importlib.import_module("kedro.io")
io.DataSetNotFoundError

# ImportError
from kedro.io import DUMMY

# AttributeError
io.DUMMY

kedro/io/__init__.py Show resolved Hide resolved
@astrojuanlu
Copy link
Member

astrojuanlu commented Jun 20, 2023

To your comment, I get the exact same error using import or importlib:

In [1]: from kedro.io import DataSetNotFoundError
/Users/juan_cano/Projects/QuantumBlack Labs/kedro/kedro/io/__init__.py:37: DeprecationWarning: DataSetNotFoundError has been renamed to DatasetNotFoundError, and the alias will be removed in Kedro 0.19.0
  return getattr(kedro.io.core, name)
In [1]: import importlib

In [2]: kedro_io = importlib.import_module("kedro.io")

In [3]: kedro_io.DataSetNotFoundError
/Users/juan_cano/Projects/QuantumBlack Labs/kedro/kedro/io/__init__.py:37: DeprecationWarning: DataSetNotFoundError has been renamed to DatasetNotFoundError, and the alias will be removed in Kedro 0.19.0
  return getattr(kedro.io.core, name)
Out[3]: kedro.io.core.DatasetNotFoundError

In [4]: kedro_io.DatasetNotFoundError
Out[4]: kedro.io.core.DatasetNotFoundError

In [5]: getattr(kedro_io, "DatasetNotFoundError")
Out[5]: kedro.io.core.DatasetNotFoundError

In [6]: getattr(kedro_io, "DataSetNotFoundError")
/Users/juan_cano/Projects/QuantumBlack Labs/kedro/kedro/io/__init__.py:37: DeprecationWarning: DataSetNotFoundError has been renamed to DatasetNotFoundError, and the alias will be removed in Kedro 0.19.0
  return getattr(kedro.io.core, name)
Out[6]: kedro.io.core.DatasetNotFoundError

@noklam
Copy link
Contributor

noklam commented Jun 20, 2023

@astrojuanlu Yes I have done the same and verify it works, they are only different when you imported a invalid module/class, but the behavior is consistent with any library so I think it's correct.

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@astrojuanlu astrojuanlu force-pushed the refactor/rename-data-set branch 2 times, most recently from 718faec to f316afe Compare June 21, 2023 07:35
@astrojuanlu
Copy link
Member

I don't understand why DCO was happy until now, but then when I updated the branch it started complaining about all the previous commits, and even after I removed my merge commit, it keeps complaining. @deepyaman do you want to do the honors and signoff everything?

@astrojuanlu

This comment was marked as outdated.

@deepyaman deepyaman enabled auto-merge (squash) June 21, 2023 12:09
@deepyaman deepyaman merged commit 160fd6b into main Jun 21, 2023
3 checks passed
@deepyaman deepyaman deleted the refactor/rename-data-set branch June 21, 2023 14:00
deepyaman added a commit to kedro-org/kedro-viz that referenced this pull request Jun 21, 2023
* ci: incorporate changes from kedro-org/kedro#2673

* Use "Dataset" version for "uk.data_science" in test

* Replace checks for string "MemoryDataSet" in tests

* Change "MemoryDataSet" to "MemroyDataset" in  example_pipelines.json

* Revert "ci: incorporate changes from kedro-org/kedro#2673"

This reverts commit 6245cd8.

* Update test_requirements.txt

* Trigger CI

* Trigger CI

* Temporarily avoid Typer requirement

* Use `MemoryDataset` in tests to avoid mypy error
@noklam noklam mentioned this pull request Jun 28, 2023
6 tasks
jmalovera10 pushed a commit to jmalovera10/kedro that referenced this pull request Jun 30, 2023
* Replace and deprecate `DataSet` use in class names

* Replace another format string with an f-string

* Perform deprecations for cached, lambda, and partitioned datasets

* Deprecated `MemoryDataSet` in favor of `MemoryDataset`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Fix keyword argument to specify metaclass on `CachedDataSet`

* Fix reference to `PartitionedDataset`

* Keep `AbstractDataSet` subscriptable

* Update __init__.py files, __all__ definitions, etc

* Warn of impending Kedro 0.19 (not abstract future)

* Update `VideoDataSet` to `VideoDataset` (and refs)

* Add missing `kedro.utils.DeprecatedClassMeta` imps

* Change deprecated references to `AbstractDataSet`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Warn of impending Kedro 0.19 (not abstract future)

* Rename `pandas.CSVDataSet` to `pandas.CSVDataset`

* Fix some pylint errors and blacken code

* Update `dask.ParquetDataSet`

* Undo changes for `VideoDataSet`, inherit from new base

* Undo changes to `APIDataSet`, inherit from new base

* Fix some imports and missed references

* Undo changes to `BioSequenceDataSet`, inherit from new base

* Undo changes to Dask and Pandas datasets, inherit from new bases

* Remove the `AbstractDataset` and `AbstractVersionedDataset` alias, update `kedro.io.core`

* Undo changes in `kedro/extras/datasets`

* Update branch

* Change `DataSetError` to `DatasetError`

* Remove deprecated aliases for Abstract*DataSet

* Change `DataSetError` to `DatasetError` in tests/

* Change DataSet*Error to Dataset*Error in tests/

* Fix references to DataSet in a lot of tests

* Change `CSVDataset` back to `CSBVDataSet`

* Rename core datasets used across `tests` directory

* Fix "Saving 'None' to a 'DataSet' is not allowed." messages

* Fix `test_http_filesystem_no_versioning` everywhere

* Fix removal of "data"

* Deprecate `_SharedMemoryDataSet` in favor of `_SharedMemoryDataset`

* Fix tests/pipeline/test_pipeline_from_missing.py

* Fix list datasets test

* Change patched IncrementalDataSet to IncrementalDataset

* Fix default checkpoint dataset

* Fix data catalog tests

* Fix error message

* Use `MemoryDataset`, not `MemoryDataSet`, by default

* Use `MemoryDataset`, not `MemoryDataSet`, for missing datasets in data catalog

* Rename DefaultDataSet key to DefaultDataset

* Change `LambdaDataSet` to `LambdaDataset` in `test_node_run.py`

* Update error message--but should I?

* Update error message--but should I?

* Update error message in kedro/io/core.py--but should I?

* Update RELEASE.md

* Fix remaining tests

* Fix lint issues

* Align capitalization

* Add `DeprecatedClassMeta` tests from StackOverflow

* Blacken kedro/utils.py

* Ignore "No value for argument 'subclass' in unbound method call"

* Rename 'foo' to 'value' to satisfy linter

* Disable pylint messages on deprecated class definitions

* Blacken kedro/utils.py

* Wrap `kedro.io.core` to fix error deprecation

* Simplify deprecation of error names to try to fix docs

* Undo attempt to make docs pass

Revert "Simplify deprecation of error names to try to fix docs"

This reverts commit e9294be.

Revert "Wrap `kedro.io.core` to fix error deprecation"

This reverts commit db9b7ac.

* Wrap `kedro.io.core` to fix error deprecation

* Leverage PEP 562 for less hacky error deprecations

* Test old `DataSetError` in `kedro.extras.datasets`

* Fix missing and unnecessary imports/other mistakes

* Blacken kedro/io/core.py

* Don't raise `DeprecationWarning` each time import from  `kedro.io`

* Replace `DataSetError` with `DatasetError` in test

* Add missing "in" to a `DeprecationWarning` message

* Add "Dataset" versions of errors to `kedro.io` doc

* Add updated "Dataset" names to `kedro.io.rst` and sort the entries

* Add `_SharedMemoryDataset` to type targets in conf

* Revert all changes to `DatasetError` in `kedro/extras/datasets`

The idea is so that we're not making changes not in Kedro-Datasets,
and we can be confident that Kedro-Datasets won't break.

* docs(datasets): fix a ref to `GeoJSONLocalDataset`

* Consistently use `DatasetError` in dataset tests

* Remove unused import

* Remove "DataSet" imports from kedro/io/__init__.py

* chore(sdk): use type hints to skirt linting errors

* Implement workaround in `kedro/io/__init__.py` too

* Import future annotation in `kedro/io/__init__.py`

* test(datasets): cover raising a DeprecationWarning

* Ignore pylint warnings (working as intended)

* Replace `globals()` in `kedro.io.core.__getattr__`

* Remove use of `exec` (use `import_module` instead)

* Rename the parameters passed to `test_deprecation`

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Lovera <8958924+jmalovera10@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants