
Commit

Merge branch 'master' into develop
QuantumBlack Labs committed Oct 7, 2021
2 parents 84f8891 + d1d4a52 commit 257fc35
Showing 5 changed files with 102 additions and 8 deletions.
12 changes: 6 additions & 6 deletions .pre-commit-config.yaml
@@ -89,23 +89,23 @@ repos:
# https://github.com/PyCQA/pylint/issues/618
# The first set of pylint checks is for local pre-commit; it only runs on the changed files.
- id: pylint-quick-kedro
name: "Quick PyLint on kedro/*"
name: "Quick Pylint on kedro/*"
language: system
types: [file, python]
files: ^kedro/
exclude: ^kedro/templates/
entry: pylint -j 4 --disable=unnecessary-pass
stages: [commit]
- id: pylint-quick-features
name: "Quick PyLint on features/*"
name: "Quick Pylint on features/*"
language: system
types: [file, python]
files: ^features/
exclude: ^features/steps/test_starter
entry: pylint -j 4 --disable=missing-docstring,no-name-in-module
stages: [commit]
- id: pylint-quick-tests
name: "Quick PyLint on tests/*"
name: "Quick Pylint on tests/*"
language: system
types: [file, python]
files: ^tests/
@@ -114,20 +114,20 @@ repos:

# The same pylint checks, but running on all files. It's for manual runs with `make lint`
- id: pylint-kedro
name: "PyLint on kedro/*"
name: "Pylint on kedro/*"
language: system
pass_filenames: false
stages: [manual]
entry: pylint -j 4 --disable=unnecessary-pass --init-hook="import sys; sys.setrecursionlimit(2000)" kedro
- id: pylint-features
name: "PyLint on features/*"
name: "Pylint on features/*"
language: system
pass_filenames: false
stages: [manual]
exclude: ^features/steps/test_starter
entry: pylint -j 4 --disable=missing-docstring,no-name-in-module features
- id: pylint-tests
name: "PyLint on tests/*"
name: "Pylint on tests/*"
language: system
pass_filenames: false
stages: [manual]
2 changes: 1 addition & 1 deletion README.md
@@ -35,7 +35,7 @@ Our [Get Started guide](https://kedro.readthedocs.io/en/stable/02_get_started/01

## What are the main features of Kedro?

![Kedro-Viz Pipeline Visualisation](https://raw.githubusercontent.com/quantumblacklabs/kedro/develop/static/img/pipeline_visualisation.png)
![Kedro-Viz Pipeline Visualisation](https://github.com/quantumblacklabs/kedro-viz/blob/main/.github/img/banner.png)
*A pipeline visualisation generated using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz)*


3 changes: 2 additions & 1 deletion RELEASE.md
@@ -109,7 +109,8 @@
## Upcoming deprecations for Kedro 0.18.0

## Thanks for supporting contributions
[Deepyaman Datta](https://github.com/deepyaman)
[Deepyaman Datta](https://github.com/deepyaman),
[Manish Swami](https://github.com/ManishS6)

# Release 0.17.5

93 changes: 93 additions & 0 deletions docs/source/07_extend_kedro/02_hooks.md
@@ -257,6 +257,99 @@ Under the hood, we use [pytest's pluggy](https://pluggy.readthedocs.io/en/latest

## Hooks examples

### Add memory consumption tracking

This example illustrates how to track the memory consumed by loading each dataset, using `memory_profiler`.

* Install dependencies:

```console
pip install memory_profiler
```

* Implement the `before_dataset_loaded` and `after_dataset_loaded` Hooks:

<details>
<summary><b>Click to expand</b></summary>

```python
...
import logging

from kedro.framework.hooks import hook_impl
from memory_profiler import memory_usage


def _normalise_mem_usage(mem_usage):
    # memory_profiler < 0.56.0 returns list instead of float
    return mem_usage[0] if isinstance(mem_usage, (list, tuple)) else mem_usage


class MemoryProfilingHooks:
    def __init__(self):
        # Memory reading (MiB) taken just before each dataset is loaded
        self._mem_usage = {}

    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        before_mem_usage = memory_usage(
            -1,
            interval=0.1,
            max_usage=True,
            retval=True,
            include_children=True,
        )
        before_mem_usage = _normalise_mem_usage(before_mem_usage)
        self._mem_usage[dataset_name] = before_mem_usage

    @hook_impl
    def after_dataset_loaded(self, dataset_name: str) -> None:
        after_mem_usage = memory_usage(
            -1,
            interval=0.1,
            max_usage=True,
            retval=True,
            include_children=True,
        )
        after_mem_usage = _normalise_mem_usage(after_mem_usage)

        # Report the difference against the reading taken before the load
        self._logger.info(
            "Loading %s consumed %2.2fMiB memory",
            dataset_name,
            after_mem_usage - self._mem_usage[dataset_name],
        )
```
</details>

* Register the Hooks implementation by updating the `HOOKS` variable in `settings.py` as follows:

```python
HOOKS = (MemoryProfilingHooks(),)
```

Then re-run the pipeline:

```console
$ kedro run
```

The output should look similar to the following:

```
...
2021-10-05 12:02:34,946 - kedro.io.data_catalog - INFO - Loading data from `shuttles` (ExcelDataSet)...
2021-10-05 12:02:43,358 - MemoryProfilingHooks - INFO - Loading shuttles consumed 82.67MiB memory
2021-10-05 12:02:43,358 - kedro.pipeline.node - INFO - Running node: preprocess_shuttles_node: preprocess_shuttles([shuttles]) -> [preprocessed_shuttles]
2021-10-05 12:02:43,440 - kedro.io.data_catalog - INFO - Saving data to `preprocessed_shuttles` (MemoryDataSet)...
2021-10-05 12:02:43,446 - kedro.runner.sequential_runner - INFO - Completed 1 out of 2 tasks
2021-10-05 12:02:43,559 - kedro.io.data_catalog - INFO - Loading data from `companies` (CSVDataSet)...
2021-10-05 12:02:43,727 - MemoryProfilingHooks - INFO - Loading companies consumed 4.16MiB memory
...
```

### Add data validation

This example adds data validation to node inputs and outputs using [Great Expectations](https://docs.greatexpectations.io/en/latest/).
Binary file removed static/img/pipeline_visualisation.png
