diff --git a/docs/source/contribution/developer_contributor_guidelines.md b/docs/source/contribution/developer_contributor_guidelines.md index 56f25e4965..f7e18d6067 100644 --- a/docs/source/contribution/developer_contributor_guidelines.md +++ b/docs/source/contribution/developer_contributor_guidelines.md @@ -182,7 +182,7 @@ git commit -s -m "This is my commit message" To avoid needing to remember the `-s` flag on every commit, you might like to set up a [git alias](https://git-scm.com/book/en/v2/Git-Basics-Git-Aliases) for `git commit -s`. Alternatively, run `make sign-off` to set up a [`commit-msg` Git hook](https://git-scm.com/docs/githooks#_commit_msg) that automatically signs off all commits (including merge commits) you make while working on the Kedro repository. -If your PR is blocked due to unsigned commits then you will need to follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass. +If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass. ## Need help? diff --git a/docs/source/data/data_catalog.md b/docs/source/data/data_catalog.md index ead2433dfa..cc739b6c53 100644 --- a/docs/source/data/data_catalog.md +++ b/docs/source/data/data_catalog.md @@ -8,11 +8,11 @@ All supported data connectors are available in [`kedro.extras.datasets`](/kedro. Kedro uses configuration to make your code reproducible when it has to reference datasets in different locations and/or in different environments. -You can copy this file and reference additional locations for the same datasets. For instance, you can use the `catalog.yml` file in `conf/base/` to register the locations of datasets that would run in production while copying and updating a second version of `catalog.yml` in `conf/local/` to register the locations of sample datasets that you are using for prototyping your data pipeline(s). +You can copy this file and reference additional locations for the same datasets. For instance, you can use the `catalog.yml` file in `conf/base/` to register the locations of datasets that would run in production, while copying and updating a second version of `catalog.yml` in `conf/local/` to register the locations of sample datasets that you are using for prototyping your data pipeline(s). There is built-in functionality for `conf/local/` to overwrite `conf/base/` [described in the documentation about configuration](../kedro_project_setup/configuration.md). This means that a dataset called `cars` could exist in the `catalog.yml` files in `conf/base/` and `conf/local/`. In code, in `src`, you would only call a dataset named `cars` and Kedro would detect which definition of `cars` dataset to use to run your pipeline - `cars` definition from `conf/local/catalog.yml` would take precedence in this case. -The Data Catalog also works with the `credentials.yml` in `conf/local/`, allowing you to specify usernames and passwords that are required to load certain datasets. +The Data Catalog also works with the `credentials.yml` in `conf/local/`, allowing you to specify usernames and passwords required to load certain datasets. There are two ways of defining a Data Catalog: through the use of YAML configuration, or programmatically using an API.
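To make the two approaches concrete, here is a minimal illustrative sketch of the same `cars` dataset defined both ways; the `pandas.CSVDataSet` type and the filepath are placeholder choices for the example rather than values taken from any particular project.

```yaml
# conf/base/catalog.yml -- declarative YAML configuration
cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/cars.csv
```

```python
# Equivalent programmatic definition via the Python API
from kedro.extras.datasets.pandas import CSVDataSet
from kedro.io import DataCatalog

catalog = DataCatalog({"cars": CSVDataSet(filepath="data/01_raw/cars.csv")})
```

Either form registers a dataset named `cars` that nodes can then load and save by name.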
Both methods allow you to specify: @@ -247,7 +247,7 @@ scooters_query: index_col: [name] ``` -When using [`pandas.SQLTableDataSet`](/kedro.extras.datasets.pandas.SQLTableDataSet) or [`pandas.SQLQueryDataSet`](/kedro.extras.datasets.pandas.SQLQueryDataSet) you must provide a database connection string. In the example above we pass it using `scooters_credentials` key from the credentials (see the details in the [Feeding in credentials](#feeding-in-credentials) section below). `scooters_credentials` must have a top-level key `con` containing [SQLAlchemy compatible](https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls) connection string. As an alternative to credentials, you could explicitly put `con` into `load_args` and `save_args` (`pandas.SQLTableDataSet` only). +When using [`pandas.SQLTableDataSet`](/kedro.extras.datasets.pandas.SQLTableDataSet) or [`pandas.SQLQueryDataSet`](/kedro.extras.datasets.pandas.SQLQueryDataSet), you must provide a database connection string. In the example above, we pass it using the `scooters_credentials` key from the credentials (see the details in the [Feeding in credentials](#feeding-in-credentials) section below). `scooters_credentials` must have a top-level key `con` containing a [SQLAlchemy-compatible](https://docs.sqlalchemy.org/en/13/core/engines.html#database-urls) connection string. As an alternative to credentials, you could explicitly put `con` into `load_args` and `save_args` (`pandas.SQLTableDataSet` only). Example 13: Load data from an API endpoint, example US corn yield data from USDA @@ -337,9 +337,9 @@ The list of all available parameters is given in the [Paramiko documentation](ht ## Creating a Data Catalog YAML configuration file via CLI -You can use [`kedro catalog create` command to create a Data Catalog YAML configuration](../development/commands_reference.md#create-a-data-catalog-yaml-configuration-file). +You can use the [`kedro catalog create` command to create a Data Catalog YAML configuration](../development/commands_reference.md#create-a-data-catalog-yaml-configuration-file). -It creates a `//catalog/.yml` configuration file with `MemoryDataSet` datasets for each dataset in a registered pipeline if it is missing from the `DataCatalog`. +This creates a `//catalog/.yml` configuration file with `MemoryDataSet` datasets for each dataset in a registered pipeline if it is missing from the `DataCatalog`. ```yaml # //catalog/.yml @@ -356,7 +356,7 @@ You can [configure parameters](../kedro_project_setup/configuration.md#load-para ## Feeding in credentials -Before instantiating the `DataCatalog` Kedro will first attempt [to read the credentials from the project configuration](../kedro_project_setup/configuration.md#aws-credentials). The resulting dictionary is then passed into `DataCatalog.from_config()` as the `credentials` argument. +Before instantiating the `DataCatalog`, Kedro will first attempt to read [the credentials from the project configuration](../kedro_project_setup/configuration.md#aws-credentials). The resulting dictionary is then passed into `DataCatalog.from_config()` as the `credentials` argument. Let's assume that the project contains the file `conf/local/credentials.yml` with the following contents: @@ -386,7 +386,7 @@ CSVDataSet( ## Loading multiple datasets that have similar configuration -You may encounter situations where your datasets use the same file format, load and save arguments, and are stored in the same folder.
[YAML has a built-in syntax](https://yaml.org/spec/1.2.1/#Syntax) for factorising parts of a YAML file, which means that you can decide what is generalisable across your datasets so that you do not have to spend time copying and pasting dataset configurations in `catalog.yml`. +Different datasets might use the same file format, load and save arguments, and be stored in the same folder. [YAML has a built-in syntax](https://yaml.org/spec/1.2.1/#Syntax) for factorising parts of a YAML file, which means that you can decide what is generalisable across your datasets so that you need not spend time copying and pasting dataset configurations in `catalog.yml`. You can see this in the following example: diff --git a/docs/source/data/kedro_io.md b/docs/source/data/kedro_io.md index 3411c9344f..9d77285020 100644 --- a/docs/source/data/kedro_io.md +++ b/docs/source/data/kedro_io.md @@ -31,9 +31,9 @@ If you have a dataset called `parts`, you can make direct calls to it like so: parts_df = parts.load() ``` -However, we recommend using a `DataCatalog` instead (for more details, see [the `DataCatalog` documentation](../data/data_catalog.md)) as it has been designed to make all datasets available to project members. +We recommend using a `DataCatalog` instead (for more details, see [the `DataCatalog` documentation](../data/data_catalog.md)) as it has been designed to make all datasets available to project members. -For contributors, if you would like to submit a new dataset, you will have to extend `AbstractDataSet`. For a complete guide, please read [the section on custom datasets](../extend_kedro/custom_datasets.md). +For contributors, if you would like to submit a new dataset, you must extend the `AbstractDataSet`. For a complete guide, please read [the section on custom datasets](../extend_kedro/custom_datasets.md). ## Versioning @@ -331,8 +331,8 @@ Requires you to only specify a class of the underlying dataset either as a strin Full notation allows you to specify a dictionary with the full underlying dataset definition _except_ the following arguments: * The argument that receives the partition path (`filepath` by default) - if specified, a `UserWarning` will be emitted stating that this value will be overridden by individual partition paths -* `credentials` key - specifying it will result in `DataSetError` being raised; dataset credentials should be passed into `credentials` argument of the `PartitionedDataSet` rather than underlying dataset definition - see [the section below on partitioned dataset credentials](#partitioned-dataset-credentials) for details -* `versioned` flag - specifying it will result in `DataSetError` being raised; versioning cannot be enabled for the underlying datasets +* `credentials` key - specifying it will result in a `DataSetError` being raised; dataset credentials should be passed into the `credentials` argument of the `PartitionedDataSet` rather than the underlying dataset definition - see the section below on [partitioned dataset credentials](#partitioned-dataset-credentials) for details +* `versioned` flag - specifying it will result in a `DataSetError` being raised; versioning cannot be enabled for the underlying datasets #### Partitioned dataset credentials @@ -476,7 +476,7 @@ When using lazy saving, the dataset will be written _after_ the `after_node_run` [IncrementalDataSet](/kedro.io.IncrementalDataSet) is a subclass of `PartitionedDataSet`, which stores the information about the last processed partition in the so-called `checkpoint`. 
`IncrementalDataSet` addresses the use case when partitions have to be processed incrementally, i.e. each subsequent pipeline run should only process the partitions which were not processed by the previous runs. -This checkpoint, by default, is persisted to the location of the data partitions. For example, for `IncrementalDataSet` instantiated with path `s3://my-bucket-name/path/to/folder` the checkpoint will be saved to `s3://my-bucket-name/path/to/folder/CHECKPOINT`, unless [the checkpoint configuration is explicitly overwritten](#checkpoint-configuration). +This checkpoint, by default, is persisted to the location of the data partitions. For example, for an `IncrementalDataSet` instantiated with the path `s3://my-bucket-name/path/to/folder`, the checkpoint will be saved to `s3://my-bucket-name/path/to/folder/CHECKPOINT`, unless [the checkpoint configuration is explicitly overwritten](#checkpoint-configuration). The checkpoint file is only created _after_ [the partitioned dataset is explicitly confirmed](#incremental-dataset-confirm). @@ -488,7 +488,7 @@ Loading `IncrementalDataSet` works similarly to [`PartitionedDataSet`](#partitio #### Incremental dataset save -`IncrementalDataSet` save operation is identical to the [save operation of the `PartitionedDataSet`](#partitioned-dataset-save). +The `IncrementalDataSet` save operation is identical to the [save operation of the `PartitionedDataSet`](#partitioned-dataset-save). #### Incremental dataset confirm diff --git a/docs/source/deployment/airflow_astronomer.md b/docs/source/deployment/airflow_astronomer.md index 10011a0468..de843ce4ce 100644 --- a/docs/source/deployment/airflow_astronomer.md +++ b/docs/source/deployment/airflow_astronomer.md @@ -10,9 +10,9 @@ The general strategy to deploy a Kedro pipeline on Apache Airflow is to run ever ## Prerequisites -To follow along with this tutorial, make sure you have the following: +To follow this tutorial, ensure you have the following: -* An Airflow cluster: you can follow [Astronomer's quickstart guide](https://docs.astronomer.io/astro/#get-started) to set one up. +* An Airflow cluster: you can follow [Astronomer's quickstart guide](https://docs.astronomer.io/astro/#get-started) to set one up. * The [Astro CLI installed](https://docs.astronomer.io/astro/install-cli) * `kedro>=0.17` installed diff --git a/docs/source/deployment/argo.md b/docs/source/deployment/argo.md index 3959c59707..39b1aab68e 100644 --- a/docs/source/deployment/argo.md +++ b/docs/source/deployment/argo.md @@ -14,10 +14,10 @@ Here are the main reasons to use Argo Workflows: ## Prerequisites -To use Argo Workflows, make sure you have the following prerequisites in place: +To use Argo Workflows, ensure you have the following prerequisites in place: - [Argo Workflows is installed](https://github.com/argoproj/argo/blob/master/README.md#quickstart) on your Kubernetes cluster -- [Argo CLI is installed](https://github.com/argoproj/argo/releases) on you machine +- [Argo CLI is installed](https://github.com/argoproj/argo/releases) on your machine - A `name` attribute is set for each [Kedro node](/kedro.pipeline.node) since it is used to build a DAG - [All node input/output DataSets must be configured in `catalog.yml`](../data/data_catalog.md#using-the-data-catalog-with-the-yaml-api) and refer to an external location (e.g. AWS S3); you cannot use the `MemoryDataSet` in your workflow @@ -169,9 +169,9 @@ spec: The Argo Workflows spec defines the dependencies between tasks using a directed acyclic graph (DAG).
``` -For the purpose of this walk-through, we are going to use AWS S3 bucket for DataSets therefore `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables must be set to have an ability to communicate with S3. The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values should be stored in [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) ([an example Kubernetes Secrets spec is given below](#submit-argo-workflows-spec-to-kubernetes)). +For the purpose of this walk-through, we will use an AWS S3 bucket for DataSets; therefore, the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables must be set so that the workflow can communicate with S3. The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values should be stored in [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) (an example [Kubernetes Secrets spec is given below](#submit-argo-workflows-spec-to-kubernetes)). -The spec template is written with using [Jinja templating language](https://jinja.palletsprojects.com/en/2.11.x/) so you need to install the Jinja Python package: +The spec template is written with the [Jinja templating language](https://jinja.palletsprojects.com/en/2.11.x/), so you must install the Jinja Python package: ```console $ pip install Jinja2 diff --git a/docs/source/deployment/aws_batch.md b/docs/source/deployment/aws_batch.md index 9f67ff8303..78b426a163 100644 --- a/docs/source/deployment/aws_batch.md +++ b/docs/source/deployment/aws_batch.md @@ -9,7 +9,7 @@ The following sections are a guide on how to deploy a Kedro project to AWS Batch ## Prerequisites -To use AWS Batch, make sure you have the following prerequisites in place: +To use AWS Batch, ensure you have the following prerequisites in place: - An [AWS account set up](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/). - A `name` attribute is set for each [Kedro node](/kedro.pipeline.node). Each node will run in its own Batch job, so having sensible node names will make it easier to `kedro run --node `. diff --git a/docs/source/deployment/aws_sagemaker.md b/docs/source/deployment/aws_sagemaker.md index 8d21ee1d33..f72e64e0ed 100644 --- a/docs/source/deployment/aws_sagemaker.md +++ b/docs/source/deployment/aws_sagemaker.md @@ -346,4 +346,4 @@ Now you know how to run serverless machine learning jobs using SageMaker right f ## Cleanup -To cleanup the resources, [delete the S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html) and, optionally, the [IAM role you've created earlier](#create-sagemaker-execution-role) (IAM resources are free). The job details of an already completed SageMaker training job cannot be deleted, but such jobs don't incur any costs. +To clean up the resources, [delete the S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html) and, optionally, the [IAM role you've created earlier](#create-sagemaker-execution-role) (IAM resources are free). The job details of an already completed SageMaker training job cannot be deleted, but such jobs incur no costs.
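If you prefer the command line to the AWS console for this cleanup, a hedged sketch with the AWS CLI is shown below; the bucket and role names are placeholders, and any policies attached to the role must be detached before it can be deleted.

```console
# Delete the bucket together with all objects in it (irreversible)
$ aws s3 rb s3://<your-sagemaker-bucket> --force

# Optionally remove the execution role: detach its policies first, then delete it
$ aws iam detach-role-policy --role-name <your-sagemaker-role> --policy-arn <attached-policy-arn>
$ aws iam delete-role --role-name <your-sagemaker-role>
```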
diff --git a/docs/source/deployment/aws_step_functions.md b/docs/source/deployment/aws_step_functions.md index bbbb6bb1d8..e9004f41db 100644 --- a/docs/source/deployment/aws_step_functions.md +++ b/docs/source/deployment/aws_step_functions.md @@ -12,11 +12,11 @@ The following discusses how to run the Kedro pipeline from the [spaceflights tut ## Strategy -The general strategy to deploy a Kedro pipeline on AWS Step Functions is to run every Kedro node as an [AWS Lambda](https://aws.amazon.com/lambda/) function. The whole pipeline is converted into an [AWS Step Functions State Machine](https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html) for orchestration purpose. This approach mirrors the principles of running Kedro in a [distributed environment](distributed). +The general strategy to deploy a Kedro pipeline on AWS Step Functions is to run every Kedro node as an [AWS Lambda](https://aws.amazon.com/lambda/) function. The whole pipeline is converted into an [AWS Step Functions State Machine](https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html) for orchestration. This approach mirrors the principles of running Kedro in a [distributed environment](distributed). ## Prerequisites -To use AWS Step Functions, make sure you have the following: +To use AWS Step Functions, ensure you have the following: - An [AWS account set up](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) - [Configured AWS credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) on your local machine @@ -108,10 +108,10 @@ y_test: ### Step 2. Package the Kedro pipeline as an AWS Lambda-compliant Docker image -In December 2020, [AWS announced that an AWS Lambda function can now use a container image up to **10 GB in size**](https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/) as its deployment package, besides the original zip method. As it has a few [requirements for the container image to work properly](https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#images-reqs), you will need to build your own custom Docker container image to both contain the Kedro pipeline and to comply with Lambda's requirements. +In December 2020, [AWS announced that an AWS Lambda function can now use a container image up to **10 GB in size**](https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/) as its deployment package, besides the original zip method. As it has a few [requirements for the container image to work properly](https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#images-reqs), you must build your own custom Docker container image, both to contain the Kedro pipeline and to comply with Lambda's requirements. ```{note} -All of the following steps should be done in the Kedro project's root directory. +All the following steps should be done in the Kedro project's root directory. ``` * **Step 2.1**: Package the Kedro pipeline as a Python package so you can install it into the container later on: diff --git a/docs/source/deployment/databricks.md b/docs/source/deployment/databricks.md index 0d5145326c..5b1969c413 100644 --- a/docs/source/deployment/databricks.md +++ b/docs/source/deployment/databricks.md @@ -80,7 +80,7 @@ As a result you should have: ### 4. 
Create GitHub personal access token -To synchronise the project between the local development environment and Databricks we will use a private GitHub repository that you will create in the next step. For authentication we will need a GitHub personal access token, so go ahead and [create such token in your GitHub developer settings](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token). +To synchronise the project between the local development environment and Databricks, we will use a private GitHub repository, which you will create in the next step. For authentication, we will need a GitHub personal access token, so go ahead and [create this token in your GitHub developer settings](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token). ```{note} Make sure that `repo` scopes are enabled for your token. ``` diff --git a/docs/source/deployment/kubeflow.md b/docs/source/deployment/kubeflow.md index 187d9a2ce6..d4e6f30ded 100644 --- a/docs/source/deployment/kubeflow.md +++ b/docs/source/deployment/kubeflow.md @@ -13,11 +13,11 @@ Here are the main reasons to use Kubeflow Pipelines: ## Prerequisites -To use Kubeflow Pipelines, make sure you have the following prerequisites in place: +To use Kubeflow Pipelines, ensure you have the following prerequisites in place: - [Kubeflow Pipelines is installed](https://www.kubeflow.org/docs/started/getting-started/) on your Kubernetes cluster - [Kubeflow Pipelines SDK is installed](https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/) locally -- A `name` attribute is set for each [Kedro node](/kedro.pipeline.node) since it is used to trigger runs +- A `name` attribute is set for each [Kedro node](/kedro.pipeline.node), since it is used to trigger runs - [All node input/output DataSets must be configured in `catalog.yml`](../data/data_catalog.md#using-the-data-catalog-with-the-yaml-api) and refer to an external location (e.g. AWS S3); you cannot use the `MemoryDataSet` in your workflow ```{note} @@ -139,7 +139,7 @@ You can also specify two optional arguments: - `--pipeline`: pipeline name for which you want to build a workflow spec - `--env`: Kedro configuration environment name, defaults to `local` -For the purpose of this walk-through, we are going to use AWS S3 bucket for DataSets therefore `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables must be set to have an ability to communicate with S3. The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values should be stored in [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) ([an example Kubernetes Secrets spec is given below](#authenticate-kubeflow-pipelines)). +For the purpose of this walk-through, we will use an AWS S3 bucket for datasets; therefore, the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables must be set so that the pipeline can communicate with S3. The `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values should be stored in [Kubernetes Secrets](https://kubernetes.io/docs/concepts/configuration/secret/) ([an example Kubernetes Secrets spec is given below](#authenticate-kubeflow-pipelines)). Finally, run the helper script from the project's directory to build the workflow spec (the spec will be saved to `/.yaml` file).
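The Kubeflow walk-through above stores the two AWS variables in Kubernetes Secrets, and the authentication section it links to contains the project's own spec. Purely as an illustration of the shape of such a Secret, a minimal sketch might look like the following; the Secret name and both values are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-secrets
type: Opaque
stringData:
  # Plain-text values; Kubernetes base64-encodes them on creation
  AWS_ACCESS_KEY_ID: <your-access-key-id>
  AWS_SECRET_ACCESS_KEY: <your-secret-access-key>
```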
diff --git a/docs/source/deployment/prefect.md b/docs/source/deployment/prefect.md index daa89301df..04b11d8b94 100644 --- a/docs/source/deployment/prefect.md +++ b/docs/source/deployment/prefect.md @@ -10,12 +10,12 @@ Prefect Server ships out-of-the-box with a fully featured user interface. ## Prerequisites -To use Prefect Core and Prefect Server, make sure you have the following prerequisites in place: +To use Prefect Core and Prefect Server, ensure you have the following prerequisites in place: - [Prefect Core is installed](https://docs.prefect.io/core/getting_started/install.html) on your machine - [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/) are installed and Docker Engine is running - [Prefect Server is up and running](https://docs.prefect.io/orchestration/Server/deploy-local.html) -- `PREFECT__LOGGING__EXTRA_LOGGERS` environment variable is set (it is required to get Kedro logs emitted): +- `PREFECT__LOGGING__EXTRA_LOGGERS` environment variable is set (this is required to get Kedro logs published): ```console export PREFECT__LOGGING__EXTRA_LOGGERS="['kedro']" diff --git a/docs/source/development/commands_reference.md b/docs/source/development/commands_reference.md index f14632f62c..2cb76cb1d6 100644 --- a/docs/source/development/commands_reference.md +++ b/docs/source/development/commands_reference.md @@ -505,7 +505,7 @@ If you get an error message `Module ```` not found. Make sure to in 2. Run [`pip install -r src/requirements.txt`](#install-all-package-dependencies) command from your terminal ##### Copy tagged cells -To copy the code from [cells tagged](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#cell-tags) with `node` tag into Python files under `src//nodes/` in a Kedro project: +To copy the code from [cells tagged](https://jupyter-notebook.readthedocs.io/en/stable/changelog.html#cell-tags) with a `node` tag into Python files under `src//nodes/` in a Kedro project: ```bash kedro jupyter convert --all diff --git a/docs/source/development/debugging.md b/docs/source/development/debugging.md index 3158d94eb9..5f464b9fb2 100644 --- a/docs/source/development/debugging.md +++ b/docs/source/development/debugging.md @@ -4,12 +4,12 @@ If you're running your Kedro pipeline from the CLI or you can't/don't want to run Kedro from within your IDE debugging framework, it can be hard to debug your Kedro pipeline or nodes. This is particularly frustrating because: -* If you have long running nodes or pipelines, inserting `print` statements and running them multiple times quickly becomes a time-consuming procedure. +* If you have long running nodes or pipelines, inserting `print` statements and running them multiple times quickly becomes time-consuming. * Debugging nodes outside the `run` session isn't very helpful because getting access to the local scope within the `node` can be hard, especially if you're dealing with large data or memory datasets, where you need to chain a few nodes together or re-run your pipeline to produce the data for debugging purposes. This guide provides examples on [how to instantiate a post-mortem debugging session](https://docs.python.org/3/library/pdb.html#pdb.post_mortem) with [`pdb`](https://docs.python.org/3/library/pdb.html) using [Kedro Hooks](../hooks/introduction.md) when an uncaught error occurs during a pipeline run. Note that [ipdb](https://pypi.org/project/ipdb/) could be integrated in the same manner. 
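As a preview of the pattern described in the node-debugging section that follows, a minimal sketch of such a Hook is shown below; the class name is arbitrary, and the Hook would still need to be registered (typically via the `HOOKS` setting in the project's `settings.py`).

```python
import pdb
import sys

from kedro.framework.hooks import hook_impl


class PDBNodeDebugHook:
    """Open a pdb post-mortem session whenever a node raises an uncaught error."""

    @hook_impl
    def on_node_error(self):
        # Retrieve the traceback of the exception currently being handled
        _, _, traceback_object = sys.exc_info()
        print("Launching post-mortem debugging session...")
        pdb.post_mortem(traceback_object)
```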
-If you are looking for guides on how to set up debugging with IDEs, please visit [the guide for debugging in VSCode](./set_up_vscode.md#debugging) and [the guide for debugging in PyCharm](./set_up_pycharm.md#debugging). +For guides on how to set up debugging with IDEs, please visit the [guide for debugging in VSCode](./set_up_vscode.md#debugging) and the [guide for debugging in PyCharm](./set_up_pycharm.md#debugging). ## Debugging Node diff --git a/docs/source/development/set_up_pycharm.md b/docs/source/development/set_up_pycharm.md index 27894e03f1..3d851c807e 100644 --- a/docs/source/development/set_up_pycharm.md +++ b/docs/source/development/set_up_pycharm.md @@ -120,7 +120,7 @@ Click **OK** and then select **Remote Run** from the toolbar and click **Run** t ![](../meta/images/pycharm_remote_run.png) -[To remotely debug, click the debugger button as described above](#debugging). +[To debug remotely, click the debugger button as described above](#debugging). ## Advanced: Docker interpreter diff --git a/docs/source/development/set_up_vscode.md b/docs/source/development/set_up_vscode.md index 67e05d2da9..823491b78f 100644 --- a/docs/source/development/set_up_vscode.md +++ b/docs/source/development/set_up_vscode.md @@ -107,7 +107,7 @@ PYTHONPATH=/path/to/project/src:$PYTHONPATH PYTHONPATH=C:/path/to/project/src;%PYTHONPATH% ``` -[You can find more information about setting up environmental variables in the VSCode documentation](https://code.visualstudio.com/docs/python/environments#_environment-variable-definitions-file). +You can find [more information about setting up environment variables in the VSCode documentation](https://code.visualstudio.com/docs/python/environments#_environment-variable-definitions-file). Go to **Debug > Add Configurations**. @@ -242,7 +242,7 @@ Go to the **Debugging** section in VS Code and select the newly created remote d ![](../meta/images/vscode_remote_debugger.png) -You will need to [set a breakpoint in VS Code as described in the debugging section above](#debugging) and start the debugger by clicking the green play triangle: +You must [set a breakpoint in VS Code as described in the debugging section above](#debugging) and start the debugger by clicking the green play triangle: [Find more information on debugging in VS Code](https://code.visualstudio.com/docs/python/debugging).