Commit

fixes #172
markvanderloo committed May 1, 2023
1 parent beb48ce commit f550d0e
Showing 25 changed files with 541 additions and 519 deletions.
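
The hunks below all make the same mechanical change: the `sect:` prefix becomes `sect-` in header IDs (`{#sect:...}`) and in `\@ref(sect:...)` cross-references. As a rough illustration only, and not necessarily how this commit was produced, such a rename could be scripted as follows. The function names `rename_labels` and `rename_in_directory` are hypothetical, and the sketch assumes the labels occur only in these two forms.

```python
import re
from pathlib import Path

# Old-style labels appear in two places in the diff: header IDs such as
# {#sect:intro} and cross-references such as \@ref(sect:intro).
OLD_LABEL = re.compile(r"(\{#|\\@ref\()sect:")


def rename_labels(text: str) -> str:
    """Rewrite the 'sect:' prefix to 'sect-' inside header IDs and cross-references."""
    return OLD_LABEL.sub(r"\g<1>sect-", text)


def rename_in_directory(directory: str) -> list[str]:
    """Apply rename_labels to every .Rmd file in `directory`; return changed file names."""
    changed = []
    for path in sorted(Path(directory).glob("*.Rmd")):
        original = path.read_text(encoding="utf-8")
        updated = rename_labels(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Anchoring the pattern on `{#` and `\@ref(` keeps the substitution from touching any other text that happens to contain `sect:`.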
6 changes: 3 additions & 3 deletions cookbook/01-intro.Rmd
@@ -1,4 +1,4 @@
-# Introduction to validate {#sect:intro}
+# Introduction to validate {#sect-intro}

```{r, include=FALSE}
source("chunk_opts.R")
@@ -110,8 +110,8 @@ need to remember the following.

You are now ready to start validating your data, and navigate Chapters
-\@ref(sect:availableunique)-\@ref(sect:statisticalchecks) to learn how to
-define specific types of checks. Chapter~\@ref(sect:work), discusses more
+\@ref(sect-availableunique)-\@ref(sect-statisticalchecks) to learn how to
+define specific types of checks. Chapter~\@ref(sect-work), discusses more
details about working with `validate`.


8 changes: 4 additions & 4 deletions cookbook/02-variable_level_checks.Rmd
@@ -1,4 +1,4 @@
-# Variable checks {#sect:varlevelchecks}
+# Variable checks {#sect-varlevelchecks}

```{r, include=FALSE}
source("chunk_opts.R")
@@ -47,7 +47,7 @@ We see that each rule checks a single item, namely one column of data. The
first rule is violated (it is in fact a `factor` variable). The second rule
is satisfied.

-## Missingness {#sect:missingness}
+## Missingness {#sect-missingness}

Use R's standard `is.na()` to check missing items in individual variables. Negate
it to check that values are available.
@@ -77,8 +77,8 @@ summary(out)
```


-- To check whether records or parts thereof are completed, see \@ref(sect:iscomplete).
-- To check whether records are available at all, see \@ref(sect:completeness).
+- To check whether records or parts thereof are completed, see \@ref(sect-iscomplete).
+- To check whether records are available at all, see \@ref(sect-completeness).


## Field length
12 changes: 6 additions & 6 deletions cookbook/03-availability-and-uniqueness.Rmd
@@ -1,4 +1,4 @@
-# Availability and uniqueness {#sect:availableunique}
+# Availability and uniqueness {#sect-availableunique}


```{r, include=FALSE}
@@ -10,8 +10,8 @@ and/or complete with respect to a set of keys, and whether they are unique.
The checks described here are typically useful for data in 'long' format, where
one column holds a value and all the other columns identify the value.

-- To test for missing values in individual variables, see also \@ref(sect:missingness).
-- To check whether records or parts thereof are completed, see \@ref(sect:iscomplete).
+- To test for missing values in individual variables, see also \@ref(sect-missingness).
+- To check whether records or parts thereof are completed, see \@ref(sect-iscomplete).

**Data**

@@ -66,7 +66,7 @@ must add up to the annual values.



-## Uniqueness {#sect:uniqueness}
+## Uniqueness {#sect-uniqueness}

The function `is_unique()` checks whether combinations of variables (usually
key variables) uniquely identify a record. It accepts any positive number of
@@ -138,10 +138,10 @@ is_unique(df$x, df$y)



-## Availability of records {#sect:completeness}
+## Availability of records {#sect-completeness}

This section is on testing for availability of whole records. Testing for individual
-missing values (`r NA`), is treated in \@ref(sect:missingness).
+missing values (`r NA`), is treated in \@ref(sect-missingness).


We wish to ensure that for each region, and each variable, the periods 2014,
8 changes: 4 additions & 4 deletions cookbook/04-multivariate-checks.Rmd
@@ -19,7 +19,7 @@ head(SBS2000, 3)
```

-## Completeness of records {#sect:iscomplete}
+## Completeness of records {#sect-iscomplete}

The functions `is_complete()` and `all_complete()` are convenience functions
that test for missing values or combinations thereof in records.
@@ -43,8 +43,8 @@ complete. The output is one logical value (`TRUE` or `FALSE`) for each record.
The fourth rule tests whether _all_ values are present in the `id` column, and
it results in a single `TRUE` or `FALSE`.

-- To test for missing values in individual variables, see also \@ref(sect:missingness).
-- To check whether records are available at all, see \@ref(sect:completeness).
+- To test for missing values in individual variables, see also \@ref(sect-missingness).
+- To check whether records are available at all, see \@ref(sect-completeness).


## Balance equalities and inequalities
@@ -77,7 +77,7 @@ out <- confront(SBS2000, rules, lin.ineq.eps=0, lin.eq.eps=0.01)
summary(out)
```

-See \@ref(sect:options) for more information on setting and resetting options.
+See \@ref(sect-options) for more information on setting and resetting options.


## Conditional restrictions
4 changes: 2 additions & 2 deletions cookbook/05-statistical-checks.Rmd
@@ -1,4 +1,4 @@
-# Statistical checks {#sect:statisticalchecks}
+# Statistical checks {#sect-statisticalchecks}


```{r, include=FALSE}
@@ -35,7 +35,7 @@ head(samplonomy, 3)
```


-## Statistical and groupwise characteristics {#sect:groupwise}
+## Statistical and groupwise characteristics {#sect-groupwise}

Any R expression that ultimately is an equality or inequality check is
interpreted as a validation rule by validate. This means that any statistical
2 changes: 1 addition & 1 deletion cookbook/06-indicators.Rmd
@@ -1,4 +1,4 @@
-# Indicators {#sect:indicators}
+# Indicators {#sect-indicators}


```{r, include=FALSE}
12 changes: 6 additions & 6 deletions cookbook/07-working.Rmd
@@ -1,4 +1,4 @@
-# Working with validate {#sect:work}
+# Working with validate {#sect-work}


```{r, include=FALSE}
@@ -44,7 +44,7 @@ object as `v`. To make an actual copy, you can select everything.
w <- v[]
```
It is also possible to concatenate two validator objects. For example when you
-read two rule sets from two files (See \@ref(sect:readfromfile)). This is done
+read two rule sets from two files (See \@ref(sect-readfromfile)). This is done
by adding them together with `+`.
```{r}
rules1 <- validator(speed>=0)
@@ -152,7 +152,7 @@ sufficient to have three character columns, named `rule`, `name` and



-## Validation rule syntax {#sect:syntax}
+## Validation rule syntax {#sect-syntax}

Conceptually, any R statement that will evaluate to a `logical` is considered a
validating statement. The validate package checks this when the user defines a
@@ -198,11 +198,11 @@ group aggregates.
| `mean_by` | groupwise mean |
| `median_by` | groupwise median |

-See also Section \@ref(sect:groupwise).
+See also Section \@ref(sect-groupwise).

There are a number of functions that perform a particular validation task that
would be hard to express with basic syntax. These are treated extensively
-in Chapters \@ref(sect:varlevelchecks) to \@ref(sect:statisticalchecks), but
+in Chapters \@ref(sect-varlevelchecks) to \@ref(sect-statisticalchecks), but
here is a quick overview.

|function | checks |
@@ -264,7 +264,7 @@ summary(cf[c(1,3)])
```


-## Confrontation options {#sect:options}
+## Confrontation options {#sect-options}
By default, all errors and warnings are caught when validation rules are confronted with data. This can be switched off by setting the `raise` option to `"errors"` or `"all"`. The following
example contains a specification error: `hite` should be `height` and therefore the rule errors
on the `women` data.frame because it does not contain a column `hite`. The error is caught
8 changes: 4 additions & 4 deletions cookbook/08-rule-files.Rmd
@@ -1,4 +1,4 @@
-# Rules in text files {#sect:rulefiles}
+# Rules in text files {#sect-rulefiles}


```{r, include=FALSE}
@@ -11,11 +11,11 @@ free-form text and in YAML. We also discuss some more advanced features like
how to have one rule file include another file.


-## Reading rules from file {#sect:readfromfile}
+## Reading rules from file {#sect-readfromfile}

It is a very good idea to store and maintain rule sets outside of your R
script. Validate supports two file formats: simple text files and `yaml` files.
-Here we only discuss simple text files, yaml files are treated in \@ref(sect:yamlfiles).
+Here we only discuss simple text files, yaml files are treated in \@ref(sect-yamlfiles).

To try this, copy the following rules into a new text file and store it in a
file called `myrules.R`, in the current working directory of your R session.
@@ -34,7 +34,7 @@ regular R code. Reading these rules can be done as follows.
rules <- validator(.file="myrules.R")
```

-## Metadata in text files: `YAML` {#sect:yamlfiles}
+## Metadata in text files: `YAML` {#sect-yamlfiles}

[YAML](https://yaml.org) is a data format that aims to be easy to learn and
human-readable. The name 'YAML' is a [recursive
2 changes: 1 addition & 1 deletion cookbook/09-sdmx.Rmd
@@ -1,4 +1,4 @@
-# Rules from SDMX {#sect:sdmxrules}
+# Rules from SDMX {#sect-sdmxrules}

**Note** This functionality is available for `validate` versions `1.1.0` or higher.

2 changes: 1 addition & 1 deletion cookbook/10-comparing.Rmd
@@ -1,4 +1,4 @@
-# Comparing data sets {#sect:comparing}
+# Comparing data sets {#sect-comparing}


```{r, include=FALSE}
2 changes: 1 addition & 1 deletion cookbook/11-references.Rmd
@@ -25,7 +25,7 @@ complexity analyses and examples from practice.

> M. Zio, N. Fursova, T. Gelsema, S. Giessing, U Guarnera, J. Ptrauskiene, Q. L. Kalben, M. Scanu, K. ten Bosch, M. van der Loo, and K. Walsdorfe (2015) [Methodology for data validation](https://cros-legacy.ec.europa.eu/system/files/methodology_for_data_validation_v1.0_rev-2016-06_final.pdf). _Deliverable of the ESSNet on validation_.
-The `lumberjack` package discussed in Chapter \@ref(sect:comparing) is described in the following
+The `lumberjack` package discussed in Chapter \@ref(sect-comparing) is described in the following
paper.

> MPJ van der Loo (2020). [Monitoring Data in R with the lumberjack package](https://www.jstatsoft.org/article/view/v098i01). _Journal of Statistical Software_, 98(1)
12 changes: 6 additions & 6 deletions cookbook/index.Rmd
@@ -56,14 +56,14 @@ The purposes of this book include demonstrating the main tools and workflows of
the `validate` package, giving examples of common data validation tasks, and
showing how to analyze data validation results.

-The book is organized as follows. Chapter \@ref(sect:intro) discusses the bare
+The book is organized as follows. Chapter \@ref(sect-intro) discusses the bare
necessities to be able to follow the rest of the book. Chapters
-\@ref(sect:varlevelchecks) to \@ref(sect:statisticalchecks) form the 'cookbook'
+\@ref(sect-varlevelchecks) to \@ref(sect-statisticalchecks) form the 'cookbook'
part of the book and discuss many different ways to check your data by example.
-Chapter \@ref(sect:indicators) is devoted to deriving plausibility measures
-with the `validate` package. Chapters \@ref(sect:work) and
-\@ref(sect:rulefiles) treat working with validate in-depth. Chapter
-\@ref(sect:comparing) discusses how to compare two or more versions of a
+Chapter \@ref(sect-indicators) is devoted to deriving plausibility measures
+with the `validate` package. Chapters \@ref(sect-work) and
+\@ref(sect-rulefiles) treat working with validate in-depth. Chapter
+\@ref(sect-comparing) discusses how to compare two or more versions of a
dataset, possibly automated through the
[lumberjack](https://cran.r-project.org/package=lumberjack) package. The
section with Biblographical Notes lists some references and points out some