fixes #172

data-cleaning · May 1, 2023 · commit f550d0e (1 parent beb48ce)

Showing 25 changed files with 541 additions and 519 deletions.
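The change itself is mechanical. Each heading label of the form `{#sect:name}` becomes `{#sect-name}`, and each cross-reference `\@ref(sect:name)` becomes `\@ref(sect-name)`. The following is a minimal sketch of how such a rename could be automated and checked. It is not the script used for this commit. It assumes the chapters are `*.Rmd` files directly inside a `cookbook/` directory, as in the paths below, and that each heading label is the only attribute inside its braces. The helper names `rename_labels`, `dangling_refs` and `rewrite_tree` are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical helpers. Assumes each heading label is the only attribute in
# its braces, e.g. "{#sect:intro}" rather than "{#sect:intro .unnumbered}".
OLD_LABEL = re.compile(r"\{#sect:([A-Za-z0-9_-]+)\}")
OLD_REF = re.compile(r"\\@ref\(sect:([A-Za-z0-9_-]+)\)")

# Section labels and references in either spelling, used for the check.
SECT_LABEL = re.compile(r"\{#(sect[-:][A-Za-z0-9_-]+)\}")
SECT_REF = re.compile(r"\\@ref\((sect[-:][A-Za-z0-9_-]+)\)")


def rename_labels(text: str) -> str:
    """Turn {#sect:x} into {#sect-x} and \\@ref(sect:x) into \\@ref(sect-x)."""
    text = OLD_LABEL.sub(r"{#sect-\1}", text)
    return OLD_REF.sub(r"\\@ref(sect-\1)", text)


def dangling_refs(texts: list[str]) -> list[str]:
    """Return section references whose label is defined in none of the texts."""
    labels: set[str] = set()
    refs: set[str] = set()
    for text in texts:
        labels.update(SECT_LABEL.findall(text))
        refs.update(SECT_REF.findall(text))
    return sorted(refs - labels)


def rewrite_tree(root: Path) -> list[Path]:
    """Rewrite every *.Rmd file directly in root in place; return changed files."""
    changed = []
    for path in sorted(root.glob("*.Rmd")):
        old = path.read_text(encoding="utf-8")
        new = rename_labels(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    root = Path("cookbook")  # assumed location of the chapter files
    if root.is_dir():
        for path in rewrite_tree(root):
            print("rewrote", path)
        texts = [p.read_text(encoding="utf-8") for p in sorted(root.glob("*.Rmd"))]
        for ref in dangling_refs(texts):
            print("no label for", ref)
```

Run from the repository root, the script rewrites the chapter files in place and then lists any `sect` reference without a matching heading label. Checking `git diff` before committing is still worthwhile. The patterns are restricted to the `sect` prefix, so chunk headers such as `{r, include=FALSE}` and other kinds of cross-references are left untouched.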
diff --git a/cookbook/01-intro.Rmd b/cookbook/01-intro.Rmd
@@ -1,4 +1,4 @@
-# Introduction to validate {#sect:intro}
+# Introduction to validate {#sect-intro}
 
 ```{r, include=FALSE}
 source("chunk_opts.R")
@@ -110,8 +110,8 @@ need to remember the following.
 
 
 You are now ready to start validating your data, and navigate Chapters
-\@ref(sect:availableunique)-\@ref(sect:statisticalchecks) to learn how to
-define specific types of checks. Chapter~\@ref(sect:work), discusses more
+\@ref(sect-availableunique)-\@ref(sect-statisticalchecks) to learn how to
+define specific types of checks. Chapter~\@ref(sect-work), discusses more
 details about working with `validate`.
 
 

diff --git a/cookbook/02-variable_level_checks.Rmd b/cookbook/02-variable_level_checks.Rmd
@@ -1,4 +1,4 @@
-# Variable checks {#sect:varlevelchecks}
+# Variable checks {#sect-varlevelchecks}
 
 ```{r, include=FALSE}
 source("chunk_opts.R")
@@ -47,7 +47,7 @@ We see that each rule checks a single item, namely one column of data. The
 first rule is violated (it is in fact a `factor` variable). The second rule
 is satisfied.
 
-## Missingness {#sect:missingness}
+## Missingness {#sect-missingness}
 
 Use R's standard `is.na()` to check missing items in individual variables. Negate
 it to check that values are available.
@@ -77,8 +77,8 @@ summary(out)
 ```
 
 
-- To check whether records or parts thereof are completed, see \@ref(sect:iscomplete).
-- To check whether records are available at all, see \@ref(sect:completeness).
+- To check whether records or parts thereof are completed, see \@ref(sect-iscomplete).
+- To check whether records are available at all, see \@ref(sect-completeness).
 
 
 ## Field length

diff --git a/cookbook/03-availability-and-uniqueness.Rmd b/cookbook/03-availability-and-uniqueness.Rmd
@@ -1,4 +1,4 @@
-# Availability and uniqueness {#sect:availableunique}
+# Availability and uniqueness {#sect-availableunique}
 
 
 ```{r, include=FALSE}
@@ -10,8 +10,8 @@ and/or complete with respect to a set of keys, and whether they are unique.
 The checks described here are typically useful for data in 'long' format, where
 one column holds a value and all the other columns identify the value.
 
-- To test for missing values in individual variables, see also \@ref(sect:missingness).
-- To check whether records or parts thereof are completed, see \@ref(sect:iscomplete).
+- To test for missing values in individual variables, see also \@ref(sect-missingness).
+- To check whether records or parts thereof are completed, see \@ref(sect-iscomplete).
 
 **Data**
 
@@ -66,7 +66,7 @@ must add up to the annual values.
 
 
 
-## Uniqueness  {#sect:uniqueness}
+## Uniqueness  {#sect-uniqueness}
 
 The function `is_unique()` checks whether combinations of variables (usually
 key variables) uniquely identify a record. It accepts any positive number of
@@ -138,10 +138,10 @@ is_unique(df$x, df$y)
 
 
 
-## Availability of records {#sect:completeness}
+## Availability of records {#sect-completeness}
 
 This section is on testing for availability of whole records. Testing for individual
-missing values (`r NA`), is treated in \@ref(sect:missingness). 
+missing values (`r NA`), is treated in \@ref(sect-missingness). 
 
 
 We wish to ensure that for each region, and each variable, the periods 2014,

diff --git a/cookbook/04-multivariate-checks.Rmd b/cookbook/04-multivariate-checks.Rmd
@@ -19,7 +19,7 @@ head(SBS2000, 3)
 
 ```
 
-## Completeness of records {#sect:iscomplete}
+## Completeness of records {#sect-iscomplete}
 
 The functions `is_complete()` and `all_complete()` are convenience functions
 that test for missing values or combinations thereof in records.
@@ -43,8 +43,8 @@ complete.  The output is one logical value (`TRUE` or `FALSE`) for each record.
 The fourth rule tests whether _all_ values are present in the `id` column, and
 it results in a single `TRUE` or `FALSE`. 
 
-- To test for missing values in individual variables, see also \@ref(sect:missingness).
-- To check whether records are available at all, see \@ref(sect:completeness).
+- To test for missing values in individual variables, see also \@ref(sect-missingness).
+- To check whether records are available at all, see \@ref(sect-completeness).
 
 
 ## Balance equalities and inequalities 
@@ -77,7 +77,7 @@ out <- confront(SBS2000, rules, lin.ineq.eps=0, lin.eq.eps=0.01)
 summary(out)
 ```
 
-See \@ref(sect:options) for more information on setting and resetting options.
+See \@ref(sect-options) for more information on setting and resetting options.
 
 
 ## Conditional restrictions

diff --git a/cookbook/05-statistical-checks.Rmd b/cookbook/05-statistical-checks.Rmd
@@ -1,4 +1,4 @@
-# Statistical checks {#sect:statisticalchecks}
+# Statistical checks {#sect-statisticalchecks}
 
 
 ```{r, include=FALSE}
@@ -35,7 +35,7 @@ head(samplonomy, 3)
 ```
 
 
-## Statistical and groupwise characteristics {#sect:groupwise}
+## Statistical and groupwise characteristics {#sect-groupwise}
 
 Any R expression that ultimately is an equality or inequality check is
 interpreted as a validation rule by validate. This means that any statistical

diff --git a/cookbook/06-indicators.Rmd b/cookbook/06-indicators.Rmd
@@ -1,4 +1,4 @@
-# Indicators {#sect:indicators}
+# Indicators {#sect-indicators}
 
 
 ```{r, include=FALSE}

diff --git a/cookbook/07-working.Rmd b/cookbook/07-working.Rmd
@@ -1,4 +1,4 @@
-# Working with validate {#sect:work}
+# Working with validate {#sect-work}
 
 
 ```{r, include=FALSE}
@@ -44,7 +44,7 @@ object as `v`.  To make an actual copy, you can select everything.
 w <- v[]
 ```
 It is also possible to concatenate two validator objects. For example when you
-read two rule sets from two files (See \@ref(sect:readfromfile)). This is done
+read two rule sets from two files (See \@ref(sect-readfromfile)). This is done
 by adding them together with `+`.
 ```{r}
 rules1 <- validator(speed>=0)
@@ -152,7 +152,7 @@ sufficient to have three character columns, named `rule`, `name` and
 
 
 
-## Validation rule syntax {#sect:syntax}
+## Validation rule syntax {#sect-syntax}
 
 Conceptually, any R statement that will evaluate to a `logical` is considered a
 validating statement. The validate package checks this when the user defines a
@@ -198,11 +198,11 @@ group aggregates.
 | `mean_by`           | groupwise mean                   |
 | `median_by`         | groupwise median                 |
 
-See also Section \@ref(sect:groupwise).
+See also Section \@ref(sect-groupwise).
 
 There are a number of functions that perform a particular validation task that
 would be hard to express with basic syntax.  These are treated extensively
-in Chapters \@ref(sect:varlevelchecks) to \@ref(sect:statisticalchecks), but
+in Chapters \@ref(sect-varlevelchecks) to \@ref(sect-statisticalchecks), but
 here is a quick overview.
 
 |function             | checks                                                         |
@@ -264,7 +264,7 @@ summary(cf[c(1,3)])
 ```
 
 
-## Confrontation options {#sect:options}
+## Confrontation options {#sect-options}
 By default, all errors and warnings are caught when validation rules are confronted with data. This can be switched off by setting the `raise` option to `"errors"` or `"all"`. The following 
 example contains a specification error: `hite` should be `height` and therefore the rule errors
 on the `women` data.frame because it does not contain a column `hite`. The error is caught

diff --git a/cookbook/08-rule-files.Rmd b/cookbook/08-rule-files.Rmd
@@ -1,4 +1,4 @@
-# Rules in text files {#sect:rulefiles}
+# Rules in text files {#sect-rulefiles}
 
 
 ```{r, include=FALSE}
@@ -11,11 +11,11 @@ free-form text and in YAML. We also discuss some more advanced features like
 how to have one rule file include another file.
 
 
-## Reading rules from file {#sect:readfromfile}
+## Reading rules from file {#sect-readfromfile}
 
 It is a very good idea to store and maintain rule sets outside of your R
 script. Validate supports two file formats: simple text files and `yaml` files.
-Here we only discuss simple text files, yaml files are treated in \@ref(sect:yamlfiles).
+Here we only discuss simple text files, yaml files are treated in \@ref(sect-yamlfiles).
 
 To try this, copy the following rules into a new text file and store it in a
 file called `myrules.R`, in the current working directory of your R session.
@@ -34,7 +34,7 @@ regular R code. Reading these rules can be done as follows.
 rules <- validator(.file="myrules.R")
 ```
 
-## Metadata in text files: `YAML` {#sect:yamlfiles}
+## Metadata in text files: `YAML` {#sect-yamlfiles}
 
 [YAML](https://yaml.org) is a data format that aims to be easy to learn and
 human-readable. The name 'YAML' is a [recursive

diff --git a/cookbook/09-sdmx.Rmd b/cookbook/09-sdmx.Rmd
@@ -1,4 +1,4 @@
-# Rules from SDMX {#sect:sdmxrules}
+# Rules from SDMX {#sect-sdmxrules}
 
 **Note** This functionality is available for `validate` versions `1.1.0` or higher.
 

diff --git a/cookbook/10-comparing.Rmd b/cookbook/10-comparing.Rmd
@@ -1,4 +1,4 @@
-# Comparing data sets {#sect:comparing}
+# Comparing data sets {#sect-comparing}
 
 
 ```{r, include=FALSE}

diff --git a/cookbook/11-references.Rmd b/cookbook/11-references.Rmd
@@ -25,7 +25,7 @@ complexity analyses and examples from practice.
 
 >  M. Zio, N. Fursova, T. Gelsema, S. Giessing, U Guarnera, J. Ptrauskiene, Q. L. Kalben, M. Scanu, K. ten Bosch, M. van der Loo, and K. Walsdorfe (2015) [Methodology for data validation](https://cros-legacy.ec.europa.eu/system/files/methodology_for_data_validation_v1.0_rev-2016-06_final.pdf). _Deliverable of the ESSNet on validation_.
 
-The `lumberjack` package discussed in Chapter \@ref(sect:comparing) is described in the following
+The `lumberjack` package discussed in Chapter \@ref(sect-comparing) is described in the following
 paper.
 
 > MPJ van der Loo (2020). [Monitoring Data in R with the lumberjack package](https://www.jstatsoft.org/article/view/v098i01). _Journal of Statistical Software_, 98(1)

diff --git a/cookbook/index.Rmd b/cookbook/index.Rmd
@@ -56,14 +56,14 @@ The purposes of this book include demonstrating the main tools and workflows of
 the `validate` package, giving examples of common data validation tasks, and
 showing how to analyze data validation results.
 
-The book is organized as follows. Chapter \@ref(sect:intro) discusses the bare
+The book is organized as follows. Chapter \@ref(sect-intro) discusses the bare
 necessities to be able to follow the rest of the book. Chapters
-\@ref(sect:varlevelchecks) to \@ref(sect:statisticalchecks) form the 'cookbook'
+\@ref(sect-varlevelchecks) to \@ref(sect-statisticalchecks) form the 'cookbook'
 part of the book and discuss many different ways to check your data by example.
-Chapter \@ref(sect:indicators) is devoted to deriving plausibility measures
-with the `validate` package.  Chapters \@ref(sect:work) and
-\@ref(sect:rulefiles) treat working with validate in-depth. Chapter
-\@ref(sect:comparing) discusses how to compare two or more versions of a
+Chapter \@ref(sect-indicators) is devoted to deriving plausibility measures
+with the `validate` package.  Chapters \@ref(sect-work) and
+\@ref(sect-rulefiles) treat working with validate in-depth. Chapter
+\@ref(sect-comparing) discusses how to compare two or more versions of a
 dataset, possibly automated through the
 [lumberjack](https://cran.r-project.org/package=lumberjack) package.  The
 section with Biblographical Notes lists some references and points out some