[Feature] preparing website for gh-pages set up
wyusuf068 committed Sep 23, 2019
1 parent 1772461 commit f72600b
Showing 17 changed files with 378 additions and 493 deletions.
Binary file modified .DS_Store
Binary file not shown.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,5 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
.Ruserdata
.DS_Store
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -10,7 +10,7 @@ Authors@R: c(
comment = c(ORCID = "0000-0003-0912-0845")),
person(given = "Warsame",
family = "Yusuf",
role = c("aut"),
role = c("aut", "cre"),
email = "[email protected]"),
person(given = "Rostyslav",
family = "Vyuha",
6 changes: 3 additions & 3 deletions README.md
@@ -1,16 +1,16 @@
# cchsflow

This repository supports the use of the Canadian Community Health Survey (CCHS). The current focus is transformation of harmonized variables across surveys from 2001 to 2014.
This package supports the use of the Canadian Community Health Survey (CCHS). The current focus is transformation of harmonized variables across surveys from 2001 to 2014. At the heart of `cchsflow` are two worksheets (CSV files) that describe how to transform variables from different CCHS cycles into common variables: `variables.csv` and `variableDetails.csv`.

Documents include:
Documents in the repository include:

1. `variables.csv` - a list of variables that can be transformed across CCHS surveys. The default variable name corresponds to the 2007 CCHS.
2. `variableDetails.csv` - information that describes how the variables are recoded.
3. DDI documents for the original CCHS surveys -- see the CCHS_DDI folder.

## Important notes

Care must be taken to understand how your specific use of variable transformation and harmonization may result in misclassification error and other forms of bias. Most variables have had some change in wording and category responses across the lifetime of the CCHS from 2001 to 2013. Furthermore, there have been changes in survey sampling, response rates, weighting methods and other survey design changes that affect responses.
Care must be taken to understand how your specific use of variable transformation and harmonization may result in misclassification error and other forms of bias. Most variables have had some change in wording and category responses across the lifetime of the CCHS from 2001 to 2014. Furthermore, there have been changes in survey sampling, response rates, weighting methods and other survey design changes that affect responses.

The transformations that are described in this repository have been used in several research projects (see reference list) but no guarantees are made regarding the accuracy or appropriate uses.

94 changes: 80 additions & 14 deletions Vignettes/usingcchsflow.Rmd
@@ -1,6 +1,5 @@
---
title: "Using cchsflow"
date: 'Last updated: 2019-08-27'
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{1 - using cchsflow}
@@ -21,7 +20,7 @@ The `RecWTable` and `SetDataLabels` functions are part of the [bllflow](https://
```{r eval= FALSE}
install.packages("devtools")
library(devtools)
install_github("Big-Life-Lab/bllflow", ref = "recode-with-table-patch")
install_github("Big-Life-Lab/bllflow")
```
```{r results= 'hide', message = FALSE, warning=FALSE}
library(bllflow)
@@ -51,9 +50,9 @@ cchsMock2001 <- data.frame(DHHA_SEX = c(2, 1, 1, 6), DHHAGAGE = c(3, 4, 6, 6), F
cchsMock2013 <- data.frame(DHH_SEX = c(1, 2, 1), DHHGAGE = c(2, 1, 1), FVCDTOT = c(25, 15, 6))
```

Did you notice that the names for the variables are slightly different in the two mock databases? That isn't a mistake: in the cchs2001 the variable for sex is `DHHA_SEX` and in CCHS2013 the variable is `DHH_SEX`.
Did you notice that the names for the variables are slightly different in the two mock databases? That isn't a mistake: in the 2001 CCHS the variable for sex is `DHHA_SEX`, and in the 2013 CCHS the variable is `DHH_SEX`.

Don't worry, `cchsflow` is here to help! `variableDetails.csv` contains the rules to harmonize those two variables into a common variable name. In the CCHS, the categories for `sex` are consistent: 1 = males, 2 = females. If the category values or labels changed, `variableDetails.csv` would provide instructions for how to harmonize them. You can learn more about `variables.csv` and `variableDetails.csv` in later vignettes. See ... and ...
Don't worry, `cchsflow` is here to help! `variableDetails.csv` contains the rules to harmonize those two variables into a common variable name. In the CCHS, the categories for `sex` are consistent: 1 = males, 2 = females. If the category values or labels changed, `variableDetails.csv` would provide instructions for how to harmonize them. You can learn more about these in later vignettes.
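To make the idea of a common variable name concrete, here is a minimal base-R sketch. It is not how `RecWTable()` works internally: the lookup table `nameMap` and the helper `harmonizeSex()` are hypothetical, the mock data keep only the sex column from the mock datasets above, and category values are copied unchanged (no recoding).

```r
# Hypothetical lookup table: which source column holds sex in each cycle,
# and the common name it should have after harmonization.
nameMap <- data.frame(
  cycle      = c("2001", "2013"),
  sourceName = c("DHHA_SEX", "DHH_SEX"),
  commonName = c("DHH_SEX", "DHH_SEX"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy the cycle-specific column into a data frame that
# uses the common variable name, and record which cycle each row came from.
harmonizeSex <- function(data, cycleLabel) {
  row <- nameMap[nameMap$cycle == cycleLabel, ]
  out <- data.frame(cycle = rep(cycleLabel, nrow(data)),
                    stringsAsFactors = FALSE)
  out[[row$commonName]] <- data[[row$sourceName]]
  out
}

# Sex columns from the mock datasets (other columns omitted).
mock2001 <- data.frame(DHHA_SEX = c(2, 1, 1, 6))
mock2013 <- data.frame(DHH_SEX = c(1, 2, 1))

# Both pieces now share the column names, so they can be stacked.
combined <- rbind(harmonizeSex(mock2001, "2001"),
                  harmonizeSex(mock2013, "2013"))
print(combined)
```

Keeping the mapping in a table rather than in code mirrors the worksheet approach: supporting another cycle means adding a row, not writing a new function.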

```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(DT)
@@ -93,7 +92,8 @@ In this example, the sex variable in the 2001 CCHS cycle is transformed.

```{r, warning=FALSE}
sex2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", log = TRUE, variables = c("DHH_SEX"))
```
```{r, echo=FALSE}
datatable(sex2001, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

@@ -109,12 +109,24 @@ This example shows how you can transform and combine a variable across multiple

```{r, warning = FALSE}
sex2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", appendToData = FALSE, log = TRUE, variables = c("DHH_SEX"))
```

```{r, echo=FALSE}
datatable(sex2001, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

```{r, warning=FALSE}
sex2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", appendToData = FALSE, log = TRUE, variables = c("DHH_SEX"))
```
```{r, echo=FALSE}
datatable(sex2013, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

```{r, warning=FALSE}
combinedSex <- bind_rows(sex2001, sex2013)
```

```{r, echo=FALSE}
datatable(combinedSex, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

@@ -127,10 +139,18 @@ There are many variables in the CCHS that changes in categories between cycles i
The categories of the `age` variable in the CCHS changed in 2005, so it is not possible to have the same `age` categories across all CCHS cycles. `cchsflow` offers two options. The first option is to transform the `age` variable into two variables: `DHHGAGE_A` is the age variable for CCHS cycles 2001-2003, and `DHHGAGE_B` is the age variable for CCHS cycles 2005-2014. With this option, you cannot combine `DHHGAGE_A` and `DHHGAGE_B` into a single dataset.

```{r, warning=FALSE}
age2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", log = TRUE, variables = c("DHHGAGE_A"))
age2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", variables = c("DHHGAGE_A"))
```

```{r, echo=FALSE}
datatable(age2001, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

```{r, warning=FALSE}
age2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_B"))
```

age2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", log = TRUE, variables = c("DHHGAGE_B"))
```{r, echo=FALSE}
datatable(age2013, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

@@ -140,13 +160,26 @@ datatable(age2013, options = list(columnDefs = list(list(className = 'dt-center'
The categorical `age` variable can also be transformed into a single continuous `age` variable, which assigns each respondent the midpoint of their age category in every CCHS cycle. With this option, age from all CCHS cycles can be combined into a single dataset.
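The midpoint idea can be sketched in base R. The age categories below are invented for illustration and are not the CCHS cut-points (those are defined in `variableDetails.csv`); `ageGroups` and `toContinuousAge()` are hypothetical names, not part of cchsflow or bllflow.

```r
# Hypothetical age categories (code -> lower and upper age in years).
# These are NOT the real CCHS categories.
ageGroups <- data.frame(
  code  = 1:4,
  lower = c(12, 15, 20, 25),
  upper = c(14, 19, 24, 29)
)

# Midpoint of each category: (lower + upper) / 2.
ageGroups$midpoint <- (ageGroups$lower + ageGroups$upper) / 2

# Replace each category code with its midpoint; codes not in the table
# (for example, missing-value codes) become NA.
toContinuousAge <- function(codes) {
  ageGroups$midpoint[match(codes, ageGroups$code)]
}

ageCont <- toContinuousAge(c(3, 4, 1, 9))
print(ageCont)  # 22 27 13 NA
```

Because each cycle's categories map onto the same numeric scale (years), the continuous columns can be stacked even when the category codes differ between cycles, at the cost of losing within-category variation.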

```{r, warning=FALSE}
age2001_cont <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", log = TRUE, variables = c("DHHGAGE_cont"))
age2001_cont <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", variables = c("DHHGAGE_cont"))
```

```{r, echo=FALSE}
datatable(age2001_cont, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

age2013_cont <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", log = TRUE, variables = c("DHHGAGE_cont"))
```{r, warning=FALSE}
age2013_cont <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_cont"))
```

```{r, echo=FALSE}
datatable(age2013_cont, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

```{r, warning= FALSE}
combinedAge_cont <- bind_rows(age2001_cont, age2013_cont)
```

```{r, echo=FALSE}
datatable(combinedAge_cont, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:1)), dom = 't'))
```

@@ -173,6 +206,9 @@ In the above code, varLabels is called in `RecWTable()` to label the age and sex
```{r, warning=FALSE}
combinedAgeSex <- bind_rows(agesex_2001, agesex_2013)
labelledCombinedAgeSex <- SetDataLabels(dataToLabel = combinedAgeSex, variableDetails = varDetails, variablesSheet = varSheet)
```

```{r, echo=FALSE}
datatable(labelledCombinedAgeSex, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:2)), dom = 't'))
```

@@ -188,23 +224,53 @@ For more information on `get_label()` and other label helper functions, please r

All the variables listed in `variableDetails.csv` will be transformed if the `variables` argument in `RecWTable()` is not specified. In this example, all of the variables in our mock 2001 and 2013 datasets will be transformed, combined, and labelled.

```{r, warning=FALSE}
```{r, echo=FALSE}
options(htmlwidgets.TOJSON_ARGS = list(na = 'string'))
transformed2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", log = TRUE)
```

```{r, warning=FALSE}
transformed2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file")
```

```{r, echo=FALSE}
datatable(transformed2001, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:4)), dom = 't'))
```

```{r, warning=FALSE}
transformed2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component")
```

transformed2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", log = TRUE)
```{r, echo=FALSE}
datatable(transformed2013, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:4)), dom = 't'))
```

```{r, warning=FALSE}
combinedCCHS <- bind_rows(transformed2001, transformed2013)
labelledCombinedCCHS <- SetDataLabels(dataToLabel = combinedCCHS, variableDetails = varDetails, variablesSheet = varSheet)
```

```{r, echo=FALSE}
datatable(labelledCombinedCCHS, options = list(columnDefs = list(list(className = 'dt-center', targets = 0:5)), dom = 't'))
```

```{r, warning=FALSE}
get_label(labelledCombinedCCHS)
```

### Step 4. Warning messages

Warning messages will appear when the variables in your dataset do not match the variables in the two worksheets. In our example, the mock CCHS datasets contain only a few of the variables listed in `variables.csv` and `variableDetails.csv`, so warnings are printed for the worksheet variables that are missing from our datasets.
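A check of this kind can be sketched with `setdiff()` and `warning()`. This is a hypothetical helper, not bllflow's code, and its message wording is invented; `EXAMPLE_VAR` stands in for a worksheet variable that the mock data lack.

```r
# Hypothetical helper: warn once for every worksheet variable that is not a
# column in the dataset, and return the missing names invisibly.
checkWorksheetVariables <- function(data, worksheetVariables) {
  missingNames <- setdiff(worksheetVariables, names(data))
  for (v in missingNames) {
    warning(sprintf("Variable '%s' is listed in the worksheet but not found in the data", v),
            call. = FALSE)
  }
  invisible(missingNames)
}

mock2001 <- data.frame(DHHA_SEX = c(2, 1, 1, 6), DHHAGAGE = c(3, 4, 6, 6))

# Collect the warnings instead of printing them, so they can be inspected.
warningsSeen <- character(0)
missingNames <- withCallingHandlers(
  checkWorksheetVariables(mock2001, c("DHHA_SEX", "DHHAGAGE", "EXAMPLE_VAR")),
  warning = function(w) {
    warningsSeen <<- c(warningsSeen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(missingNames)
print(warningsSeen)
```

Such warnings are informational: a worksheet that covers many CCHS variables will always list some that a given (mock or partial) dataset does not contain.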

```{r, echo=FALSE, results="hide", warning=2}
transformed2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", log = TRUE)
```
transformed2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file")
```

## Using your own CCHS dataset to transform variables

As mentioned previously, CCHS datasets cannot be shared publicly. That does not stop you from transforming variables in a CCHS dataset saved on your computer. The code below loads a CCHS dataset into your R environment.

```{r, eval=FALSE}
cchsDataset <- read.csv("~/Documents/cchsdataset.csv")
```

Copy this code and change the path to where your dataset is saved on your computer.