diff --git a/docs/articles/usingcchsflow.html b/docs/articles/usingcchsflow.html index d446c1d5..534106a5 100644 --- a/docs/articles/usingcchsflow.html +++ b/docs/articles/usingcchsflow.html @@ -129,8 +129,8 @@
Did you notice that the names for the variables are slightly different in the two mock databases? That isn’t a mistake: in the 2001 CCHS the variable for sex is DHHA_SEX
and in 2013 CCHS the variable is DHH_SEX
.
Don’t worry, cchsflow
is here to help! variableDetails.csv
contains the rules to harmonize those two variables into a common variable name. In the CCHS, the categories for sex
are consistent: 1 = males, 2 = females. If the category values or labels changed, variableDetails.csv
would provide instructions for how to harmonize them. You can learn more about these in later vignettes.
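The name harmonization described above can be pictured with a minimal base R sketch. This is not how RecWTable() works internally; the mock data frames and the `harmonize_name()` helper are hypothetical and exist only for illustration.

```r
# Hypothetical mock data; the real cchsMock2001/cchsMock2013 objects may differ.
mock2001 <- data.frame(DHHA_SEX = c(1, 2, 2))
mock2013 <- data.frame(DHH_SEX = c(2, 1, 1))

# Hypothetical helper: rename a cycle-specific column to the common name.
harmonize_name <- function(data, from, to) {
  names(data)[names(data) == from] <- to
  data
}

sex2001_sketch <- harmonize_name(mock2001, from = "DHHA_SEX", to = "DHH_SEX")
sex2013_sketch <- harmonize_name(mock2013, from = "DHH_SEX", to = "DHH_SEX")

# With a shared column name, the two cycles can be stacked row-wise.
combined_sketch <- rbind(sex2001_sketch, sex2013_sketch)
```

Because the category codes for sex are the same in both cycles, only the column name needs to change in this sketch.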
## As you can see in variables.csv, there are 119 variables that can be transformed that are divided into 6 sections, and 36 subjects.
+
+## As you can see in variables.csv, there are 119 variables that can be transformed that are divided into 6 sections, and 36 subjects.
sex2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", appendToData = FALSE, log = TRUE, variables = c("DHH_SEX"))
## [1] "The variable DHH_SEX was recoded into DHH_SEX for the database cchs-82M0013-E-2013-2014-Annual-component the following recodes were made:"
## valueTo From rowsRecoded
## 1 1 1 2
## 2 2 2 1
## 3 NA::a 6 0
## 4 NA::b 7:9 0
-
-combinedSex <- bind_rows(sex2001, sex2013)
combinedSex <- bind_rows(sex2001, sex2013)
The categories in age
variable in the CCHS changed in 2005 and therefore it is not possible to have the same age
categories across all CCHS cycles. cchsflow
offers two options. The first option is to transform the age
variable into two variables. DHHGAGE_A
is the age variable for CCHS cycles 2001-2003, and DHHGAGE_B
is the age variable for CCHS cycles 2005-2014. With this option, you cannot combine DHHGAGE_A
and DHHGAGE_B
into a single dataset.
age2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", variables = c("DHHGAGE_A"))
## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options only in CCHS 2003, but had zero responses"
-
-
-
-
+
+
+
+
The categorical age
variable can also be transformed into a single continuous age
variable. This variable takes the midpoint age of each category for all CCHS cycles. With this option, the age category variable from all CCHS cycles can be combined into a single dataset.
age2001_cont <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", variables = c("DHHGAGE_cont"))
## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options in CCHS 2003, but had zero responses"
-
-age2013_cont <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_cont"))
combinedAge_cont <- bind_rows(age2001_cont, age2013_cont)
age2013_cont <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_cont"))
combinedAge_cont <- bind_rows(age2001_cont, age2013_cont)
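The midpoint approach behind the continuous age variable can be sketched in base R as follows. The category bounds below are hypothetical placeholders, not the actual CCHS age groups, and `category_midpoint()` is an illustrative helper rather than a bllflow function.

```r
# Hypothetical category bounds, NOT the actual CCHS age groups.
age_bounds <- data.frame(
  category = c(1, 2, 3),
  lower = c(12, 15, 20),
  upper = c(14, 19, 24)
)

# Hypothetical helper: replace each category code with the midpoint of its range.
category_midpoint <- function(category, bounds) {
  idx <- match(category, bounds$category)
  (bounds$lower[idx] + bounds$upper[idx]) / 2
}

age_cat <- c(1, 3, 2, NA)
age_cont_sketch <- category_midpoint(age_cat, age_bounds)
```

Codes that do not appear in the bounds table, including missing values, come back as NA in this sketch.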
In the above code, varLabels is called in RecWTable()
to label the age and sex variables in the 2001 and 2013 datasets. To ensure that the variables in your dataset are labelled, use get_label()
to view the variable labels in your transformed dataset. As mentioned previously, varLabels can be used for all the variables in variablesSheet.csv
or a subset of variables.
combinedAgeSex <- bind_rows(agesex_2001, agesex_2013)
labelledCombinedAgeSex <- SetDataLabels(dataToLabel = combinedAgeSex, variableDetails = varDetails, variablesSheet = varSheet)
In the above code, SetDataLabels()
is used to label the combined age and sex dataset. Similar to before, you can check if labels have been added using get_label()
.
get_label(labelledCombinedAgeSex)
## DHHGAGE_cont DHH_SEX
## "Age" "Sex"
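To make the idea of a variable label concrete, here is a minimal base R sketch. It assumes that labels are stored in a "label" attribute on each column, which is the usual convention for labelled data in R; it does not call SetDataLabels() or get_label(), and the data frame is hypothetical.

```r
# Hypothetical data frame standing in for a transformed dataset.
labelled_sketch <- data.frame(DHHGAGE_cont = c(13, 22), DHH_SEX = c(1, 2))

# Assumption: each variable label is kept in the column's "label" attribute.
attr(labelled_sketch$DHHGAGE_cont, "label") <- "Age"
attr(labelled_sketch$DHH_SEX, "label") <- "Sex"

# Collect the labels of all columns; unlabelled columns would give NA.
labels_found <- vapply(labelled_sketch, function(col) {
  label <- attr(col, "label")
  if (is.null(label)) NA_character_ else label
}, character(1))
```

Checking `labels_found` for NA entries is one way to spot variables that were left unlabelled.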
@@ -276,14 +276,14 @@ ## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options only in CCHS 2003, but had zero responses"
## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options in CCHS 2003, but had zero responses"
## [1] "NOTE: Don't know (999.7) and refusal (999.8) not included in 2001 CCHS"
-
-transformed2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component")
transformed2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component")
## [1] "NOTE: Don't know (999.7) and refusal (999.8) not included in 2001 CCHS"
-
-combinedCCHS <- bind_rows(transformed2001, transformed2013)
+
+combinedCCHS <- bind_rows(transformed2001, transformed2013)
labelledCombinedCCHS <- SetDataLabels(dataToLabel = combinedCCHS, variableDetails = varDetails, variablesSheet = varSheet)
-
-get_label(labelledCombinedCCHS)
+
+get_label(labelledCombinedCCHS)
## DHHGAGE_A DHHGAGE_cont
## "Age" "Age"
## DHH_SEX FVCDTOT
diff --git a/docs/articles/variableDetails.html b/docs/articles/variableDetails.html
index bedf47ce..fbf08c0e 100644
--- a/docs/articles/variableDetails.html
+++ b/docs/articles/variableDetails.html
@@ -103,8 +103,8 @@ variableDetails.csv
Introduction
The variableDetails.csv worksheet contains details for the variables in variables.csv
. Information from the variableDetails.csv
worksheet is used by the RecWTable()
function of the bllflow
package to transform variables identified in variableDetails$variableStart
to the newly transformed variable in variableDetails$variable
.
-
-#> In the `variableDetails.csv` worksheet there are 965 rows and 16 columns
+
+#> In the `variableDetails.csv` worksheet there are 965 rows and 16 columns
variableDetails.csv
, we have designated the variable names used in CCHS cycles from 2007 to 2014 as the final transformed variable name.N/A
. The name of a dummy variable consists of the final variable name, the number of categories in the variable, and the category level for each category. Note that this column is not necessary for RecWTable
.cat
; while a transformed variable that is continuous will be specified as cont
.The categorical age
variable in the 2001 CCHS survey is DHHAGAGE
. If the final variable name for categorical age in the variable column is DHHGAGE
, you would write the following in this column: cchs-82M0013-E-2001-c1-1-general-file::DHHAGAGE
The categorical age variable in the CCHS surveys from 2007 to 2014 is DHHGAGE
. Since it is the same as the final variable name, you would write in this column [DHHGAGE]
once. The variable name that is denoted within the square brackets is the default variable name.
cat
and continuous variables are denoted as cont
.copy
so that the function copies the values without any transformations. For the not applicable category, write NA::a
. For missing & else categories, write NA::b
variableDetails.csv
, variables that have gone from cat to cont have used midpoints of each category.numValidCat = N/A
. Not applicable, missing, and else categories are not included in the category count. Note that this column is not necessary for RecWTable()
.N/A
. Note that the function will not work if there are different units between the rows of a variable.
The rules for each category of a new variable are a string in recFrom and a value in recTo. These recode pairs use the same syntax as sjmisc::rec(); for more details see bllflow::RecWTable().
Recode pairs are obtained from the recFrom and recTo columns.
Multiple values that are recoded into a new single value are separated with a comma, e.g. recFrom = "1,2"; recTo = 1
A value range is indicated by a colon, e.g. recFrom = "1:4"; recTo = 1 (recodes all values from 1 to 4 into 1).
Value range for double vectors (with fractional part): all values within the specified range are recoded; e.g. recFrom = "1:2.5"; recTo = 1 recodes 1 to 2.5 into 1, but 2.55 would not be recoded (since it is not included in the specified range).
Minimum and maximum values are indicated by min (or lo) and max (or hi), e.g. recFrom = "min:4"; recTo = 1 (recodes all values from the minimum value to 4 into 1).
NA is used for missing values (don’t know, refusal, not stated).
All other values, which have not been specified yet, are indicated by else, e.g. recFrom = "else"; recTo = NA (recodes all other values, not specified in other rows, to “NA”).
The else token can be combined with copy, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. recFrom = "else"; recTo = "copy"
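The recFrom rules above can be illustrated with a small base R sketch. `matches_rec_from()` is a hypothetical helper written for this example; it is not the bllflow or sjmisc implementation, and it only covers comma-separated values, colon ranges, and the min/lo and max/hi keywords.

```r
# Hypothetical helper, not the bllflow or sjmisc implementation.
# Returns TRUE for each element of x that the recFrom string selects.
matches_rec_from <- function(x, rec_from) {
  selected <- rep(FALSE, length(x))
  tokens <- trimws(strsplit(rec_from, ",", fixed = TRUE)[[1]])
  for (token in tokens) {
    if (grepl(":", token, fixed = TRUE)) {
      # A colon marks an inclusive range such as "1:4" or "min:4".
      ends <- strsplit(token, ":", fixed = TRUE)[[1]]
      lower <- if (ends[1] %in% c("min", "lo")) min(x, na.rm = TRUE) else as.numeric(ends[1])
      upper <- if (ends[2] %in% c("max", "hi")) max(x, na.rm = TRUE) else as.numeric(ends[2])
      selected <- selected | (!is.na(x) & x >= lower & x <= upper)
    } else {
      # A bare token is a single value.
      selected <- selected | (!is.na(x) & x == as.numeric(token))
    }
  }
  selected
}

x <- c(1, 2, 2.5, 2.55, 4, 7)
in_list <- matches_rec_from(x, "1,2")
in_range <- matches_rec_from(x, "1:2.5")
up_to_four <- matches_rec_from(x, "min:4")

# An "else" row would apply to whatever no other row selected.
else_rows <- !(in_list | up_to_four)
```

In this sketch the value 2.55 falls outside "1:2.5", which mirrors the double-vector rule described above.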
bllflow
helper functions. See bllflow documentation.recode-with-table
function. Things to include here would be changes in wording between CCHS surveys, missing/changes in categories, and changes in variable type between CCHS surveys.HWTGBMI
. This should be written for each row.cont
in each row of BMI.cont
in each row of BMI.copy
written. For the not applicable rows NA::a
is written. For the missing and else rows NA::b
is written.N/A
is written in each row.BMI
is written. Not applicable rows not applicable
is written. Missing rows: missing
. Else row: else
body mass index
is written to give further detail on what BMI is. The other rows remain the same.kg/m2
is written in each row.11.91:57.9
. In the 2001 and 2003 CCHS surveys not applicable was coded as 999.6 so the recFrom for this row would be 999.6:999.6
. Similarly, in the 2001 and 2003 CCHS surveys don’t know was coded as 999.7, refusal was coded as 999.8, and not stated was coded as 999.9. Therefore the recFrom for the missing row for CCHS 2001 and 2003 would be 999.7:999.9
. In the not applicable row for the 2005 CCHS survey onwards, the recFrom is 999.96:999.96
. In the missing row for CCHS 2005 onwards, the recFrom is 999.97:999.99
. For the else row, just write else
.BMI / self-report (D,G)
is written as it is written in CCHS documentation. The other rows remain the same, and the values for each missing category are stated in the missing rows.BMI
for each row is sufficient for this variable.BMI / self-report - (D,G)
.#> There are 119 variables, grouped in 36 subjects and 6 sections that are available for transformation in CCHS cycles from 2001 to 2014.
#> You can search for variables in the table below. Try searching for the 3 age variables that are used in the Transform CCHS variables vignette. All 3 variables are in the age subject. Try sorting the subject column by clicking the up beside the `subject` heading: the top 3 rows of the table should show the age variables:
#> [1] "DHHGAGE_A" "DHHGAGE_B" "DHHGAGE_cont"
-
-
+
+
variables.csv
. Transformations are performed using RecWTable()
from the bllflow
R package.This repository does not include the CCHS data. Information for how to access the CCHS data can be found here. Canadian university community can also access the CCCHS through Odesi – See health/Canada/Canadian Community Health Survey.
+This repository does not include the CCHS data. Information for how to access the CCHS data can be found here. Canadian university community can also access the CCHS through Odesi – See health/Canada/Canadian Community Health Survey.