diff --git a/docs/articles/usingcchsflow.html b/docs/articles/usingcchsflow.html
index d446c1d5..534106a5 100644
--- a/docs/articles/usingcchsflow.html
+++ b/docs/articles/usingcchsflow.html
@@ -129,8 +129,8 @@

cchsMock2013 <- data.frame(DHH_SEX = c(1, 2, 1), DHHGAGE = c(2, 1, 1), FVCDTOT = c(25, 15, 6))

Did you notice that the names for the variables are slightly different in the two mock databases? That isn’t a mistake: in the 2001 CCHS the variable for sex is DHHA_SEX and in the 2013 CCHS the variable is DHH_SEX.

Don’t worry, cchsflow is here to help! variableDetails.csv contains the rules to harmonize those two variables into a common variable name. In the CCHS, the categories for sex are consistent: 1 = males, 2 = females. If the category values or labels changed, variableDetails.csv would provide instructions for how to harmonize them. You can learn more about these in later vignettes.

-
-
## As you can see in variables.csv, there are 119 variables that can be transformed that are divided into 6 sections, and 36 subjects.
+
+
## As you can see in variables.csv, there are 119 variables that can be transformed that are divided into 6 sections, and 36 subjects.

@@ -176,8 +176,8 @@

## 2       2    2           1
## 3   NA::a    6           1
## 4   NA::b  7:9           0
-
- +
+

@@ -196,18 +196,18 @@

## 2       2    2           1
## 3   NA::a    6           1
## 4   NA::b  7:9           0
-
-
sex2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", appendToData = FALSE, log = TRUE, variables = c("DHH_SEX"))
+
+
sex2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", appendToData = FALSE, log = TRUE, variables = c("DHH_SEX"))
## [1] "The variable DHH_SEX was recoded into DHH_SEX for the database cchs-82M0013-E-2013-2014-Annual-component the following recodes were made:"
 ##   valueTo From rowsRecoded
 ## 1       1    1           2
 ## 2       2    2           1
 ## 3   NA::a    6           0
 ## 4   NA::b  7:9           0
-
-
combinedSex <- bind_rows(sex2001, sex2013)
-
- +
+
combinedSex <- bind_rows(sex2001, sex2013)
+
+
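As a rough illustration of what harmonizing the sex variable amounts to, the sketch below renames each cycle's column to the common name DHH_SEX and stacks the rows using base R only. It is not how RecWTable() is implemented. The 2001 mock values, the `sex_names` lookup and the `harmonize_sex()` helper are hypothetical names introduced for this example.

```r
# Mock data: the 2001 values are made up for illustration.
cchsMock2001 <- data.frame(DHHA_SEX = c(1, 1, 2))
cchsMock2013 <- data.frame(DHH_SEX = c(1, 2, 1))

# Hypothetical lookup: the original sex variable name in each cycle.
sex_names <- c("2001" = "DHHA_SEX", "2013" = "DHH_SEX")

# Copy the cycle-specific column into the common name DHH_SEX.
harmonize_sex <- function(data, cycle) {
  out <- data.frame(DHH_SEX = data[[sex_names[[cycle]]]])
  out$cycle <- cycle  # keep track of where each row came from
  out
}

# Stack both cycles now that the column names agree.
combined <- rbind(
  harmonize_sex(cchsMock2001, "2001"),
  harmonize_sex(cchsMock2013, "2013")
)
print(combined)
```

Because the category values for sex are the same in both cycles, only the column name needs to change here; variables whose categories differ also need value recoding.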

@@ -219,10 +219,10 @@

The categories in the age variable in the CCHS changed in 2005 and therefore it is not possible to have the same age categories across all CCHS cycles. cchsflow offers two options. The first option is to transform the age variable into two variables. DHHGAGE_A is the age variable for CCHS cycles 2001-2003, and DHHGAGE_B is the age variable for CCHS cycles 2005-2014. With this option, you cannot combine DHHGAGE_A and DHHGAGE_B into a single dataset.

age2001 <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", variables = c("DHHGAGE_A"))
## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options only in CCHS 2003, but had zero responses"
-
-
age2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_B"))
-
- +
+
age2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_B"))
+
+

@@ -230,12 +230,12 @@

The categorical age variable can also be transformed into a single continuous age variable. This variable takes the midpoint age of each category for all CCHS cycles. With this option, the age category variable from all CCHS cycles can be combined into a single dataset.

age2001_cont <- RecWTable(dataSource = cchsMock2001, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2001-c1-1-general-file", variables = c("DHHGAGE_cont"))
## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options in CCHS 2003, but had zero responses"
-
-
age2013_cont <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_cont"))
-
-
combinedAge_cont <- bind_rows(age2001_cont, age2013_cont)
-
- +
+
age2013_cont <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component", variables = c("DHHGAGE_cont"))
+
+
combinedAge_cont <- bind_rows(age2001_cont, age2013_cont)
+
+
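The midpoint idea behind the continuous age variable can be sketched in base R as follows. The category bounds in `age_bounds` are hypothetical and are not the actual CCHS age groups; `to_midpoint()` is a helper name made up for this example, not a cchsflow or bllflow function.

```r
# Hypothetical age categories with lower and upper bounds (in years).
age_bounds <- data.frame(
  category = c(1, 2, 3),
  lower    = c(12, 15, 20),
  upper    = c(14, 19, 24)
)

# The continuous value assigned to each category is its midpoint.
age_bounds$midpoint <- (age_bounds$lower + age_bounds$upper) / 2

# Look up the midpoint for each observed category code.
to_midpoint <- function(categories, bounds) {
  bounds$midpoint[match(categories, bounds$category)]
}

age_cont <- to_midpoint(c(2, 1, 1), age_bounds)
print(age_cont)
```

Once every cycle is expressed on the same continuous scale, rows from different cycles can be combined even though their original category definitions differ.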

@@ -261,8 +261,8 @@

In the above code, varLabels is called in RecWTable() to label the age and sex variables in the 2001 and 2013 datasets. To ensure that the variables in your dataset are labelled, use get_label() to view the variable labels in your transformed dataset. As mentioned previously, varLabels can be used for all the variables in variablesSheet.csv or a subset of variables.

combinedAgeSex <- bind_rows(agesex_2001, agesex_2013)
 labelledCombinedAgeSex <- SetDataLabels(dataToLabel = combinedAgeSex, variableDetails = varDetails, variablesSheet = varSheet)
-
-

In the above code, SetDataLabels() is used to label the combined age and sex dataset. Similar to before, you can check if labels have been added using get_label().

+
+

In the above code, SetDataLabels() is used to label the combined age and sex dataset. Similar to before, you can check if labels have been added using get_label().

get_label(labelledCombinedAgeSex)
## DHHGAGE_cont      DHH_SEX 
 ##        "Age"        "Sex"
@@ -276,14 +276,14 @@

## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options only in CCHS 2003, but had zero responses"
 ## [1] "NOTE: Not applicable, don't know, refusal, not stated (96-99) were options in CCHS 2003, but had zero responses"
 ## [1] "NOTE: Don't know (999.7) and refusal (999.8) not included in 2001 CCHS"
-
-
transformed2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component")
+
+
transformed2013 <- RecWTable(dataSource = cchsMock2013, variableDetails = varDetails, datasetName = "cchs-82M0013-E-2013-2014-Annual-component")
## [1] "NOTE: Don't know (999.7) and refusal (999.8) not included in 2001 CCHS"
-
-
combinedCCHS <- bind_rows(transformed2001, transformed2013)
+
+
combinedCCHS <- bind_rows(transformed2001, transformed2013)
 labelledCombinedCCHS <- SetDataLabels(dataToLabel = combinedCCHS, variableDetails = varDetails, variablesSheet = varSheet)
-
-
get_label(labelledCombinedCCHS)
+
+
get_label(labelledCombinedCCHS)
##                     DHHGAGE_A                  DHHGAGE_cont 
 ##                         "Age"                         "Age" 
 ##                       DHH_SEX                       FVCDTOT 
diff --git a/docs/articles/variableDetails.html b/docs/articles/variableDetails.html
index bedf47ce..fbf08c0e 100644
--- a/docs/articles/variableDetails.html
+++ b/docs/articles/variableDetails.html
@@ -103,8 +103,8 @@ 

variableDetails.csv

Introduction

The variableDetails.csv worksheet contains details for the variables in variables.csv. Information from the variableDetails.csv worksheet is used by the RecWTable() function of the bllflow package to transform variables identified in variableDetails$variableStart to the newly transformed variable in variableDetails$variable.

-
-
#> In the `variableDetails.csv` worksheet there are 965 rows and 16 columns
+
+
#> In the `variableDetails.csv` worksheet there are 965 rows and 16 columns

@@ -135,18 +135,18 @@

  • variable: the name of the final transformed variable. In variableDetails.csv, we have designated the variable names used in CCHS cycles from 2007 to 2014 as the final transformed variable name.
  • -
    -
      +
      +
      1. dummyVariable: the dummy variable for each category in a transformed categorical variable. This is only applicable for categorical variables; for continuous variables it is set as N/A. The name of a dummy variable consists of the final variable name, the number of categories in the variable, and the category level for each category. Note that this column is not necessary for RecWTable.
      -
      -
        +
        +
        1. toType: the variable type of the final transformed variable. In this column, a transformed variable that is categorical will be specified as cat; while a transformed variable that is continuous will be specified as cont.
        -
        -
          +
          +
          1. databaseStart: the CCHS surveys that contain the variable of interest, separated by commas. Each CCHS survey has a unique identifier in its DDI document.
          @@ -162,13 +162,13 @@

          #> abstract: The Canadian Community Health Survey (CCHS) is a cross-sectional survey that collects information related to health status, health care utilization and health determinants for the Canadian population. The CCHS operates ona two-year collection cycle. The first year of the survey cycle .1 is a large sample, general population health survey, designed to provide reliable estimates at the health region level. The second year of the survey cycle.2 is a smaller survey designed to provide provincial level results on specific focused health topics. #> <br> #> This Microdata File contains data collected in the first year of collection for the CCHS (Cycle 1.1). Information was collected between September 2000 and November 2001, for 136 health regions, covering all provinces and territories. The CCHS (Cycle 1.1) collects responses from persons aged 12 or older, living in private occupied dwellings. Excluded from the sampling frame are individuals living on Indian Reserves and on Crown Lands, institutional residents, full-time members of the Canadian Armed Forces, and residents of certain remote regions.

    -
    -
      +
      +
      1. variableStart: the original names of the variables as they are listed in each respective CCHS cycle, separated by commas. If the variable name in a particular CCHS survey is different from the transformed variable name, write out the CCHS survey identifier, add two colons, and write out the original variable name for that cycle. If the variable name in a particular CCHS survey is the same as the transformed variable name, the variable name is written out surrounded by square brackets. Note: this only needs to be written out once.
      -
      -
        +
        +
        • The categorical age variable in the 2001 CCHS survey is DHHAGAGE. If the final variable name for categorical age in the variable column is DHHGAGE, you would write the following in this column: cchs-82M0013-E-2001-c1-1-general-file::DHHAGAGE

        • The categorical age variable in the CCHS surveys from 2007 to 2014 is DHHGAGE. Since it is the same as the final variable name, you would write in this column [DHHGAGE] once. The variable name that is denoted within the square brackets is the default variable name.
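The variableStart convention described above can be read mechanically. The following base R sketch shows one way to resolve the original variable name for a given survey; `resolve_variable_start()` is a hypothetical helper written for this illustration and is not part of bllflow.

```r
# Return the original variable name for `dataset_name`, given a
# variableStart string such as "surveyA::NAME_A, [DEFAULT_NAME]".
resolve_variable_start <- function(variable_start, dataset_name) {
  entries <- trimws(strsplit(variable_start, ",", fixed = TRUE)[[1]])
  default <- NA_character_
  for (entry in entries) {
    if (grepl("::", entry, fixed = TRUE)) {
      # Survey-specific entry: "<survey identifier>::<variable name>"
      parts <- strsplit(entry, "::", fixed = TRUE)[[1]]
      if (parts[1] == dataset_name) {
        return(parts[2])
      }
    } else if (grepl("^\\[.*\\]$", entry)) {
      # Default entry: the name inside the square brackets
      default <- gsub("^\\[|\\]$", "", entry)
    }
  }
  default
}

variable_start <- "cchs-82M0013-E-2001-c1-1-general-file::DHHAGAGE, [DHHGAGE]"

name_2001 <- resolve_variable_start(
  variable_start, "cchs-82M0013-E-2001-c1-1-general-file"
)
name_2013 <- resolve_variable_start(
  variable_start, "cchs-82M0013-E-2013-2014-Annual-component"
)
print(c(name_2001, name_2013))
```

A survey with its own entry gets that entry's name; any other survey falls back to the bracketed default.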

        @@ -176,63 +176,63 @@

      • fromType: the variable type as indicated in the CCHS surveys. As indicated in the toType column, categorical variables are denoted as cat and continuous variables are denoted as cont.
    -
    -
      +
      +
      1. recTo: the value you would like to recode each category value to. For continuous variables that are not transformed in type, you would write in this column copy so that the function copies the values without any transformations. For the not applicable category, write NA::a. For missing & else categories, write NA::b
      -
      -
        +
        +
        • For categorical variables that are not changing variable types (i.e. cat to cat), it is ideal to retain the same values as indicated in each CCHS survey. But for transformed categorical variables that have changed in type (i.e. cat to cont), you will have to develop values that make the most sense for your analysis. In variableDetails.csv, variables that have gone from cat to cont use the midpoint of each category.
        1. numValidCat: the number of categories for a variable. This only applies to variables in which the toType is cat. For continuous variables, numValidCat = N/A. Not applicable, missing, and else categories are not included in the category count. Note that this column is not necessary for RecWTable().
        -
        -
          +
          +
          1. catLabel: short form label describing the category of a particular variable.
          -
          -
            +
            +
            1. catLabelLong: more detailed label describing the category of a particular variable. This label should be identical to what is shown in the CCHS data documentation, unless you are creating derived variables and would like to create your own label for it.
            -
            -
              +
              +
              1. units: the units of a particular variable. If there are no units for the variable, write N/A. Note that the function will not work if there are different units between the rows of a variable.
              -
              -
                +
                +
                1. recFrom: the range of values for a particular category in a variable as indicated in the CCHS. See the CCHS data documentation for each survey cycle and use the smallest and largest values as your range to capture all values across the survey years.

                The rules for each category of a new variable are a string in recFrom and a value in recTo. These recode pairs use the same syntax as sjmisc::rec() – for more details see bllflow::RecWTable(). Recode pairs are obtained from the recFrom and recTo columns:

                • multiple values: multiple values that are recoded into a new single value are separated with a comma, e.g. recFrom = "1,2"; recTo = 1
                • value range: a value range is indicated by a colon, e.g. recFrom = "1:4"; recTo = 1 (recodes all values from 1 to 4 into 1)
                • value range for double vectors (with fractional part): all values within the specified range are recoded, e.g. recFrom = "1:2.5"; recTo = 1 recodes 1 to 2.5 into 1, but 2.55 would not be recoded (since it’s not included in the specified range)
                • min and max: minimum and maximum values are indicated by min (or lo) and max (or hi), e.g. recFrom = "min:4"; recTo = 1 (recodes all values from the minimum value to 4 into 1)
                • NA: NA is used for missing values (don’t know, refusal, not stated)
                • else: all other values, which have not been specified yet, are indicated by else, e.g. recFrom = "else"; recTo = NA (recodes all other values, not specified in other rows, to “NA”)
                • copy: the else token can be combined with copy, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. recFrom = "else"; recTo = "copy"

                -
                -
                  +
                  +
                  1. catStartLabel: label describing each category. This label should be identical to what is shown in the CCHS data documentation. For the missing row, each missing category is described along with their coded values. You can import labels from the CCHS DDI files using bllflow helper functions. See bllflow documentation.
                  -
                  -
                    +
                    +
                    1. variableStartShortLabel: short form label describing the variable.
                    -
                    -
                      +
                      +
                      1. variableStartLabel: more detailed label describing the variable. This label should be identical to what is shown in the CCHS data documentation.
                      -
                      -
                        +
                        +
                        1. notes: any relevant notes to inform the user running the recode-with-table function. Things to include here would be changes in wording between CCHS surveys, missing/changes in categories, and changes in variable type between CCHS surveys.
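To make the recFrom and recTo recode-pair syntax from the list above more concrete, here is a minimal base R sketch that applies a small rule table to a numeric vector. It only covers comma-separated values, colon ranges, min/max, else and copy, and it is an illustration of the rules as described, not the code used by RecWTable() or sjmisc::rec(). The helper names `matches_rec_from()` and `recode_with_rules()` are hypothetical.

```r
# TRUE where x falls in any comma-separated value or range of rec_from.
matches_rec_from <- function(x, rec_from) {
  hit <- rep(FALSE, length(x))
  for (piece in trimws(strsplit(rec_from, ",", fixed = TRUE)[[1]])) {
    bounds <- strsplit(piece, ":", fixed = TRUE)[[1]]
    lo <- bounds[1]
    hi <- bounds[length(bounds)]  # a single value is its own range
    lo <- if (lo %in% c("min", "lo")) min(x, na.rm = TRUE) else as.numeric(lo)
    hi <- if (hi %in% c("max", "hi")) max(x, na.rm = TRUE) else as.numeric(hi)
    hit <- hit | (!is.na(x) & x >= lo & x <= hi)
  }
  hit
}

# Apply rules in order; "else" catches whatever no earlier rule matched.
recode_with_rules <- function(x, rules) {
  out <- rep(NA_real_, length(x))
  done <- rep(FALSE, length(x))
  for (i in seq_len(nrow(rules))) {
    from <- rules$recFrom[i]
    to <- rules$recTo[i]
    sel <- if (from == "else") !done else matches_rec_from(x, from) & !done
    out[sel] <- if (to == "copy") x[sel] else as.numeric(to)
    done <- done | sel
  }
  out
}

rules <- data.frame(
  recFrom = c("1,2", "3:4", "else"),
  recTo   = c("1", "2", "copy"),
  stringsAsFactors = FALSE
)

recoded <- recode_with_rules(c(1, 2, 3, 4, 2.5, 7), rules)
print(recoded)
```

In this example 2.5 is not covered by "1,2" or "3:4", so it falls through to the else row and is copied unchanged, as is 7.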
                        -
                        - +
                        +

    @@ -253,96 +253,96 @@

  • variable: the most common variable name for BMI is HWTGBMI. This should be written for each row.
  • -
    -
      +
      +
      1. dummyVariable: BMI is a continuous variable, so it does not have dummy variables.
      -
      -
        +
        +
        1. toType: BMI was captured in the CCHS as a continuous variable. It does not make much sense to transform it into a categorical variable, so the toType should be cont in each row of BMI.
        -
        -
          +
          +
          1. databaseStart: BMI was captured in all CCHS surveys between 2001 and 2014, so in the first row with the continuous “category” and the else row, the CCHS identifiers will be listed in this column:
          -
          -
            +
            +
            • For the not applicable and missing rows that pertain to the 2001 and 2003 CCHS surveys, only write the 2001 and 2003 identifiers in this column. For the not applicable and missing rows that pertain to the 2005 CCHS survey and onwards, write the identifiers for CCHS 2005 onwards. This is because the not applicable category and the missing categories are coded differently.
            -
            -
              +
              +
              1. variableStart: In the 2001, 2003, and 2005 CCHS surveys the BMI variable differs from the common name, while in the CCHS surveys from 2007-2014, the BMI variable is the same as the common name. However, the values for the not applicable and missing categories change after 2003. Therefore for the first and else rows, the variableStart column will look like this:
              -
              -
                +
                +
                • For the not applicable and missing rows that pertain to the 2001 and 2003 CCHS surveys, the variable names for those two cycles will be written.
                -
                -
                  +
                  +
                  • For the not applicable and missing rows that pertain to the 2005 CCHS surveys onwards, the column will look like this:
                  -
                  -
                    +
                    +
                    1. fromType: As mentioned previously, BMI was measured as a continuous variable in the CCHS, so the fromType should be cont in each row of BMI.
                    -
                    -
                      +
                      +
                      1. recTo: Since this is a continuous variable, the first row (the main “category”) has copy written. For the not applicable rows NA::a is written. For the missing and else rows NA::b is written.
                      -
                      -
                        +
                        +
                        1. numValidCat: Since this is a continuous variable, there are no actual categories; so N/A is written in each row.
                        -
                        -
                          +
                          +
                          1. catLabel: For the first row, BMI is written. For the not applicable rows, not applicable is written. For the missing rows, missing is written. For the else row, else is written.
                          -
                          -
                            +
                            +
                            1. catLabelLong: For the first row, body mass index is written to give further detail on what BMI is. The other rows remain the same.
                            -
                            -
                              +
                              +
                              1. units: BMI is measured in kg/m2, so kg/m2 is written in each row.
                              -
                              -
                                +
                                +
                                1. recFrom: Going through the CCHS data documentation from 2001 to 2014, it was found that the lowest BMI value was 11.91 and the highest BMI value was 57.9. Therefore the recFrom for the first row is written as 11.91:57.9. In the 2001 and 2003 CCHS surveys not applicable was coded as 999.6 so the recFrom for this row would be 999.6:999.6. Similarly, in the 2001 and 2003 CCHS surveys don’t know was coded as 999.7, refusal was coded as 999.8, and not stated was coded as 999.9. Therefore the recFrom for the missing row for CCHS 2001 and 2003 would be 999.7:999.9. In the not applicable row for the 2005 CCHS survey onwards, the recFrom is 999.96:999.96. In the missing row for CCHS 2005 onwards, the recFrom is 999.97:999.99. For the else row, just write else.
                                -
                                -
                                  +
                                  +
                                  1. catStartLabel: For the first row, BMI / self-report (D,G) is written as it is written in CCHS documentation. The other rows remain the same, and the values for each missing category are stated in the missing rows.
                                  -
                                  -
                                    +
                                    +
                                    1. variableStartShortLabel: Writing BMI for each row is sufficient for this variable.
                                    -
                                    -
                                      +
                                      +
                                      1. variableStartLabel: As per CCHS documentation, the label for this variable is BMI / self-report - (D,G).
                                      -
                                      -
                                        +
                                        +
                                        1. notes: As described previously, there are differences between CCHS surveys with regard to coding the not applicable and missing categories. These are documented in this section. Aside from this, there are other changes and differences that should also be documented. In the 2001 CCHS survey, this variable was restricted to participants aged 20-64. As well, don’t know (999.7) and refusal (999.8) were not included in this survey.
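Putting the BMI rows together, the sketch below applies the recFrom ranges described above in base R. It assumes a hypothetical helper `recode_bmi()` with an `early_cycle` flag (TRUE for the 2001 and 2003 coding, FALSE for 2005 onwards). Because base R has no tagged missing values, the NA::a and NA::b tags are kept in a separate `status` column, which is a simplification for this example rather than what RecWTable() returns.

```r
recode_bmi <- function(bmi, early_cycle) {
  # Not applicable code differs by cycle: 999.6 (2001/2003) vs 999.96 (2005+).
  not_applicable_code <- if (early_cycle) 999.6 else 999.96

  valid <- !is.na(bmi) & bmi >= 11.91 & bmi <= 57.9       # recFrom 11.91:57.9
  not_applicable <- !is.na(bmi) & bmi == not_applicable_code

  # Missing rows and the else row both recode to NA::b, so it is the default.
  status <- rep("NA::b", length(bmi))
  status[valid] <- "valid"
  status[not_applicable] <- "NA::a"

  data.frame(
    HWTGBMI = ifelse(valid, bmi, NA_real_),  # recTo = copy for valid values
    status = status,
    stringsAsFactors = FALSE
  )
}

bmi_2001 <- recode_bmi(c(22.5, 999.6, 999.8, 60), early_cycle = TRUE)
bmi_2013 <- recode_bmi(c(31.2, 999.96, 999.6), early_cycle = FALSE)
print(bmi_2001)
print(bmi_2013)
```

Note that 999.6 is treated as not applicable only under the early coding; under the 2005 onwards coding it matches no specific row and falls into else.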
                                        -
                                        - +
                                        +

diff --git a/docs/articles/variablesSheet.html b/docs/articles/variablesSheet.html
index fc17a088..0b68df4b 100644
--- a/docs/articles/variablesSheet.html
+++ b/docs/articles/variablesSheet.html
@@ -109,8 +109,8 @@

    #> There are 119 variables, grouped in 36 subjects and 6 sections that  are available for transformation in CCHS cycles from 2001 to 2014.
    #> You can search for variables in the table below. Try searching for the 3 age variables that are used in the Transform CCHS variables vignette. All 3 variables are in the age subject. Try sorting the subject column by clicking the up beside the `subject` heading: the top 3 rows of the table should show the age variables:
     #> [1] "DHHGAGE_A"    "DHHGAGE_B"    "DHHGAGE_cont"
    -
    - +
    +

diff --git a/docs/index.html b/docs/index.html
index 36c9050a..e08b13dd 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -99,7 +99,7 @@
  • Vignettes that describe how to use R to transform or generate new derived variables that are listed in variables.csv. Transformations are performed using RecWTable() from the bllflow R package.
  • Codebooks (metadata documents) for the original CCHS surveys – see CCHS_DDI folder. The PDF and DDI documents are a resource to examine how variables change across survey cycles.
  • -

    This repository does not include the CCHS data. Information for how to access the CCHS data can be found here. Canadian university community can also access the CCCHS through Odesi – See health/Canada/Canadian Community Health Survey.

    +

    This repository does not include the CCHS data. Information for how to access the CCHS data can be found here. Canadian university community can also access the CCHS through Odesi – See health/Canada/Canadian Community Health Survey.

    Important notes