- Purpose: Create a useful metadata solution for all sectors that is flexible, used by everyone, and easy to collaborate on.
- Produce modularized output (based on regional grouping) that all sectors can utilize
- There is one CSV output, one script, and one raw input folder for each regional grouping, all named after the regional grouping
- All outputs follow the same format and use the same code as the data warehouse
- Avoid duplicating the work of cleaning raw data
We aim to align all the coding with the data warehouse.
- raw_data: Contains the original input files for producing the "output"; folder names should follow the `regional_grouping`, e.g., "UNICEF_PROG_REG_GLOBAL"
- output: Contains files for each regional grouping (or `parent`), named by the `regional_grouping`, e.g., "UNICEF_PROG_REG_GLOBAL.csv"
- scripts: Contains the scripts to produce the output files, e.g., "UNICEF_PROG_REG_GLOBAL.R"
- R: Contains general functions used by each script, e.g. "general_functions.R"
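
Because the raw input folder, the script, and the output file all share the name of the regional grouping, the three paths can be derived from that single name. The following is a minimal sketch of the convention; `grouping_paths()` is a hypothetical helper for illustration and is not part of "general_functions.R".

```r
# Hypothetical helper: build the three paths that belong to one regional grouping.
# Folder names (raw_data, scripts, output) follow the layout described above.
grouping_paths <- function(regional_grouping, root = ".") {
  list(
    raw_data = file.path(root, "raw_data", regional_grouping),
    script   = file.path(root, "scripts", paste0(regional_grouping, ".R")),
    output   = file.path(root, "output", paste0(regional_grouping, ".csv"))
  )
}

paths <- grouping_paths("UNICEF_PROG_REG_GLOBAL")
paths$raw_data
paths$script
paths$output
```

Keeping the name in one place means that adding a new regional grouping only requires a new raw input folder, script, and output file with matching names.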
The output files from this project serve as inputs for downstream processes.
The data is in a long format, where each region is mapped to all the `ISO3Code` values belonging to it.
In this way, the data can easily be reshaped into a wide format if needed, using the formula `ISO3Code ~ Region_Code`. Then every row represents an ISO3Code/country, and every region becomes a column.
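```r
# For example: read one regional grouping and reshape it from long to wide
library(data.table)

dt_unicef_prog <- fread("output/UNICEF_PROG_REG_GLOBAL.csv")
# One row per ISO3Code, one column per Region_Code, cells filled with Region
dt_unicef_prog_wide <- dcast(dt_unicef_prog, ISO3Code ~ Region_Code, value.var = "Region")
head(dt_unicef_prog_wide)
```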
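
To see the reshape without the real output file, the same call can be tried on a small toy table. This is only a sketch: the column names `ISO3Code`, `Region_Code`, and `Region` come from the example above, while the rows (country codes and region names) are made up for illustration.

```r
library(data.table)

# Toy long-format table; the values are illustrative, not taken from the real output.
dt_long <- data.table(
  ISO3Code    = c("AAA", "BBB", "AAA", "CCC"),
  Region_Code = c("REG_1", "REG_1", "REG_2", "REG_2"),
  Region      = c("Region one", "Region one", "Region two", "Region two")
)

# Long to wide: one row per ISO3Code, one column per Region_Code.
dt_wide <- dcast(dt_long, ISO3Code ~ Region_Code, value.var = "Region")
print(dt_wide)
```

A country that does not belong to a given region gets `NA` in that region's column, so membership can be checked with `!is.na()`.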