Chapter 5 Load in the data
To quickly look at all the data files in the folder, we can use `list.files()`:
## [1] "Corr_2016 copy.csv" "Corr_2016.csv"
## [3] "Data Dictionary - Blank.xlsx" "Data Dictionary - Filled.xlsx"
## [5] "deaths_2016.xlsx" "Population_Estimates.csv"
## [7] "titanic.csv" "Titantic_DataDictionary.xlsx"
## [9] "Weather_data.csv"
`list.files()` is a powerful function that is often overlooked.

We can also use `list.files()` to look for specific types of files, for example only the .csv files or only the .xlsx files:
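The block below is a minimal sketch of what `list.files()` returns. It assumes nothing about your own folder. Instead, it builds a throwaway folder under `tempdir()` and fills it with empty files named like some of the files listed above. The folder name `data_demo` is hypothetical; in the exercise you would point `list.files()` at your own data folder.

```r
# Toy setup (hypothetical): a temporary folder holding empty files that reuse
# some of the file names from the listing above, so the example runs anywhere
data_dir <- file.path(tempdir(), "data_demo")
dir.create(data_dir, showWarnings = FALSE)
file_names <- c("Corr_2016.csv", "deaths_2016.xlsx", "Population_Estimates.csv",
                "titanic.csv", "Weather_data.csv")
invisible(file.create(file.path(data_dir, file_names)))

# List every file in the folder (names only, sorted alphabetically)
all_files <- list.files(data_dir)
print(all_files)

# full.names = TRUE returns complete paths that can be passed to a reader
all_paths <- list.files(data_dir, full.names = TRUE)
```

Using `full.names = TRUE` is handy when the result is passed directly to a function that reads the files, because you do not have to rebuild each path yourself.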
## [1] "Corr_2016 copy.csv" "Corr_2016.csv" "Population_Estimates.csv"
## [4] "titanic.csv" "Weather_data.csv"
## [1] "Data Dictionary - Blank.xlsx" "Data Dictionary - Filled.xlsx"
## [3] "deaths_2016.xlsx" "Titantic_DataDictionary.xlsx"
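One way to produce listings like these is the `pattern` argument of `list.files()`, which takes a regular expression. The sketch below again uses a hypothetical temporary folder (`pattern_demo`) rather than the course data folder.

```r
# Toy setup (hypothetical): empty files in a temporary folder
data_dir <- file.path(tempdir(), "pattern_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, c(
  "Corr_2016.csv", "Population_Estimates.csv", "Weather_data.csv",
  "deaths_2016.xlsx", "Titantic_DataDictionary.xlsx"))))

# "\\.csv$" matches names that END in ".csv" (the dot is escaped, $ anchors it)
csv_files  <- list.files(data_dir, pattern = "\\.csv$")
xlsx_files <- list.files(data_dir, pattern = "\\.xlsx$")

# ignore.case = TRUE would also match upper-case extensions such as ".CSV"
csv_any_case <- list.files(data_dir, pattern = "\\.csv$", ignore.case = TRUE)
```

Anchoring the pattern with `$` avoids accidentally matching names such as `notes.csv.bak`, which a bare `"csv"` pattern would include.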
If your Corr_2016 file is zipped, unzip it with an archive program (such as WinRAR or 7-Zip) before continuing.
Alternatively, you can unzip it from R with `unzip()`. The first argument is the path of the zip file you want to unzip, and `exdir` is the folder you want to extract the file to.
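You only need to unzip the file once. After it has been extracted, you can comment out the code by putting a `#` in front of `unzip`. Alternatively, change the chunk header to `{r include=FALSE, eval=FALSE}` so that the chunk is neither run nor shown when the document is knitted.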
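A third option is to let the code decide whether unzipping is still needed. The helper below, `unzip_once()`, is hypothetical and is not part of the course material. It only calls `utils::unzip()` when the expected extracted file is missing, so the chunk can stay as-is. The zip path in the usage line is also an assumption.

```r
# Hypothetical helper: extract a zip archive only if the expected file is not
# already present, so the chunk does not need to be commented out later
unzip_once <- function(zipfile, exdir, expected_file) {
  target <- file.path(exdir, expected_file)
  if (!file.exists(target)) {
    # utils::unzip(): first argument is the zip file, exdir is the output folder
    utils::unzip(zipfile, exdir = exdir)
  }
  # TRUE if the expected file exists after the (possible) extraction
  invisible(file.exists(target))
}

# Usage (paths are assumptions; adjust to where your zip file lives):
# unzip_once("data/Corr_2016.zip", exdir = "data", expected_file = "Corr_2016.csv")
```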
Let’s load the data using the readr package for this exercise (read more about readr here).
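Below is a minimal sketch of reading a .csv file with `readr::read_csv()`. It uses a tiny toy file with made-up columns (`region`, `year`, `count`), not the course data. Note that readr reads delimited text files such as .csv. The .xlsx files, such as deaths_2016.xlsx, need a different package, for example readxl's `read_excel()`.

```r
library(readr)

# Toy CSV (hypothetical columns and values, not the course data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,year,count", "A,2016,12", "B,2016,7"), csv_path)

# Declaring col_types explicitly avoids surprises from readr's type guessing
toy <- read_csv(
  csv_path,
  col_types = cols(
    region = col_character(),
    year   = col_integer(),
    count  = col_integer()
  )
)
```

Spelling out `col_types` is a design choice rather than a requirement. Without it, `read_csv()` guesses each column's type from the data and prints a summary of its guesses.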
We will be working with deaths_2016.xlsx, Population_Estimates.csv, Corr_2016.csv and Weather_data.csv. These data come from the Introduction to Data Management course developed by Megan Striha and are reused for this exercise. Four data sets will be used in this course:
- Mortality Data
- Population Data
- Correspondence Files
- Environmental Data
Research Questions
Analysis 1: Calculate age- and sex-specific cancer mortality rates by health region in BC.

Analysis 2: Link environmental data to the cancer mortality data to perform an odds ratio analysis.*
*If this analysis were done in the real world, it would be better to use cancer cases rather than mortality data. For this course, however, the focus is on cleaning and data management, not on the analysis or research question.
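To make the two analyses concrete, here is a hedged sketch of the underlying arithmetic. It uses toy numbers and makes no claim to be the course's method. An age- and sex-specific rate divides the deaths in one stratum by the population of that same stratum, here scaled to deaths per 100,000 people. An odds ratio from a 2x2 exposure/outcome table is (a/b)/(c/d). All names and values below (`toy_strata`, `odds_ratio()`, the counts) are hypothetical.

```r
# Toy strata (hypothetical, NOT from deaths_2016.xlsx or Population_Estimates.csv)
toy_strata <- data.frame(
  region     = c("Region A", "Region A", "Region B", "Region B"),
  sex        = c("F", "M", "F", "M"),
  age_group  = c("65+", "65+", "65+", "65+"),
  deaths     = c(120, 150, 80, 95),
  population = c(60000, 50000, 40000, 38000)
)

# Age- and sex-specific rate: deaths / population of the same stratum,
# expressed per 100,000 people
toy_strata$rate_per_100k <- toy_strata$deaths / toy_strata$population * 1e5

# Odds ratio from a 2x2 table
#   a: exposed with outcome     b: exposed without outcome
#   c: unexposed with outcome   d: unexposed without outcome
odds_ratio <- function(a, b, c, d) {
  (a / b) / (c / d)
}
or_toy <- odds_ratio(a = 30, b = 70, c = 20, d = 80)  # (30/70) / (20/80) = 12/7
```

In the toy table, the rates work out to 200, 300, 200 and 250 deaths per 100,000. An odds ratio above 1 (here about 1.71) means the outcome's odds are higher in the exposed group.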