Exploratory Analysis Project
Learning objectives
By the end of this chapter, you should be able to:
- create a short exploratory analysis report;
- summarize the dimensions and structure of several datasets;
- count missing values;
- create reusable functions and apply them with purrr::map();
- explain cleaning decisions in plain language.
Purpose
This chapter turns the earlier practice work into a more polished exploratory analysis document. The goal is a report that a colleague or manager could read to understand what the data contain, what was cleaned, and which decisions were made.
Load libraries
library(tidyverse)
library(readxl)
library(lubridate)
Read data
mort <- read_excel("./Raw Data/deaths_2016.xlsx")
pop <- read_csv("./Raw Data/Population_Estimates.csv")
corr <- read_csv("./Raw Data/Corr_2016.csv", locale = readr::locale(encoding = "latin1"))
env <- read_csv("./Raw Data/Weather_data.csv")
Create a data list
data_list <- list(
  mortality = mort,
  population = pop,
  correspondence = corr,
  environment = env
)
Dataset dimensions
map(data_list, dim)
map(data_list, nrow)
map(data_list, ncol)
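The three map() calls above each return a separate list, which can be awkward to paste into a report. A minimal sketch of one alternative is to collect the row and column counts into a single table. The toy_list object below is a hypothetical stand-in for data_list so that the example runs without the raw files, and dim_summary is an illustrative name rather than something defined elsewhere in this chapter.

```r
library(tidyverse)

# Hypothetical stand-in for data_list (assumption: any named list of data frames works)
toy_list <- list(
  small = tibble(id = 1:3, label = c("a", "b", "c")),
  large = tibble(id = 1:5, value = c(2.5, NA, 1.0, 4.2, NA), flag = TRUE)
)

# Build a one-row tibble per dataset, then stack them.
# The list names become the "dataset" column.
dim_summary <- map(toy_list, ~ tibble(rows = nrow(.x), cols = ncol(.x))) %>%
  bind_rows(.id = "dataset")

dim_summary
```

Replacing toy_list with data_list should give one row per dataset, which fits naturally under a "Data dimensions" heading.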
Missing values function
# Return a one-row summary with the number of NA values in each column
count_missing <- function(data) {
  data %>% summarize(across(everything(), ~ sum(is.na(.x))))
}
map(data_list, count_missing)
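Wide one-row summaries become hard to read when a dataset has many columns. As a sketch, the output of count_missing() can be reshaped into long form and filtered so that only columns with missing values remain. The helper missing_long_one() and the toy_list data are hypothetical names introduced for this illustration. count_missing() is redefined here only so that the block runs on its own.

```r
library(tidyverse)

# Same idea as the chapter's count_missing(), repeated so this block is self-contained
count_missing <- function(data) {
  data %>% summarize(across(everything(), ~ sum(is.na(.x))))
}

# Hypothetical helper: one row per column instead of one column per column
missing_long_one <- function(data) {
  count_missing(data) %>%
    pivot_longer(everything(), names_to = "column", values_to = "n_missing")
}

# Hypothetical stand-in for data_list
toy_list <- list(
  small = tibble(id = 1:3, label = c("a", NA, "c")),
  large = tibble(id = 1:5, value = c(2.5, NA, 1.0, 4.2, NA))
)

# Stack the per-dataset summaries and keep only columns with missing values
missing_long <- map(toy_list, missing_long_one) %>%
  bind_rows(.id = "dataset") %>%
  filter(n_missing > 0)

missing_long
```

The long form is also convenient later on, because it can be passed straight to a plotting function or printed as a report table.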
Recommended report structure
Use this structure for your exploratory analysis R Markdown file:
- Purpose of the analysis
- Data sources
- Data dimensions
- Column descriptions
- Missing value checks
- Duplicate checks
- Cleaning decisions
- Initial plots
- Summary of key findings
- Next steps
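The "Duplicate checks" item in the structure above can follow the same function-plus-map pattern used for missing values. The sketch below assumes that a duplicate means a row identical to an earlier row in every column. Your own definition may differ, for example duplicates on an ID column only. The function count_duplicates() and the toy_list data are hypothetical.

```r
library(tidyverse)

# Hypothetical helper, named to mirror count_missing().
# Counts rows that exactly repeat an earlier row.
count_duplicates <- function(data) {
  sum(duplicated(data))
}

# Hypothetical stand-in for data_list
toy_list <- list(
  clean = tibble(id = 1:3, label = c("a", "b", "c")),
  repeated = tibble(id = c(1, 2, 2, 2), label = c("a", "b", "b", "b"))
)

# map_int() returns a named integer vector rather than a list
dup_counts <- map_int(toy_list, count_duplicates)

dup_counts
```

Whatever definition you choose, state it in the "Cleaning decisions" section so a reader knows what was counted and what was removed.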
Practice task
Create a new R Markdown document called exploratory_analysis.Rmd. Use the structure above and include at least one table, one missing-value summary, and one plot.
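As one possible starting point for the plot requirement, the sketch below draws a bar chart of missing values per column. It uses a small made-up tibble called toy, with invented column names, rather than any of the chapter's datasets. Treat it as a pattern to adapt, not as the expected answer.

```r
library(tidyverse)

# Made-up data for illustration only
toy <- tibble(
  id = 1:6,
  temperature = c(12.1, NA, 14.3, NA, NA, 11.8),
  station = c("A", "B", NA, "A", "B", "A")
)

# Count missing values per column and reshape to one row per column
missing_by_column <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "column", values_to = "n_missing")

# Simple bar chart of the counts
missing_plot <- ggplot(missing_by_column, aes(x = column, y = n_missing)) +
  geom_col() +
  labs(x = "Column", y = "Missing values", title = "Missing values per column")

missing_plot
```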