Chapter 50 Exploratory Analysis Project

50.1 Learning objectives

By the end of this chapter, you should be able to:

  • create a short exploratory analysis report;
  • summarize the dimensions and structure of several datasets;
  • count missing values;
  • create reusable functions and apply them with purrr::map();
  • explain cleaning decisions in plain language.

50.2 Purpose

This chapter turns the earlier practice work into a more polished exploratory analysis document: a report that a colleague or manager could read to understand what the data contain, what was cleaned, and which decisions were made and why.

50.3 Load libraries

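library(tidyverse)  # dplyr, readr, purrr, ggplot2, and related packages
library(readxl)     # read_excel() for .xlsx files; installed with tidyverse but not attached by it
library(lubridate)  # date handling; attached by tidyverse 2.0+, loaded explicitly for older versions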

50.4 Read data

mort <- read_excel("./Raw Data/deaths_2016.xlsx")
pop  <- read_csv("./Raw Data/Population_Estimates.csv")
# The correspondence file is read as Latin-1 rather than readr's default UTF-8
corr <- read_csv("./Raw Data/Corr_2016.csv", locale = locale(encoding = "latin1"))
env  <- read_csv("./Raw Data/Weather_data.csv")
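The correspondence file is read with `locale(encoding = "latin1")`. The self-contained sketch below illustrates why declaring the encoding matters: it writes a tiny Latin-1 encoded CSV to a temporary file (the contents are made up for illustration) and reads it back, letting readr convert the text to UTF-8. Without the `locale()` argument, readr assumes UTF-8, and characters such as "ë" would likely be misread.

```r
library(readr)

# Write a small CSV whose bytes are Latin-1 encoded (illustrative data only)
tmp <- tempfile(fileext = ".csv")
latin1_bytes <- iconv("name,count\nZo\u00eb,3\n",
                      from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]
writeBin(latin1_bytes, tmp)

# Declaring the source encoding lets readr convert the text to UTF-8
people <- read_csv(tmp, locale = locale(encoding = "latin1"),
                   show_col_types = FALSE)
people$name
```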

50.5 Create a data list

data_list <- list(
  mortality = mort,
  population = pop,
  correspondence = corr,
  environment = env
)

50.6 Dataset dimensions

map(data_list, dim)   # rows and columns for each dataset
map(data_list, nrow)  # rows only
map(data_list, ncol)  # columns only
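`map()` returns a list, which works for a quick look but is awkward in a report. One option, sketched here with two small hypothetical tibbles standing in for the real datasets, is to collect the dimensions into a single table with `map_int()`, which returns a named integer vector instead of a list.

```r
library(purrr)
library(tibble)

# Hypothetical toy datasets standing in for mort, pop, and so on
toy_list <- list(
  mortality  = tibble(id = 1:3, deaths = c(4, 2, 7)),
  population = tibble(region = c("N", "S"), pop = c(1200, 950), year = c(2016, 2016))
)

# One row per dataset: its name, number of rows, and number of columns
dims <- tibble(
  dataset = names(toy_list),
  rows    = map_int(toy_list, nrow),
  cols    = map_int(toy_list, ncol)
)
dims
```

In the R Markdown report, `knitr::kable(dims)` renders this as a formatted table.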

50.7 Missing values function

# Count NA values in every column; returns a one-row tibble per dataset
count_missing <- function(data) {
  data %>% summarize(across(everything(), ~ sum(is.na(.x))))
}
map(data_list, count_missing)
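`count_missing()` returns one wide, one-row tibble per dataset, which becomes hard to read when a dataset has many columns. A possible variant, shown as a sketch with hypothetical toy data and a hypothetical helper named `count_missing_long()`, reshapes the counts to one row per variable with `tidyr::pivot_longer()`, adds the percentage missing, and stacks every dataset into a single table using the list names as a `dataset` column.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical toy datasets standing in for the real ones in data_list
toy_list <- list(
  a = tibble(x = c(1, NA, 3), y = c("p", "q", NA)),
  b = tibble(z = c(NA, NA))
)

# Hypothetical helper: one row per variable with the count and percentage of NAs
count_missing_long <- function(data) {
  n_rows <- nrow(data)
  data %>%
    summarize(across(everything(), ~ sum(is.na(.x)))) %>%
    pivot_longer(everything(), names_to = "variable", values_to = "n_missing") %>%
    mutate(pct_missing = 100 * n_missing / n_rows)
}

# bind_rows() stacks the per-dataset tables; .id stores each list name
missing_summary <- bind_rows(map(toy_list, count_missing_long), .id = "dataset")
missing_summary
```

A long table like this is easier to sort, filter, and plot than a set of wide one-row summaries.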

50.9 Practice task

Create a new R Markdown document called exploratory_analysis.Rmd. Use the structure above and include at least one table, one missing-value summary, and one plot.
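For the plot requirement, one option is a bar chart of missing values per variable. The sketch below uses ggplot2 on a small hypothetical summary table whose columns (`dataset`, `variable`, `n_missing`) match a long-format missing-value summary; in the report you would substitute the summary built from `data_list`.

```r
library(ggplot2)
library(tibble)

# Hypothetical long-format missing-value summary (illustrative values only)
missing_summary <- tibble(
  dataset   = c("a", "a", "b"),
  variable  = c("x", "y", "z"),
  n_missing = c(1L, 1L, 2L)
)

# One bar per variable, with a separate panel for each dataset
p <- ggplot(missing_summary, aes(x = variable, y = n_missing)) +
  geom_col() +
  facet_wrap(~ dataset, scales = "free_x") +
  labs(x = "Variable", y = "Missing values", title = "Missing values by dataset")
p
```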