Data Management with R

Chapter 31 1. Load packages

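library(tidyverse)  # core packages: dplyr, tidyr, ggplot2, readr, stringr, and others
library(readxl)     # read Excel files (.xls, .xlsx); not attached by library(tidyverse)
library(lubridate)  # dates and times; already attached by tidyverse 2.0.0 and later
library(stringr)    # string handling; already attached by tidyverse, loaded here explicitly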