Data Management with R

Recommended Chapter Flow

The book is organized into the following learning sequence:

  1. Project Setup
    Set up an RStudio Project, organize folders, and understand reproducible workflows.

  2. Tidyverse Basics
    Learn core tidyverse functions for reading, selecting, filtering, mutating, and summarizing data.

  3. Joining Data
    Practice combining datasets using keys and joins, with examples from the Titanic dataset.

  4. Data Cleaning
    Work with raw files, column names, missing values, reshaping, and structured cleaning steps.

  5. Strings and Regular Expressions
    Use stringr, separate(), str_extract(), and regex patterns to clean and parse text data.

  6. Data Visualization
    Create clear visual summaries using ggplot2, including line plots, bar charts, and grouped visualizations.

  7. Exploratory Data Analysis
    Build an organized EDA workflow using summaries, plots, missing-value checks, and data validation.

  8. Communicating Results
    Use storyboards, dashboards, and reporting structure to communicate data insights clearly.
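
As a rough illustration of how these steps connect, the sketch below runs a miniature version of the sequence on toy data. The data frames `passengers` and `tickets`, their columns, and the regular expression are hypothetical and exist only for this example; the chapters themselves work with their own datasets and may use different functions.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Hypothetical toy data (not the book's datasets).
passengers <- tibble(
  id   = c(1L, 2L, 3L, 4L),
  name = c("Smith, Mr. John", "Jones, Mrs. Ann",
           "Brown, Miss. Eve", "Clark, Mr. Tom"),
  age  = c(34, NA, 19, 52)
)
tickets <- tibble(
  id    = c(1L, 2L, 3L, 5L),
  class = c("1st", "2nd", "3rd", "1st")
)

# Chapters 2-3: combine the tables on the shared key `id`.
# left_join() keeps every passenger, even those without a ticket row.
joined <- passengers %>%
  left_join(tickets, by = "id")

# Chapter 5: pull the title (the word between ", " and ".") out of `name`.
cleaned <- joined %>%
  mutate(title = str_extract(name, "(?<=, )[A-Za-z]+(?=\\.)"))

# Chapters 4 and 7: count missing values before interpreting anything.
missing_summary <- cleaned %>%
  summarise(
    missing_age   = sum(is.na(age)),
    missing_class = sum(is.na(class))
  )

# Chapter 6: a bar chart of titles (the plot object is built, not printed).
title_plot <- ggplot(cleaned, aes(x = title)) +
  geom_bar() +
  labs(x = "Title", y = "Number of passengers")
```

Printing `missing_summary` and `title_plot` in an R Markdown chunk gives the kind of output that the final chapter then turns into reader-focused interpretation.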