Chapter 10 2. Titanic Example: Cleaning Names and Extracting Titles

In this section, we use the Titanic dataset to practice working with character variables.

10.1 2.1 Read the Data

We read the dataset from the Raw Data folder with read_csv(). The col_types argument parses Survived and Sex as factors during import, so no separate conversion step is needed afterwards.

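# read_csv() comes from readr and %>% from dplyr; both are loaded by tidyverse
library(tidyverse)

# Parse Survived and Sex as factors; all other column types are guessed
titanic <- read_csv(
  "./Raw Data/titanic.csv",
  col_types = cols(
    Survived = col_factor(),
    Sex = col_factor()
  )
)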
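
To see what col_factor() does without needing the file, here is a minimal sketch that reads a small inline CSV instead. The three rows and the names csv_text and sample_data are made up for illustration and are not taken from titanic.csv. Passing literal text through I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical three-row sample standing in for titanic.csv
csv_text <- "PassengerId,Survived,Sex\n1,0,male\n2,1,female\n3,1,female\n"

sample_data <- read_csv(
  I(csv_text),  # I() marks the string as literal data rather than a file path
  col_types = cols(
    PassengerId = col_double(),
    Survived = col_factor(),
    Sex = col_factor()
  )
)

# With no levels supplied, col_factor() takes the levels
# in the order the values first appear in the column
levels(sample_data$Survived)
levels(sample_data$Sex)
```

Because the levels follow order of appearance, the level order can differ from one file to another. If a fixed order matters, col_factor() also accepts an explicit levels argument.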

10.2 2.2 Basic Cleaning

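First, we remove records where Survived is missing. Then, we create a new variable called family_size: the number of siblings or spouses aboard (SibSp) plus the number of parents or children aboard (Parch), plus 1 for the passenger.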

titanic <- titanic %>%
  filter(!is.na(Survived)) %>%
  mutate(family_size = SibSp + Parch + 1)

head(titanic)
## # A tibble: 6 x 13
##   PassengerId Survived Pclass Name            Sex     Age SibSp Parch Ticket  Fare Cabin Embarked
##         <dbl> <fct>     <dbl> <chr>           <fct> <dbl> <dbl> <dbl> <chr>  <dbl> <chr> <chr>   
## 1           1 0             3 Braund, Mr. Ow~ male     22     1     0 A/5 2~  7.25 <NA>  S       
## 2           2 1             1 Cumings, Mrs. ~ fema~    38     1     0 PC 17~ 71.3  C85   C       
## 3           3 1             3 Heikkinen, Mis~ fema~    26     0     0 STON/~  7.92 <NA>  S       
## 4           4 1             1 Futrelle, Mrs.~ fema~    35     1     0 113803 53.1  C123  S       
## 5           5 0             3 Allen, Mr. Wil~ male     35     0     0 373450  8.05 <NA>  S       
## 6           6 0             3 Moran, Mr. Jam~ male     NA     0     0 330877  8.46 <NA>  Q       
## # i 1 more variable: family_size <dbl>
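
The two cleaning steps above can be checked on a toy tibble where the result is easy to verify by hand. This is only a sketch: the values and the names toy and toy_clean are hypothetical and are not taken from the Titanic data.

```r
library(dplyr)

# Hypothetical toy data: the second row has a missing Survived value
toy <- tibble(
  Survived = factor(c("0", NA, "1")),
  SibSp = c(1, 0, 2),
  Parch = c(0, 0, 1)
)

toy_clean <- toy %>%
  filter(!is.na(Survived)) %>%             # drops the row with missing Survived
  mutate(family_size = SibSp + Parch + 1)  # + 1 counts the passenger

nrow(toy_clean)
toy_clean$family_size
```

Two rows remain, with family sizes 2 and 4. A passenger travelling alone (SibSp and Parch both 0) gets family_size 1 rather than 0.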