Chapter 10 2. Titanic Example: Cleaning Names and Extracting Titles
In this section, we use the Titanic dataset to practice working with character variables.
10.1 2.1 Read the Data
The dataset is read from the Raw Data folder. We also convert Survived and Sex to factor variables during import.
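The import step described above might look like the following minimal sketch. It assumes the readr package and a CSV file; the file name `titanic.csv` and the helper `read_titanic()` are hypothetical, since the text only names the Raw Data folder.

```r
library(readr)

# Hypothetical path: only the "Raw Data" folder is given in the text;
# the file name is an assumption.
titanic_path <- file.path("Raw Data", "titanic.csv")

# Hypothetical helper: read the CSV and parse Survived and Sex as factors.
# Columns not listed in cols() are guessed by readr.
read_titanic <- function(path) {
  read_csv(
    path,
    col_types = cols(
      Survived = col_factor(),
      Sex = col_factor()
    )
  )
}

# Only read the file if it exists at the assumed location.
if (file.exists(titanic_path)) {
  titanic <- read_titanic(titanic_path)
}
```

Declaring the factor columns in `col_types` does the conversion at import, so no separate `mutate()` step is needed afterwards.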
10.2 2.2 Basic Cleaning
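First, we remove records where Survived is missing. Then, we create a new variable called family_size, which counts the passenger plus any siblings or spouses (SibSp) and parents or children (Parch) aboard.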
titanic <- titanic %>%
  filter(!is.na(Survived)) %>%
  mutate(family_size = SibSp + Parch + 1)
head(titanic)
## # A tibble: 6 x 13
## PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
## <dbl> <fct> <dbl> <chr> <fct> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <chr>
## 1 1 0 3 Braund, Mr. Ow~ male 22 1 0 A/5 2~ 7.25 <NA> S
## 2 2 1 1 Cumings, Mrs. ~ fema~ 38 1 0 PC 17~ 71.3 C85 C
## 3 3 1 3 Heikkinen, Mis~ fema~ 26 0 0 STON/~ 7.92 <NA> S
## 4 4 1 1 Futrelle, Mrs.~ fema~ 35 1 0 113803 53.1 C123 S
## 5 5 0 3 Allen, Mr. Wil~ male 35 0 0 373450 8.05 <NA> S
## 6 6 0 3 Moran, Mr. Jam~ male NA 0 0 330877 8.46 <NA> Q
## # i 1 more variable: family_size <dbl>
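To see what the two cleaning steps do in isolation, here is a small sketch on made-up data. The `toy` tibble and its values are hypothetical and are not taken from the Titanic file; only the column names match.

```r
library(dplyr)
library(tibble)

# Made-up rows for illustration; the third row has a missing Survived value.
toy <- tibble(
  Survived = factor(c(0, 1, NA)),
  SibSp    = c(1, 0, 2),
  Parch    = c(0, 0, 1)
)

toy_clean <- toy %>%
  filter(!is.na(Survived)) %>%            # drops the row with missing Survived
  mutate(family_size = SibSp + Parch + 1) # the + 1 counts the passenger

toy_clean
```

The row with a missing Survived value is dropped, and a passenger travelling alone gets a family_size of 1 rather than 0.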