Chapter 12 4. Extract Titles Using Regex

Passenger titles such as Mr, Mrs, Miss, and Master are stored at the start of each value in the First_name column, followed by a period. We can extract them with str_extract() from the stringr package.

# Keep everything before the first period as the passenger's title
titanic <- titanic %>%
  mutate(Title = str_extract(First_name, "^[^.]+"))

head(titanic %>% select(First_name, Title))
## # A tibble: 6 x 2
##   First_name                                 Title
##   <chr>                                      <chr>
## 1 Mr. Owen Harris                            Mr   
## 2 Mrs. John Bradley (Florence Briggs Thayer) Mrs  
## 3 Miss. Laina                                Miss 
## 4 Mrs. Jacques Heath (Lily May Peel)         Mrs  
## 5 Mr. William Henry                          Mr   
## 6 Mr. James                                  Mr
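
To try the same step without the full dataset, here is a minimal sketch on a toy tibble. The object names toy and title_counts are hypothetical, and the three names are copied from the printed output above. It assumes dplyr and stringr are installed; the count() call is only a suggested sanity check, not part of the original workflow.

```r
library(dplyr)
library(stringr)

# Toy data: three First_name values taken from the output shown above
toy <- tibble(
  First_name = c(
    "Mr. Owen Harris",
    "Miss. Laina",
    "Mrs. Jacques Heath (Lily May Peel)"
  )
)

# Same extraction as in the chapter: text before the first period
toy <- toy %>%
  mutate(Title = str_extract(First_name, "^[^.]+"))

# Tally the extracted titles to spot unexpected values
title_counts <- toy %>% count(Title)
```

On the real data, a tally like this is a quick way to notice rare or misspelled titles before using the column further.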

12.1 4.1 Regex Explanation

The pattern ^[^.]+ means:

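  • ^ anchors the match to the beginning of the string.
  • [^.] matches any single character except a period. Inside square brackets, a leading ^ negates the set and the period is treated literally.
  • + repeats the preceding element, here [^.], one or more times.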

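Together, these extract every character from the start of the string up to, but not including, the first period. If a value contains no period, the whole string is returned; if it begins with a period, the result is NA.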
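
The behaviour described above can be checked on a few strings. This is a small sketch, assuming stringr is installed; the input values and the names pattern, examples, titles, strict_pattern, and strict_titles are made up for illustration. The second pattern is an optional variant, not the one used in this chapter: it adds a lookahead so that values without a period give NA instead of the whole string.

```r
library(stringr)

pattern <- "^[^.]+"

# Hypothetical inputs covering the usual case and two edge cases
examples <- c(
  "Mr. Owen Harris",   # usual case: title followed by a period
  "No period here",    # no period: the whole string matches
  ".Leading period"    # starts with a period: nothing to match
)

titles <- str_extract(examples, pattern)
# titles is c("Mr", "No period here", NA)

# Variant (an assumption, not from the chapter): require that a
# period actually follows the matched text
strict_pattern <- "^[^.]+(?=\\.)"
strict_titles <- str_extract(examples, strict_pattern)
# strict_titles is c("Mr", NA, NA)
```

Which version is preferable depends on whether a name without a period should count as having a title at all.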