Chapter 47 17. Practice activity: Join population and correspondence data

The population (pop) and correspondence (corr) datasets do not share a perfectly clean join key, so you may need to clean text or numeric identifiers before joining them.
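
As a rough illustration of what "cleaning" can mean here, the sketch below defines two hypothetical helpers, `clean_name()` and `clean_code()`. Neither is part of the activity. It assumes names differ only in letter case and spacing, and that a code holds the same number whether it was read in as numeric or as padded text.

```r
library(stringr)

# Hypothetical helper: lower-case and collapse whitespace so that
# "Area  A " and "AREA A" compare equal.
clean_name <- function(x) {
  str_squish(str_to_lower(x))
}

# Hypothetical helper: turn a code stored as a number or as padded
# text into an integer. Note that this also drops any leading zeros.
clean_code <- function(x) {
  as.integer(str_trim(as.character(x)))
}

clean_name(c("Area  A ", "AREA A"))
clean_code(c(101, 102))
clean_code(c(" 101", "102 "))
```

Converting codes to integers is only one option. If leading zeros in a code carry meaning, keep both sides as trimmed character strings instead.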

Possible approaches:

  • clean and compare HSDA names in pop and corr
  • compare code-based fields such as X1 and hruid2017
  • reshape the population data before joining
  • check how many records match and how many do not
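
A code-based join plus a match count could look something like the sketch below. The data frames `pop_demo` and `corr_demo` and every value in them are made up for illustration. The sketch only assumes that `X1` and `hruid2017` hold the same region code, which you would need to confirm by inspecting the real data.

```r
library(dplyr)

# Hypothetical toy data: the real pop_total and corr objects will have
# more columns and different values.
pop_demo <- data.frame(
  X1   = c(101, 102, 999),
  HSDA = c("Area A", "Area B", "Area C"),
  stringsAsFactors = FALSE
)
corr_demo <- data.frame(
  hruid2017      = c("101", "102"),
  hrname_english = c("Area A", "Area B"),
  stringsAsFactors = FALSE
)

# Put both code columns on the same type first: dplyr refuses to
# join a numeric column to a character column.
pop_demo  <- pop_demo  %>% mutate(code = as.integer(X1))
corr_demo <- corr_demo %>% mutate(code = as.integer(hruid2017))

# Filtering joins keep pop rows as they are, so row counts are direct
matched   <- semi_join(pop_demo, corr_demo, by = "code")  # has a partner
unmatched <- anti_join(pop_demo, corr_demo, by = "code")  # has none

nrow(matched)
nrow(unmatched)
unmatched$HSDA
```

Because `semi_join()` and `anti_join()` never add columns or repeat rows, their row counts answer "how many records match and how many do not" without the side effects of a mutating join.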
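# Example starting point only.
# You will need to inspect the exact values before deciding on the best join.

library(dplyr)
library(stringr)

# Build a normalised name key: lower case, with leading, trailing and
# repeated internal whitespace removed
pop_join_ready <- pop_total %>%
  mutate(HSDA_clean = str_to_lower(HSDA) %>% str_squish())

corr_join_ready <- corr %>%
  mutate(hrname_clean = str_to_lower(hrname_english) %>% str_squish())

# Keep every population row; rows whose name finds no match get NA
# in the columns that come from corr
pop_corr_test <- left_join(
  pop_join_ready,
  corr_join_ready,
  by = c("HSDA_clean" = "hrname_clean")
)

pop_corr_test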
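
Before trusting a result like `pop_corr_test`, two checks are worth making: whether any cleaned name appears more than once in the correspondence data (which makes `left_join()` repeat population rows), and which population rows found no partner. The sketch below runs both on small made-up data frames, `pop_demo` and `corr_demo`, which are hypothetical stand-ins for `pop_join_ready` and `corr_join_ready`.

```r
library(dplyr)
library(stringr)

# Hypothetical toy inputs standing in for pop_join_ready / corr_join_ready
pop_demo <- data.frame(
  HSDA = c("Area A", "area  b", "Area C"),
  stringsAsFactors = FALSE
) %>%
  mutate(HSDA_clean = str_squish(str_to_lower(HSDA)))

corr_demo <- data.frame(
  hrname_english = c("AREA A", "Area B", "Area B"),  # "Area B" listed twice
  hruid2017      = c("101", "102", "102"),
  stringsAsFactors = FALSE
) %>%
  mutate(hrname_clean = str_squish(str_to_lower(hrname_english)))

# 1. Keys that occur more than once in corr will duplicate pop rows
dup_keys <- corr_demo %>% count(hrname_clean) %>% filter(n > 1)

# 2. Population rows with no partner in corr
no_match <- anti_join(pop_demo, corr_demo,
                      by = c("HSDA_clean" = "hrname_clean"))

# The mutating join itself, to compare row counts before and after
joined <- left_join(pop_demo, corr_demo,
                    by = c("HSDA_clean" = "hrname_clean"))

dup_keys
no_match
nrow(joined)
```

Here three population rows become four joined rows because "area b" is duplicated, and the "Area C" row carries `NA` in the correspondence columns. On the real data, comparing `nrow(pop_corr_test)` with `nrow(pop_join_ready)` is a quick first signal of the same problem.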