Chapter 47 17. Practice activity: Join population and correspondence data
The population and correspondence datasets do not share a clean join key, so you may need to clean text or numeric identifiers before joining them.
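Numeric identifiers often fail to join because one table stores them as text (sometimes with stray spaces or leading zeros) and the other as numbers. dplyr will refuse to join a character key to an integer key, so both sides need a common type first. The sketch below uses two small invented tables; the column names `code`, `hr_id` and `code_clean` and all values are hypothetical, not taken from the real datasets.

```r
library(dplyr)

# Hypothetical toy data: the same region codes stored with different types
pop_toy <- tibble(
  code = c(" 591", "0592", "593"),  # text, with a stray space and a leading zero
  population = c(100, 200, 300)
)
corr_toy <- tibble(
  hr_id = c(591L, 592L, 594L),      # integers
  region = c("A", "B", "C")
)

# Convert both keys to integers so "0592" and 592 compare as equal
pop_toy_clean <- pop_toy %>%
  mutate(code_clean = as.integer(trimws(code)))
corr_toy_clean <- corr_toy %>%
  mutate(code_clean = as.integer(hr_id))  # already integer; makes the type explicit

toy_joined <- left_join(pop_toy_clean, corr_toy_clean, by = "code_clean")
toy_joined
```

Code 593 has no partner in the toy correspondence table, so its `region` is `NA`; that is the kind of gap worth counting before trusting a join.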
Possible approaches:
- clean and compare the HSDA names in `pop` and `corr`
- compare code-based fields such as `X1` and `hruid2017`
- reshape the population data before joining
- check how many records match and how many do not
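As one illustration of reshaping, suppose (an assumption; inspect your own table first) the population data arrive with one column per year. `tidyr::pivot_longer()` can turn that into one row per HSDA per year, so every row carries a single HSDA name that can then be cleaned and joined. The table `pop_wide` and its values are invented for this sketch.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide population table: one column per year
pop_wide <- tibble(
  HSDA = c("Fraser North", "Vancouver"),
  `2020` = c(100, 150),
  `2021` = c(110, 160)
)

# Reshape to one row per HSDA per year, storing the year as an integer
pop_long <- pop_wide %>%
  pivot_longer(
    cols = -HSDA,
    names_to = "year",
    values_to = "population",
    names_transform = list(year = as.integer)
  )

pop_long
```

After this step, a join on the cleaned HSDA name attaches the same correspondence row to every year of that HSDA, which is the expected many-to-one relationship.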
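```r
library(dplyr)
library(stringr)

# Example starting point only.
# You will need to inspect the exact values before deciding on the best join.

# Lower-case the HSDA names and collapse extra whitespace
pop_join_ready <- pop_total %>%
  mutate(HSDA_clean = str_to_lower(HSDA) %>% str_squish())
# Apply the same cleaning to the correspondence names
corr_join_ready <- corr %>%
  mutate(hrname_clean = str_to_lower(hrname_english) %>% str_squish())

# Keep every population row; unmatched rows get NA in the corr columns
pop_corr_test <- left_join(
  pop_join_ready,
  corr_join_ready,
  by = c("HSDA_clean" = "hrname_clean")
)
pop_corr_test
```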
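To check how many records match and how many do not, `semi_join()` keeps the rows that have a partner and `anti_join()` keeps the rows that do not. The sketch below runs both on small invented tables that use the same cleaning as the starting-point code; the table names `pop_names_toy` and `corr_names_toy` and the name spellings are hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical toy versions of the two cleaned tables
pop_names_toy <- tibble(
  HSDA = c("Fraser North", "Vancouver", "North Shore/Coast Garibaldi")
) %>%
  mutate(HSDA_clean = str_to_lower(HSDA) %>% str_squish())

corr_names_toy <- tibble(
  hrname_english = c("Fraser North", "Vancouver", "North Shore / Coast Garibaldi")
) %>%
  mutate(hrname_clean = str_to_lower(hrname_english) %>% str_squish())

# Population rows with no match in the correspondence table
unmatched_pop <- anti_join(pop_names_toy, corr_names_toy,
                           by = c("HSDA_clean" = "hrname_clean"))

# Correspondence rows with no match in the population table
unmatched_corr <- anti_join(corr_names_toy, pop_names_toy,
                            by = c("hrname_clean" = "HSDA_clean"))

# Counts of matched and unmatched population rows
n_matched <- nrow(semi_join(pop_names_toy, corr_names_toy,
                            by = c("HSDA_clean" = "hrname_clean")))
n_unmatched <- nrow(unmatched_pop)

c(matched = n_matched, unmatched = n_unmatched)
unmatched_pop
unmatched_corr
```

Here the third name fails only because of the spaces around the slash, which lower-casing and `str_squish()` do not remove. Stripping all non-alphanumeric characters, for example with `str_remove_all(x, "[^a-z0-9]")` after lower-casing, is one further cleaning option to try on the real names.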