Chapter 34 4. Clean and reshape population data
The population data is currently in a wide format, with age groups stored as separate columns. For plotting and analysis, it is easier to work with tidy long-format data.
34.1 4.1 Convert population data to long format
pop_long <- pop %>%
select(-any_of(c("X1", "Total"))) %>%
pivot_longer(
cols = -any_of(c("Year", "Gender", "Health Service Delivery Area")),
names_to = "Age",
values_to = "Population"
) %>%
mutate(
Age = case_when(
Age == "04-Jan" ~ "01-04",
Age == "09-May" ~ "05-09",
Age == "14-Oct" ~ "10-14",
TRUE ~ Age
)
)
pop_long## # A tibble: 1,071 x 5
## `Health Service Delivery Area` Year Gender Age Population
## <chr> <dbl> <chr> <chr> <dbl>
## 1 British Columbia 2016 M ...1 0
## 2 British Columbia 2016 M <1 22997
## 3 British Columbia 2016 M 01-04 93211
## 4 British Columbia 2016 M 05-09 121341
## 5 British Columbia 2016 M 10-14 119586
## 6 British Columbia 2016 M 15-19 140451
## 7 British Columbia 2016 M 20-24 170968
## 8 British Columbia 2016 M 25-29 159609
## 9 British Columbia 2016 M 30-34 162416
## 10 British Columbia 2016 M 35-39 155936
## # i 1,061 more rows
34.2 4.2 Rename columns and remove unnecessary fields
The column name Health Service Delivery Area is long and contains spaces. We will rename it to HSDA. We will also remove the Year column because all records are from 2016.
## # A tibble: 1,071 x 4
## HSDA Gender Age Population
## <chr> <chr> <chr> <dbl>
## 1 British Columbia M ...1 0
## 2 British Columbia M <1 22997
## 3 British Columbia M 01-04 93211
## 4 British Columbia M 05-09 121341
## 5 British Columbia M 10-14 119586
## 6 British Columbia M 15-19 140451
## 7 British Columbia M 20-24 170968
## 8 British Columbia M 25-29 159609
## 9 British Columbia M 30-34 162416
## 10 British Columbia M 35-39 155936
## # i 1,061 more rows
34.3 4.3 Keep total population only
The Gender column includes T, which represents total population. Because we are interested in total population by HSDA and age group for this example, we will keep only these records.
## # A tibble: 357 x 4
## HSDA Gender Age Population
## <chr> <chr> <chr> <dbl>
## 1 British Columbia T ...1 0
## 2 British Columbia T <1 44757
## 3 British Columbia T 01-04 181511
## 4 British Columbia T 05-09 234118
## 5 British Columbia T 10-14 232010
## 6 British Columbia T 15-19 272677
## 7 British Columbia T 20-24 327129
## 8 British Columbia T 25-29 318998
## 9 British Columbia T 30-34 329242
## 10 British Columbia T 35-39 313564
## # i 347 more rows