Chapter 36 6. Identify possible join keys

A join requires one or more common variables between datasets. Before joining, always check whether the keys are clean and whether the relationship is one-to-one, one-to-many, or many-to-many.

map(data_list, names)
## $mort
##  [1] "ID"                "Sex"               "Cause_of_death"    "dbuid2016"        
##  [5] "Location_of_death" "Marital_status"    "Postalcode"        "B_year"           
##  [9] "B_month"           "B_day"             "D_year"            "D_month"          
## [13] "D_day"            
## 
## $pop
##  [1] "...1"                         "Health Service Delivery Area" "Year"                        
##  [4] "Gender"                       "<1"                           "04-Jan"                      
##  [7] "09-May"                       "14-Oct"                       "15-19"                       
## [10] "20-24"                        "25-29"                        "30-34"                       
## [13] "35-39"                        "40-44"                        "45-49"                       
## [16] "50-54"                        "55-59"                        "60-64"                       
## [19] "65-69"                        "70-74"                        "75-79"                       
## [22] "80-84"                        "85-89"                        "90+"                         
## [25] "Total"                       
## 
## $corr
## [1] "dbuid2016"      "csduid2016"     "hruid2017"      "hrname_english" "hrname_french" 
## [6] "dbpop2016"     
## 
## $env
## [1] "POSTALCODE12" "WTHNRC12_01"  "WTHNRC12_02"  "WTHNRC12_03"  "WTHNRC12_04"  "WTHNRC12_05" 
## [7] "WTHNRC12_06"  "WTHNRC12_07"

Possible join paths:

  • mort can be joined to corr using dbuid2016.
  • pop can potentially be joined to corr using health region or health service delivery area variables, but the names and codes need checking first.
  • env contains postal code data, but the link to mortality records needs postal code cleaning and validation.