An Introduction to R for Research

Sometimes you have a numeric variable that takes on values over a range (e.g., BMI or age) and you would like to create a binary (0/1, yes/no) categorical variable with levels corresponding to specific ranges. For example, let’s convert the variable Age (years) into “Elderly”, an indicator of age at least 65 years. The result can be left as a 0/1 variable or converted to a factor.

 # As a 0/1 variable
 mydat$elderly <- as.numeric(mydat$Age >= 65)

 # Examine new variable
 table(mydat$elderly, useNA = "ifany")
## 
##   0   1 
## 375 155

 # Check range of original variable at each level of the new variable
 tapply(mydat$Age, mydat$elderly, range)
## $`0`
## [1] 42 64
## 
## $`1`
## [1] 65 90
 # As a factor
 mydat$elderly_fac <- factor(mydat$elderly,
                             levels = 0:1,
                             labels = c("Age < 65y", "Age 65y+"))

 # Examine new variable
 table(mydat$elderly_fac, useNA = "ifany")

 # Check range of original variable at each level of the new variable
 tapply(mydat$Age, mydat$elderly_fac, range)
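
To try the base R steps without the mydat dataset, the following is a minimal self-contained sketch. The data frame `toy` and its ages are made up for illustration and are not part of the original data. It also includes one missing age, to show why `useNA = "ifany"` is worth keeping in the `table()` call.

```r
# Hypothetical toy data (an assumption for illustration, not the mydat dataset)
toy <- data.frame(Age = c(42, 64, 65, 90, NA))

# 0/1 indicator: the comparison yields TRUE/FALSE/NA, and as.numeric()
# maps these to 1/0/NA
toy$elderly <- as.numeric(toy$Age >= 65)

# Factor version with descriptive labels
toy$elderly_fac <- factor(toy$elderly,
                          levels = 0:1,
                          labels = c("Age < 65y", "Age 65y+"))

# useNA = "ifany" adds a column counting the missing value
table(toy$elderly, useNA = "ifany")

# Range of Age within each level; the row with a missing group is dropped
tapply(toy$Age, toy$elderly_fac, range)
```

A missing Age stays missing in both new variables rather than being assigned to either group, so checking the table for an NA column is a quick way to confirm how many observations could not be classified.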

In tidyverse, you would use the following code. The code below uses a few new functions, summarize() and group_by().

 # As a 0/1 variable
 mydat_tibble <- mydat_tibble %>%
   mutate(elderly = case_when(Age <  65 ~ 0,
                              Age >= 65 ~ 1))

 # Examine new variable
 mydat_tibble %>%
   count(elderly)

 # Check range of original variable at each level of the new variable
 mydat_tibble %>%
   group_by(elderly) %>%
   summarize(min = min(Age), max = max(Age))

 # As a factor
 mydat_tibble <- mydat_tibble %>%
   mutate(elderly = case_when(Age <  65 ~ 0,
                              Age >= 65 ~ 1)) %>%
   mutate(elderly_fac = factor(elderly,
                               levels = 0:1,
                               labels = c("Age < 65y", "Age 65y+")))

 # Examine new variable
 mydat_tibble %>%
   count(elderly_fac)

 # Check range of original variable at each level of the new variable
 mydat_tibble %>%
   group_by(elderly_fac) %>%
   summarize(min = min(Age), max = max(Age))
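
The tidyverse steps can be tried the same way on a small made-up tibble. This is a sketch that assumes the dplyr package is installed; `toy_tibble` and its ages are hypothetical and not part of the original data. With a missing age, neither condition in `case_when()` is met, so the new variable is left as NA.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not mydat_tibble)
toy_tibble <- tibble(Age = c(42, 64, 65, 90, NA))

# Create the 0/1 variable and its factor version in one mutate() call
toy_tibble <- toy_tibble %>%
  mutate(elderly = case_when(Age <  65 ~ 0,
                             Age >= 65 ~ 1),
         elderly_fac = factor(elderly,
                              levels = 0:1,
                              labels = c("Age < 65y", "Age 65y+")))

# The missing age is counted in its own NA row
toy_tibble %>%
  count(elderly_fac)

# Range of Age within each level of the new variable
toy_tibble %>%
  group_by(elderly_fac) %>%
  summarize(min = min(Age), max = max(Age))
```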