Sometimes you have a numeric variable that takes on values over a range (e.g., BMI or age) and you would like to create a binary (0/1, yes/no) categorical variable whose levels correspond to specific ranges. For example, let’s convert the variable Age (years) into “Elderly”, an indicator of age at least 65 years. The new variable can be left as a 0/1 variable or converted to a factor.
# As a 0/1 variable
mydat$elderly <- as.numeric(mydat$Age >= 65)

# Examine new variable
table(mydat$elderly, useNA = "ifany")
##
##   0   1
## 375 155
# Check range of original variable at levels of new
tapply(mydat$Age, mydat$elderly, range)
## $`0`
## [1] 42 64
##
## $`1`
## [1] 65 90
# As a factor
mydat$elderly_fac <- factor(mydat$elderly,
                            levels = 0:1,
                            labels = c("Age < 65y", "Age 65y+"))

# Examine new variable
table(mydat$elderly_fac, useNA = "ifany")
# Check range of original variable at levels of new
tapply(mydat$Age, mydat$elderly_fac, range)
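To try the base R steps without the tutorial dataset, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are assumptions chosen for illustration (they are not part of `mydat`); the values include the cutoff itself (65) and one missing age, so you can see how each is handled.

```r
# Hypothetical toy data (not the tutorial's mydat): includes the cutoff
# value 65 and one missing age
toy <- data.frame(Age = c(42, 64, 65, 90, NA))

# The comparison yields TRUE/FALSE/NA; as.numeric() maps these to 1/0/NA
toy$elderly <- as.numeric(toy$Age >= 65)

# Factor version with descriptive labels
toy$elderly_fac <- factor(toy$elderly,
                          levels = 0:1,
                          labels = c("Age < 65y", "Age 65y+"))

# useNA = "ifany" reports the missing age as its own category
table(toy$elderly_fac, useNA = "ifany")

# Check range of the original variable at each level of the new one;
# tapply() drops rows whose grouping value is NA
tapply(toy$Age, toy$elderly_fac, range)
```

Because `>=` is used, an age of exactly 65 falls in the "Age 65y+" group, and a missing age stays missing in both new variables rather than being assigned to either group.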
In tidyverse, you would use the following code. The code below uses two new functions, summarize() and group_by().
# As a 0/1 variable
mydat_tibble <- mydat_tibble %>%
  mutate(elderly = case_when(Age <  65 ~ 0,
                             Age >= 65 ~ 1))

# Examine new variable
mydat_tibble %>%
  count(elderly)

# Check range of original variable at levels of new
mydat_tibble %>%
  group_by(elderly) %>%
  summarize(min = min(Age),
            max = max(Age))

# As a factor
mydat_tibble <- mydat_tibble %>%
  mutate(elderly = case_when(Age <  65 ~ 0,
                             Age >= 65 ~ 1)) %>%
  mutate(elderly_fac = factor(elderly,
                              levels = 0:1,
                              labels = c("Age < 65y", "Age 65y+")))

# Examine new variable
mydat_tibble %>%
  count(elderly_fac)

# Check range of original variable at levels of new
mydat_tibble %>%
  group_by(elderly_fac) %>%
  summarize(min = min(Age),
            max = max(Age))
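The same tidyverse steps can be run on a small made-up tibble, which makes it easier to see what case_when() does with a missing age. This is a sketch under assumptions: `toy` and its values are hypothetical, and only the dplyr package is loaded.

```r
library(dplyr)

# Hypothetical toy data (not the tutorial's mydat_tibble): includes the
# cutoff value 65 and one missing age
toy <- tibble(Age = c(42, 64, 65, 90, NA))

# case_when() returns NA when no condition is TRUE, so a missing Age
# stays missing in both new variables
toy <- toy %>%
  mutate(elderly = case_when(Age <  65 ~ 0,
                             Age >= 65 ~ 1),
         elderly_fac = factor(elderly,
                              levels = 0:1,
                              labels = c("Age < 65y", "Age 65y+")))

# Examine new variable (missing values get their own row)
toy %>%
  count(elderly_fac)

# Check range of original variable at levels of new
toy %>%
  group_by(elderly_fac) %>%
  summarize(min = min(Age),
            max = max(Age))
```

Unlike tapply() in base R, group_by() keeps the missing values as a separate group, so the summary includes a row for them.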