Embracing case_when() over Nested if-else

Introduction

case_when() evaluates a series of logical conditions in order and returns the value paired with the first condition that is TRUE. It works like nested ifelse() calls, but the code is shorter and flatter, which makes it easier to read and leaves less room for error.
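To make the comparison concrete, here is a minimal sketch that writes the same three-way classification twice, once with nested ifelse() and once with case_when(). The vector `ages` and the cut-offs are made up for illustration; `.default` assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical ages used only for this comparison
ages <- c(5, 15, 30)

# Nested ifelse(): each extra category adds another level of nesting
nested <- ifelse(ages < 13, "Children",
                 ifelse(ages < 18, "Teenage", "Adult"))

# case_when(): one condition per line, checked from top to bottom
flat <- case_when(
  ages < 13 ~ "Children",
  ages < 18 ~ "Teenage",
  .default = "Adult"
)

# Both approaches should produce the same labels
identical(nested, flat)
```

With only three categories the difference is small, but every further category adds another closing parenthesis to the nested version and just one more line to the case_when() version.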

Let us begin by installing and loading the required package:

install.packages("dplyr")
library(dplyr)

Snapshot of data

              Name  Birthdate Age
1    Payton Hobart 24-09-1993  26
2 Infinity Jackson 10-11-1994  -6
3     Astrid Sloan 17-01-1994  25
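
The examples in this article operate on a data frame called age_df, which is never shown being created. A minimal sketch that rebuilds it from the ten rows printed in the outputs, assuming Birthdate is stored as plain text and Age as a number:

```r
# Rebuild the example data from the rows printed in this article.
# Assumption: Birthdate is kept as text; Age is numeric and includes
# one negative value and one NA, as in the printed output.
age_df <- data.frame(
  Name = c("Payton Hobart", "Infinity Jackson", "Astrid Sloan",
           "Alice Charles", "McAfee Westbrook", "Skye Leighton",
           "River Barkley", "Andrew Cashman", "Rory Gilmore", "Lane Kim"),
  Birthdate = c("24-09-1993", "10-11-1994", "17-01-1994", "03-03-1995",
                "22-08-1988", "04-05-1987", "08-07-1993", "03-01-1992",
                "16-09-1981", "03-10-1973"),
  Age = c(26, -6, 25, 15, 31, 13, NA, 18, 19, 27)
)

# Show the first three rows, matching the snapshot
head(age_df, 3)
```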

Syntax

case_when(expression_1 ~ value_1, expression_2 ~ value_2, ..., expression_n ~ value_n)
Here,
expression: the logical condition to be tested

value: the output assigned when its expression is TRUE
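
One detail the syntax alone does not show: the conditions are checked in order and the first one that is TRUE wins, so more specific conditions should come before more general ones. A small sketch with a made-up vector `x` (`.default` assumes dplyr 1.1.0 or later):

```r
library(dplyr)

x <- c(3, 12, 40)  # hypothetical values

# Specific condition first: 40 is caught by x > 30
ordered <- case_when(
  x > 30 ~ "large",
  x > 10 ~ "medium",
  .default = "small"
)

# General condition first: x > 10 already matches 40,
# so the "large" branch is never reached
shadowed <- case_when(
  x > 10 ~ "medium",
  x > 30 ~ "large",
  .default = "small"
)

ordered   # "small" "medium" "large"
shadowed  # "small" "medium" "medium"
```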

Let’s try it with some examples

1) case_when with single condition

Check whether a person is eligible to apply for a driving license:

age_df <- age_df %>% 
          mutate( Eligibility = 
                    case_when(Age >= 18 ~ 'Eligible',
                              TRUE ~ 'Not Eligible'
                              ))
head(age_df,10)
               Name  Birthdate Age  Eligibility
1     Payton Hobart 24-09-1993  26     Eligible
2  Infinity Jackson 10-11-1994  -6 Not Eligible
3      Astrid Sloan 17-01-1994  25     Eligible
4     Alice Charles 03-03-1995  15 Not Eligible
5  McAfee Westbrook 22-08-1988  31     Eligible
6     Skye Leighton 04-05-1987  13 Not Eligible
7     River Barkley 08-07-1993  NA Not Eligible
8    Andrew Cashman 03-01-1992  18     Eligible
9      Rory Gilmore 16-09-1981  19     Eligible
10         Lane Kim 03-10-1973  27     Eligible

In the above example we add a column named “Eligibility”. If the Age >= 18 condition is met, “Eligible” is assigned. TRUE ~ ‘Not Eligible’ acts as the ‘else’ branch: it makes case_when return “Not Eligible” whenever none of the previous conditions is TRUE. This includes rows where Age is NA (row 7), because NA >= 18 evaluates to NA rather than TRUE.
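
The fallback can be written in two ways. The sketch below, on a hypothetical vector `age`, compares the TRUE ~ form used above with the `.default` argument (available from dplyr 1.1.0) and shows that a missing age ends up in the fallback either way:

```r
library(dplyr)

age <- c(26, 15, NA)  # hypothetical ages, one of them missing

# Fallback written as a final condition that is always TRUE
with_true <- case_when(age >= 18 ~ "Eligible",
                       TRUE ~ "Not Eligible")

# Same fallback written with the .default argument
with_default <- case_when(age >= 18 ~ "Eligible",
                          .default = "Not Eligible")

with_true   # "Eligible" "Not Eligible" "Not Eligible"

# Both spellings should give the same result
identical(with_true, with_default)
```

If a missing age should not silently count as “Not Eligible”, an explicit is.na() condition can be added before the fallback.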

2) case_when with multiple conditions

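We will recode the numeric ‘Age’ variable into a categorical variable ‘Age_group’. Note that case_when() returns a character vector here; wrap the result in factor() if a true factor is needed.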

age_df <- age_df %>% 
  mutate(Age_group = case_when(
      Age > 0  & Age < 13 ~ 'Children',
      Age >= 13 & Age < 18 ~ 'Teenage',
      Age >= 18 ~ 'Adult' ,
      is.na(Age) ~ 'Age is missing',
      .default = as.character(Age))) 
               Name  Birthdate Age  Eligibility      Age_group
1     Payton Hobart 24-09-1993  26     Eligible          Adult
2  Infinity Jackson 10-11-1994  -6 Not Eligible             -6
3      Astrid Sloan 17-01-1994  25     Eligible          Adult
4     Alice Charles 03-03-1995  15 Not Eligible        Teenage
5  McAfee Westbrook 22-08-1988  31     Eligible          Adult
6     Skye Leighton 04-05-1987  13 Not Eligible        Teenage
7     River Barkley 08-07-1993  NA Not Eligible Age is missing
8    Andrew Cashman 03-01-1992  18     Eligible          Adult
9      Rory Gilmore 16-09-1981  19     Eligible          Adult
10         Lane Kim 03-10-1973  27     Eligible          Adult

  • Missing values (NA) in the dataset can be handled by adding an is.na() condition and pairing it with a value.

  • If none of the conditions match and no .default is supplied, the output is NA. Hence, we provide a .default that returns Age itself, converted to text with as.character() so that every output has the same type, instead of a group name.
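
Both points can be checked on a small scale. The following is a minimal sketch on a hypothetical vector `age` holding one valid, one invalid and one missing value, run once without and once with `.default` (dplyr 1.1.0 or later):

```r
library(dplyr)

age <- c(8, -6, NA)  # hypothetical ages: valid, invalid, missing

# No fallback: the negative age matches no condition and becomes NA
no_default <- case_when(
  age > 0 & age < 13 ~ "Children",
  is.na(age) ~ "Age is missing"
)

# With .default: the unmatched age is kept, converted to text so that
# every output value is a character string
with_default <- case_when(
  age > 0 & age < 13 ~ "Children",
  is.na(age) ~ "Age is missing",
  .default = as.character(age)
)

no_default    # "Children" NA "Age is missing"
with_default  # "Children" "-6" "Age is missing"
```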