case_when

Embracing case_when() over Nested if-else

Introduction

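case_when() (from the dplyr package) evaluates a sequence of conditions and, for each row or element, returns the value paired with the first condition that is TRUE. You can use it to assign a value or make a decision based on those conditions. It works like a nested ifelse(), but the code is shorter and flatter, so you are less likely to make mistakes.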
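As a rough illustration of the "like nested ifelse(), but flatter" point, here is a sketch that assigns the same three labels both ways. The vector ages and the cut-offs are illustrative only and are not part of the tutorial's data. For input without NA values the two approaches give identical results.

```r
library(dplyr)

ages <- c(5, 15, 30)

# Nested ifelse(): every extra category adds another level of nesting
# and another closing parenthesis to keep track of.
nested <- ifelse(ages < 13, "Children",
                 ifelse(ages < 18, "Teenage", "Adult"))

# case_when(): one flat condition ~ value pair per category, read top to bottom.
flat <- case_when(
  ages < 13 ~ "Children",
  ages < 18 ~ "Teenage",
  TRUE      ~ "Adult"
)

identical(nested, flat)  # TRUE for this NA-free input
```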

Let us begin by installing and loading the required package:

install.packages("dplyr")
library(dplyr)

Snapshot of the data

              Name  Birthdate Age
1    Payton Hobart 24-09-1993  26
2 Infinity Jackson 10-11-1994  -6
3     Astrid Sloan 17-01-1994  25
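The snapshot shows only three rows, and the tutorial does not show how age_df is created. To follow along, here is a minimal sketch that rebuilds the ten-row age_df printed in the examples below. The values are copied from that printed output. Storing Birthdate as plain dd-mm-yyyy text is an assumption, since the original data source is not shown.

```r
# Rebuild the example data frame from the rows printed in this tutorial.
# Assumption: Birthdate is kept as plain text (dd-mm-yyyy), not a Date.
# Age includes a negative value and a missing value, as in the printed output.
age_df <- data.frame(
  Name = c("Payton Hobart", "Infinity Jackson", "Astrid Sloan",
           "Alice Charles", "McAfee Westbrook", "Skye Leighton",
           "River Barkley", "Andrew Cashman", "Rory Gilmore", "Lane Kim"),
  Birthdate = c("24-09-1993", "10-11-1994", "17-01-1994", "03-03-1995",
                "22-08-1988", "04-05-1987", "08-07-1993", "03-01-1992",
                "16-09-1981", "03-10-1973"),
  Age = c(26, -6, 25, 15, 31, 13, NA, 18, 19, 27),
  stringsAsFactors = FALSE
)

head(age_df, 3)
```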

Syntax

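case_when(expression_1 ~ value_1, expression_2 ~ value_2, ..., expression_n ~ value_n, .default = NULL)
Here,
expression: the logical condition to be tested (left-hand side of ~)

value: the value returned where that expression is TRUE (right-hand side of ~)

.default: the value returned where no expression is TRUE (available from dplyr 1.1.0; if omitted, such rows get NA)

Conditions are checked in order, and each row takes the value of the first expression that is TRUE for it.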
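A small sketch of the syntax on a plain numeric vector shows the evaluation order in action. The vector ages and the labels "low", "medium" and "high" are illustrative only. The value 5 satisfies both of the first two conditions, but it receives the label of the first one listed.

```r
library(dplyr)

ages <- c(5, 15, 25)

# Conditions are checked top to bottom; each element gets the value of the
# first condition that is TRUE for it. 5 satisfies both `ages < 10` and
# `ages < 20`, but it is labelled "low" because that condition comes first.
band <- case_when(
  ages < 10 ~ "low",
  ages < 20 ~ "medium",
  TRUE      ~ "high"   # catch-all: TRUE for every element not matched yet
)

band  # returns c("low", "medium", "high")
```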

Let’s try it with some examples

1) case_when with a single condition

Check whether a person is eligible to apply for a driving license:

age_df <- age_df %>% 
  mutate(Eligibility = case_when(
    Age >= 18 ~ 'Eligible',
    TRUE      ~ 'Not Eligible'  # catch-all "else" branch
  ))
head(age_df, 10)

               Name  Birthdate Age  Eligibility
1     Payton Hobart 24-09-1993  26     Eligible
2  Infinity Jackson 10-11-1994  -6 Not Eligible
3      Astrid Sloan 17-01-1994  25     Eligible
4     Alice Charles 03-03-1995  15 Not Eligible
5  McAfee Westbrook 22-08-1988  31     Eligible
6     Skye Leighton 04-05-1987  13 Not Eligible
7     River Barkley 08-07-1993  NA Not Eligible
8    Andrew Cashman 03-01-1992  18     Eligible
9      Rory Gilmore 16-09-1981  19     Eligible
10         Lane Kim 03-10-1973  27     Eligible

In the above example we add a column named “Eligibility”. If the condition Age >= 18 is met, the row is assigned “Eligible”. TRUE ~ ‘Not Eligible’ acts as the ‘else’ branch: because TRUE always holds, it assigns “Not Eligible” to every row for which none of the previous conditions is TRUE. This includes River Barkley, whose Age is NA: NA >= 18 evaluates to NA, which case_when() does not treat as TRUE.
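To see how this differs from a plain ifelse(), here is a sketch comparing both on a few ages taken from the table (the vector ages is illustrative). The difference appears for the missing age: ifelse() passes the NA through to the result, while case_when() lets it fall through to the catch-all branch.

```r
library(dplyr)

ages <- c(26, -6, NA, 18)

# ifelse() propagates NA: a missing age gives a missing result.
via_ifelse <- ifelse(ages >= 18, "Eligible", "Not Eligible")

# In case_when(), `NA >= 18` is NA, which does not count as TRUE, so the
# element falls through to the catch-all `TRUE ~ ...` branch.
via_case_when <- case_when(
  ages >= 18 ~ "Eligible",
  TRUE       ~ "Not Eligible"
)

via_ifelse     # "Eligible" "Not Eligible" NA "Eligible"
via_case_when  # "Eligible" "Not Eligible" "Not Eligible" "Eligible"
```

If you would rather flag missing ages than treat them as ineligible, add an explicit is.na(Age) condition before the catch-all.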

2) case_when with multiple conditions

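We will recode the numeric ‘Age’ variable into a categorical variable ‘Age_group’ (case_when() returns a character vector here; wrap the result in factor() if you need an actual factor).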

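# Note: the .default argument requires dplyr >= 1.1.0
age_df <- age_df %>% 
  mutate(Age_group = case_when(
    Age > 0  & Age < 13  ~ 'Children',
    Age >= 13 & Age < 18 ~ 'Teenage',
    Age >= 18            ~ 'Adult',
    is.na(Age)           ~ 'Age is missing',  # explicit branch for NA ages
    .default = as.character(Age)              # no match (e.g. -6): keep the age as text
  ))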

               Name  Birthdate Age  Eligibility      Age_group
1     Payton Hobart 24-09-1993  26     Eligible          Adult
2  Infinity Jackson 10-11-1994  -6 Not Eligible             -6
3      Astrid Sloan 17-01-1994  25     Eligible          Adult
4     Alice Charles 03-03-1995  15 Not Eligible        Teenage
5  McAfee Westbrook 22-08-1988  31     Eligible          Adult
6     Skye Leighton 04-05-1987  13 Not Eligible        Teenage
7     River Barkley 08-07-1993  NA Not Eligible Age is missing
8    Andrew Cashman 03-01-1992  18     Eligible          Adult
9      Rory Gilmore 16-09-1981  19     Eligible          Adult
10         Lane Kim 03-10-1973  27     Eligible          Adult

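We can handle missing values (NAs) in the dataset by adding an is.na(Age) condition and assigning it a value.
If none of the conditions match and no .default is supplied, case_when() returns NA for that row. Hence, we provide .default = as.character(Age), so that an age that fits no group (such as -6) is shown as the age itself instead of NA. In dplyr versions older than 1.1.0, which lack .default, ending the list with TRUE ~ as.character(Age) has the same effect.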
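A short sketch makes the effect of .default visible by running the same age-group conditions with and without it on a few ages from the table (the vector ages is illustrative; .default needs dplyr 1.1.0 or later). Without .default the unmatched -6 becomes NA; with it, -6 is passed through as text.

```r
library(dplyr)  # .default requires dplyr >= 1.1.0

ages <- c(26, -6, NA, 15)

# Without .default: -6 matches no condition, so it becomes NA.
no_default <- case_when(
  ages > 0 & ages < 13   ~ "Children",
  ages >= 13 & ages < 18 ~ "Teenage",
  ages >= 18             ~ "Adult",
  is.na(ages)            ~ "Age is missing"
)

# With .default: unmatched values are returned as their own text instead.
with_default <- case_when(
  ages > 0 & ages < 13   ~ "Children",
  ages >= 13 & ages < 18 ~ "Teenage",
  ages >= 18             ~ "Adult",
  is.na(ages)            ~ "Age is missing",
  .default = as.character(ages)
)

no_default    # "Adult" NA "Age is missing" "Teenage"
with_default  # "Adult" "-6" "Age is missing" "Teenage"
```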