Embracing case_when() over Nested if-else
Introduction
case_when() evaluates a series of conditions in order and assigns the value paired with the first condition that is met. It works like a nested ifelse(), but the code is shorter and flatter, which leaves less room for errors.
Let us begin by installing and loading the required package:
install.packages("dplyr")
library(dplyr)
Snapshot of data
Name Birthdate Age
1 Payton Hobart 24-09-1993 26
2 Infinity Jackson 10-11-1994 -6
3 Astrid Sloan 17-01-1994 25
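The article does not show how the data frame is created. As a minimal sketch, assuming it is called age_df (the name used in the later examples) and that Birthdate is stored as plain text, the ten rows printed in the output tables further down could be entered by hand like this:

```r
# Rebuild the sample data from the rows shown in the article's output tables.
# Assumption: Birthdate is kept as a character column in dd-mm-yyyy form.
age_df <- data.frame(
  Name = c("Payton Hobart", "Infinity Jackson", "Astrid Sloan", "Alice Charles",
           "McAfee Westbrook", "Skye Leighton", "River Barkley", "Andrew Cashman",
           "Rory Gilmore", "Lane Kim"),
  Birthdate = c("24-09-1993", "10-11-1994", "17-01-1994", "03-03-1995",
                "22-08-1988", "04-05-1987", "08-07-1993", "03-01-1992",
                "16-09-1981", "03-10-1973"),
  Age = c(26, -6, 25, 15, 31, 13, NA, 18, 19, 27)
)

# Show the first three rows, matching the snapshot
head(age_df, 3)
```

The negative age and the missing age are kept on purpose, since the later examples use them to show how unmatched and NA values are handled.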
Syntax
case_when(expression_1 ~ value_1, expression_2 ~ value_2, ..., expression_n ~ value_n)
Here,
expression: the logical condition to be tested
value: the output assigned when that expression is TRUE
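To make the syntax concrete, here is a small sketch on a plain vector rather than a data frame. The vector score and the labels are hypothetical and are not part of the article's data:

```r
library(dplyr)

# Hypothetical values, used only to illustrate the syntax
score <- c(5, 15, 25)

size <- case_when(
  score < 10  ~ "low",     # expression_1 ~ value_1
  score < 20  ~ "medium",  # expression_2 ~ value_2
  score >= 20 ~ "high"     # expression_n ~ value_n
)

print(size)
# "low" "medium" "high"
```

Conditions are checked from top to bottom and the first match wins, which is why the second condition does not need to repeat score >= 10.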
Let’s try it with some examples
1) case_when with single condition
Check whether a person is eligible to apply for a driving license:
age_df <- age_df %>%
  mutate(Eligibility =
    case_when(Age >= 18 ~ 'Eligible',
              TRUE ~ 'Not Eligible'
    ))
head(age_df, 10)
Name Birthdate Age Eligibility
1 Payton Hobart 24-09-1993 26 Eligible
2 Infinity Jackson 10-11-1994 -6 Not Eligible
3 Astrid Sloan 17-01-1994 25 Eligible
4 Alice Charles 03-03-1995 15 Not Eligible
5 McAfee Westbrook 22-08-1988 31 Eligible
6 Skye Leighton 04-05-1987 13 Not Eligible
7 River Barkley 08-07-1993 NA Not Eligible
8 Andrew Cashman 03-01-1992 18 Eligible
9 Rory Gilmore 16-09-1981 19 Eligible
10 Lane Kim 03-10-1973 27 Eligible
In the above example we are adding a column named "Eligibility". If the Age >= 18 condition is met, the row is assigned "Eligible". The final TRUE ~ 'Not Eligible' acts as an 'else' branch: it makes case_when return "Not Eligible" whenever none of the previous conditions is TRUE. This includes rows where Age is NA, because a condition that evaluates to NA is treated as not met.
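For comparison, the same rule can be written with base R's ifelse(). The sketch below uses a small hypothetical vector age rather than the article's data frame, and shows one difference worth knowing: the TRUE branch of case_when also catches NA, while ifelse() passes NA through.

```r
library(dplyr)

# Hypothetical ages, including a missing value
age <- c(26, 15, NA)

# case_when: the TRUE branch catches everything not matched earlier, including NA
with_case_when <- case_when(
  age >= 18 ~ "Eligible",
  TRUE      ~ "Not Eligible"
)

# ifelse: an NA in the test produces an NA in the result
with_ifelse <- ifelse(age >= 18, "Eligible", "Not Eligible")

print(with_case_when)
# "Eligible" "Not Eligible" "Not Eligible"
print(with_ifelse)
# "Eligible" "Not Eligible" NA
```

Whether a missing age should count as "Not Eligible" is a modelling decision. If it should not, an explicit is.na() condition can be placed ahead of the TRUE branch.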
2) case_when with multiple conditions
We will recode the numeric 'Age' variable into a categorical variable 'Age_group' (the result is a character column; wrap it in factor() if a true factor is needed):
age_df <- age_df %>%
  mutate(Age_group = case_when(
    Age > 0 & Age < 13 ~ 'Children',
    Age >= 13 & Age < 18 ~ 'Teenage',
    Age >= 18 ~ 'Adult',
    is.na(Age) ~ 'Age is missing',
    # .default requires dplyr 1.1.0 or later
    .default = as.character(Age)))
Name | Birthdate | Age | Eligibility | Age_group |
---|---|---|---|---|
Payton Hobart | 24-09-1993 | 26 | Eligible | Adult |
Infinity Jackson | 10-11-1994 | -6 | Not Eligible | -6 |
Astrid Sloan | 17-01-1994 | 25 | Eligible | Adult |
Alice Charles | 03-03-1995 | 15 | Not Eligible | Teenage |
McAfee Westbrook | 22-08-1988 | 31 | Eligible | Adult |
Skye Leighton | 04-05-1987 | 13 | Not Eligible | Teenage |
River Barkley | 08-07-1993 | NA | Not Eligible | Age is missing |
Andrew Cashman | 03-01-1992 | 18 | Eligible | Adult |
Rory Gilmore | 16-09-1981 | 19 | Eligible | Adult |
Lane Kim | 03-10-1973 | 27 | Eligible | Adult |
We can handle NULLs/NAs in the dataset by adding an is.na() condition and providing a value for it.
If none of the cases match and no .default is supplied, the output is NA. Hence, we have provided .default so that the Age itself (converted to text) is returned instead of a group name, as happens for the row with Age -6.
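The two points above can be checked on a small vector. The sketch below uses hypothetical ages rather than the article's data frame. It first drops the is.na() condition and .default to show the resulting NA values, then writes the full recoding as a nested ifelse() for comparison. The .default argument is assumed to be available, which needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical ages: an adult, an invalid value, a teenager, a missing value, a child
age <- c(26, -6, 15, NA, 8)

# No is.na() condition and no .default: unmatched rows become NA
no_default <- case_when(
  age > 0 & age < 13   ~ "Children",
  age >= 13 & age < 18 ~ "Teenage",
  age >= 18            ~ "Adult"
)
print(no_default)
# "Adult" NA "Teenage" NA "Children"

# Full version, as in the example above (.default needs dplyr >= 1.1.0)
with_default <- case_when(
  age > 0 & age < 13   ~ "Children",
  age >= 13 & age < 18 ~ "Teenage",
  age >= 18            ~ "Adult",
  is.na(age)           ~ "Age is missing",
  .default = as.character(age)
)

# The same logic as a nested ifelse(); the NA check has to come first
nested <- ifelse(is.na(age), "Age is missing",
            ifelse(age > 0 & age < 13, "Children",
              ifelse(age >= 13 & age < 18, "Teenage",
                ifelse(age >= 18, "Adult", as.character(age)))))

print(with_default)
# "Adult" "-6" "Teenage" "Age is missing" "Children"
print(identical(with_default, nested))
# TRUE
```

Both versions give the same result, but the nested form needs four levels of parentheses and depends on the NA check being outermost, while each case_when condition reads as one flat line.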