New Business and financial analysts are using R everyday. Most of these new R user are coming from a non-programming background and their tool of choice is Excel. So let’s begin with, “How to do a VLOOKUP in R ?”
VLOOKUP is an Excel function to lookup and retrieve data from a specific column in table. “tidyquant” package in R is the solution for it.
install.packages("tidyquant")
install.packages("dplyr")
library(tidyquant)
library(dplyr)
The most popular Excel Reference Function, VLOOKUP, is now in R as VLOOKUP() in tidyquant package.
First dataset of 5 customers with total numbers of calls done in each week for a month.
df1 <-data.frame(CustID= c(rep(1001,4),rep(1002,4),rep(1003,4),rep(1004,4),rep(1005,4)),
Week = c(1,2,3,4),
Calls = c(56,49,140,182,70,7,7,84,63,14,63,133,63,63,49,168,56,35,147,56))
Second dataset gives general Information of these 5 Customers.
df2 <- data.frame(CustID = c(1001,1002,1003,1004,1005),
Age = c(29,22,29,33,32),
Gender = c("F","M","F","F","M"))
Here we just VLOOKUP a single value. Mapping Age of a customer whose ID is 1001.
VLOOKUP(1001, df2, CustID, Age)
## [1] 29
So like Excel,
1001 : Value to map
df2 : Data form which the mapping is done
CustID : Key Value for mapping
Age : The Column that needs to be VLOOKUP
Now let’s vlookup complete column in a dataset. We can do this by using the mutate() function from dplyr. This works because VLOOKUP() is vectorized.
As mutate() function is used to create new column from a data set. Here we are creating “Age” column.
df1 %>%
mutate(Age = VLOOKUP(CustID, df2, CustID, Age))
## CustID Week Calls Age
## 1 1001 1 56 29
## 2 1001 2 49 29
## 3 1001 3 140 29
## 4 1001 4 182 29
## 5 1002 1 70 22
## 6 1002 2 7 22
## 7 1002 3 7 22
## 8 1002 4 84 22
## 9 1003 1 63 29
## 10 1003 2 14 29
## 11 1003 3 63 29
## 12 1003 4 133 29
## 13 1004 1 63 33
## 14 1004 2 63 33
## 15 1004 3 49 33
## 16 1004 4 168 33
## 17 1005 1 56 32
## 18 1005 2 35 32
## 19 1005 3 147 32
## 20 1005 4 56 32
df1 %>%
mutate(Age = VLOOKUP(CustID, df2, CustID, Age)) %>%
mutate(Gender = VLOOKUP(CustID, df2, CustID, Gender))
## CustID Week Calls Age Gender
## 1 1001 1 56 29 F
## 2 1001 2 49 29 F
## 3 1001 3 140 29 F
## 4 1001 4 182 29 F
## 5 1002 1 70 22 M
## 6 1002 2 7 22 M
## 7 1002 3 7 22 M
## 8 1002 4 84 22 M
## 9 1003 1 63 29 F
## 10 1003 2 14 29 F
## 11 1003 3 63 29 F
## 12 1003 4 133 29 F
## 13 1004 1 63 33 F
## 14 1004 2 63 33 F
## 15 1004 3 49 33 F
## 16 1004 4 168 33 F
## 17 1005 1 56 32 M
## 18 1005 2 35 32 M
## 19 1005 3 147 32 M
## 20 1005 4 56 32 M