install.packages("openxlsx")
library(openxlsx)
R Tips: Read and Write multiple excel sheets in one go
Introduction
In data science, data is often spread across multiple sheets within an Excel workbook, making it crucial to efficiently access and combine these sheets for analysis. The “openxlsx” library in R provides an easy and flexible way to read and write Excel files. By using openxlsx, data scientists can streamline the process of reading multiple sheets from an Excel file and exporting results to Excel, enabling more time for data interpretation and analysis
Let us begin with installing and loading the required package
Defining the file path and fetching sheet names
<- "dummy_data_workbook.xlsx"
file_path
# Get available sheet names in the workbook
<- getSheetNames(file_path)
sheet_names sheet_names
[1] "Demographics" "Sales" "Monthly Sales"
We will first look at importing multiple sheets.
Method 1: Selecting and loading specific sheets from a workbook into separate dataframes
Focus on importing only the sheets you need for a targeted and efficient workflow
# Optionally, filter the sheets you want to import
<- c("Sales", "Monthly Sales")
desired_sheets <- intersect(sheet_names, desired_sheets)
valid_sheets
# # replace "valid_sheets" with "sheet_names" to import all sheets
for (sheet in valid_sheets) {
<- make.names(sheet) # Create a valid R variable name using make.names
valid_name print(valid_name)
assign(valid_name, read.xlsx(file_path, sheet = sheet)) # To create a data frame with the name of the sheet
}
[1] "Sales"
[1] "Monthly.Sales"
The data frames are now created in your environment with names matching the sheet names.
Note: We have used “make.names(sheet)” to ensure sheet names are valid R data frame names. “Monthly Sales” is changed to “Monthly.Sales”
Method 2: With Environment Management
To avoid cluttering your global environment, consider using a list to store the data frames:
# Read each valid sheet into a list of data frames and rename the list elements
<- lapply(sheet_names, function(sheet) { #Here we are importing all the sheets rather than specific sheets
data_list read.xlsx(file_path, sheet = sheet)
})
names(data_list) <- make.names(sheet_names)
# View the list names (now valid R variable names)
print(names(data_list))
[1] "Demographics" "Sales" "Monthly.Sales"
Save dataframes in multiple sheets of same workbook(xlsx)
# Create a workbook and add a worksheet
<- createWorkbook()
wb addWorksheet(wb, "Data")
1]]
data_list[[# writeData(wb, "sheet_name", df_name)
writeData(wb, "Demographics", data_list)
writeData(wb, "Correlation", clinical_df)
# Save the workbook
saveWorkbook(wb, "MyData.xlsx", overwrite = TRUE)
The data frames are now stored as separate sheets in workbook. this will be stored in current working directory.
Conclusion:
Using openxlsx in R allows easy reading of multiple sheets, creating valid data frame names with make.names(), and exporting them back to Excel using writeData(). This makes handling large workbooks efficient. While readxl is faster for simple data imports, it only supports reading files without writing.
Key Advantages of openxlsx
- Read and write Excel files
- Custom formatting options: apply formatting such as cell styles, fonts, colors, borders, and more when exporting to Excel.
- Flexible sheet handling: Can easily read, write, rename, and delete sheets within a workbook.