R Tips: Read and Write multiple excel sheets in one go

Introduction

In data science, data is often spread across multiple sheets within an Excel workbook, making it crucial to efficiently access and combine these sheets for analysis. The “openxlsx” library in R provides an easy and flexible way to read and write Excel files. By using openxlsx, data scientists can streamline the process of reading multiple sheets from an Excel file and exporting results to Excel, enabling more time for data interpretation and analysis

Let us begin with installing and loading the required package

install.packages("openxlsx")
library(openxlsx)

Defining the file path and fetching sheet names

file_path <- "dummy_data_workbook.xlsx"

# Get available sheet names in the workbook
sheet_names <- getSheetNames(file_path)
sheet_names
[1] "Demographics"  "Sales"         "Monthly Sales"

We will first look at importing multiple sheets.

Method 1: Selecting and loading specific sheets from a workbook into separate dataframes

Focus on importing only the sheets you need for a targeted and efficient workflow

# Optionally, filter the sheets you want to import
desired_sheets <- c("Sales", "Monthly Sales")
valid_sheets <- intersect(sheet_names, desired_sheets)

# # replace "valid_sheets" with "sheet_names" to import all sheets
for (sheet in valid_sheets) {
  
  valid_name <- make.names(sheet)  # Create a valid R variable name using make.names
  print(valid_name)
  assign(valid_name, read.xlsx(file_path, sheet = sheet)) #  To create a data frame with the name of the sheet
}
[1] "Sales"
[1] "Monthly.Sales"

The data frames are now created in your environment with names matching the sheet names.
Note: We have used “make.names(sheet)” to ensure sheet names are valid R data frame names. “Monthly Sales” is changed to “Monthly.Sales”

Method 2: With Environment Management

To avoid cluttering your global environment, consider using a list to store the data frames:

# Read each valid sheet into a list of data frames and rename the list elements
data_list <- lapply(sheet_names, function(sheet) {  #Here we are importing all the sheets rather than specific sheets
  read.xlsx(file_path, sheet = sheet)
})

names(data_list) <- make.names(sheet_names)

# View the list names (now valid R variable names)
print(names(data_list))
[1] "Demographics"  "Sales"         "Monthly.Sales"

Save dataframes in multiple sheets of same workbook(xlsx)

# Create a workbook and add a worksheet
wb <- createWorkbook()
addWorksheet(wb, "Data")
data_list[[1]]
# writeData(wb, "sheet_name", df_name)
writeData(wb, "Demographics", data_list)

writeData(wb, "Correlation", clinical_df)

# Save the workbook
saveWorkbook(wb, "MyData.xlsx", overwrite = TRUE)

The data frames are now stored as separate sheets in workbook. this will be stored in current working directory.

Conclusion:

Using openxlsx in R allows easy reading of multiple sheets, creating valid data frame names with make.names(), and exporting them back to Excel using writeData(). This makes handling large workbooks efficient. While readxl is faster for simple data imports, it only supports reading files without writing.

Key Advantages of openxlsx

  • Read and write Excel files
  • Custom formatting options: apply formatting such as cell styles, fonts, colors, borders, and more when exporting to Excel.
  • Flexible sheet handling: Can easily read, write, rename, and delete sheets within a workbook.