Table Tales in R: Mastering Multi-Row Headers and Labels

 

Introduction:

Table formatting refers to the process of organizing and styling tabular data for better readability, presentation, and analysis. It involves adjusting the appearance, structure, and content of tables to convey information more effectively. Table formatting in R typically encompasses aspects like text alignment, column headers, row names, borders, etc.

Table formatting is important in R for several reasons:

  • Makes it easier to present and communicate data to others.
  • Ensures that the exported data is clean and easily usable in other software tools, as you might need to export tables in various file formats.
  • Enhances the reproducibility of your data analysis workflows.

 

In this blog, we will be focusing on adding multi-row headers and group rows via labelling. These are useful for demonstrating grouped data in an aesthetically pleasing way.

To achieve this, you can use various packages such as kableExtra, flextable, and gt.

 

Following is a snippet of the data we will be using today:

  Passenger.ID          Name Gender Age                            Airport.Name
1        10856 Edithe Leggis Female  62                        Coldfoot Airport
2        39630  Lora Durbann Female  55       Coronel Horácio de Mattos Airport
3        37434   Halie Jewar Female  53 Wiley Post Will Rogers Memorial Airport
   Country.Name Airport.Country.Code Departure.Date Arrival.Airport
1 United States                   US      6/28/2022             CXF
2        Brazil                   BR     06-10-2022             LEC
3 United States                   US     12-05-2022             BRW
     Pilot.Name Flight.Status
1 Edithe Leggis       On Time
2  Lora Durbann       On Time
3   Halie Jewar       Delayed

It is an airline dataset and has various columns describing the passenger’s details, the country/airport from which they are flying, date of departure, and so on.

 

1. Using kableExtra package

Here, we use add_header_above() to add headers and pack_rows() to add row labels. Both take a named vector: each name is the label you want to assign to a group, and each value is the number of columns (for headers) or rows (for row labels) that the group spans.

library(kableExtra)

kable(df) %>%
  kable_styling() %>%
  add_header_above(c("Passenger Details" = 4, "Airport Details" = 3, "Flight Details" = 4)) %>%
  pack_rows(index = c("Group 1" = 2, "Group 2" = 3))

Passenger Details (4 columns) | Airport Details (3 columns) | Flight Details (4 columns)
Passenger.ID | Name | Gender | Age | Airport.Name | Country.Name | Airport.Country.Code | Departure.Date | Arrival.Airport | Pilot.Name | Flight.Status
Group 1
10856 | Edithe Leggis | Female | 62 | Coldfoot Airport | United States | US | 6/28/2022 | CXF | Edithe Leggis | On Time
39630 | Lora Durbann | Female | 55 | Coronel Horácio de Mattos Airport | Brazil | BR | 06-10-2022 | LEC | Lora Durbann | On Time
Group 2
37434 | Halie Jewar | Female | 53 | Wiley Post Will Rogers Memorial Airport | United States | US | 12-05-2022 | BRW | Halie Jewar | Delayed
95434 | Mattias Darrell | Male | 78 | Aldan Airport | Russian Federation | RU | 08-07-2022 | ADH | Mattias Darrell | Cancelled
16341 | Denys Endricci | Male | 33 | Biju Patnaik Airport | India | IN | 7/22/2022 | BBI | Denys Endricci | Delayed
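
The row spans passed to pack_rows() do not have to be counted by hand. The following is a minimal sketch, assuming a cut-down stand-in for the airline data (the `flights` data frame and the `row_index` name are illustrative, not part of the original code), that derives the spans from a grouping column with base R's rle():

```r
library(kableExtra)

# Cut-down stand-in for the airline data (values taken from the snippet above)
flights <- data.frame(
  Passenger.ID  = c(10856, 39630, 37434, 95434, 16341),
  Name          = c("Edithe Leggis", "Lora Durbann", "Halie Jewar",
                    "Mattias Darrell", "Denys Endricci"),
  Gender        = c("Female", "Female", "Female", "Male", "Male"),
  Flight.Status = c("On Time", "On Time", "Delayed", "Cancelled", "Delayed")
)

# Sort so that rows with the same status sit next to each other
flights_sorted <- flights[order(flights$Flight.Status), ]

# rle() gives the length of each run of identical values, in row order
runs <- rle(as.character(flights_sorted$Flight.Status))
row_index <- setNames(runs$lengths, runs$values)
# row_index is now: Cancelled = 1, Delayed = 2, On Time = 2

kable(flights_sorted, row.names = FALSE) %>%
  kable_styling() %>%
  pack_rows(index = row_index)
```

This only works if the data is sorted by the grouping column first; otherwise the same label would appear once per run.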

 

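You can add as many header rows as you want: each add_header_above() call places a new row on top of the previous ones, and a " " label leaves that span blank.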

kable(df, align = "c") %>%
  kable_styling() %>%
  add_header_above(c("Passenger Details" = 4, "Airport Details" = 3, "Flight Details" = 4)) %>%
  add_header_above(c("Another Header 1" = 6, "Another Header 2" = 5)) %>%
  add_header_above(c(" " = 5, "Final Header" = 6))

(blank, 5 columns) | Final Header (6 columns)
Another Header 1 (6 columns) | Another Header 2 (5 columns)
Passenger Details (4 columns) | Airport Details (3 columns) | Flight Details (4 columns)
Passenger.ID | Name | Gender | Age | Airport.Name | Country.Name | Airport.Country.Code | Departure.Date | Arrival.Airport | Pilot.Name | Flight.Status
10856 | Edithe Leggis | Female | 62 | Coldfoot Airport | United States | US | 6/28/2022 | CXF | Edithe Leggis | On Time
39630 | Lora Durbann | Female | 55 | Coronel Horácio de Mattos Airport | Brazil | BR | 06-10-2022 | LEC | Lora Durbann | On Time
37434 | Halie Jewar | Female | 53 | Wiley Post Will Rogers Memorial Airport | United States | US | 12-05-2022 | BRW | Halie Jewar | Delayed
95434 | Mattias Darrell | Male | 78 | Aldan Airport | Russian Federation | RU | 08-07-2022 | ADH | Mattias Darrell | Cancelled
16341 | Denys Endricci | Male | 33 | Biju Patnaik Airport | India | IN | 7/22/2022 | BBI | Denys Endricci | Delayed
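
One thing to keep in mind: the spans in each header vector have to add up to the number of columns in the table, otherwise add_header_above() stops with an error. A small sketch, again assuming a cut-down stand-in for the data (`flights` and `header` are illustrative names), that checks the spans before building the table:

```r
library(kableExtra)

# Cut-down stand-in for the airline data (values taken from the snippet above)
flights <- data.frame(
  Passenger.ID    = c(10856, 39630, 37434),
  Name            = c("Edithe Leggis", "Lora Durbann", "Halie Jewar"),
  Arrival.Airport = c("CXF", "LEC", "BRW"),
  Flight.Status   = c("On Time", "On Time", "Delayed")
)

# One entry per header cell; " " leaves the span above Arrival.Airport empty
header <- c("Passenger Details" = 2, " " = 1, "Flight Details" = 1)

# The spans have to cover every column exactly once
stopifnot(sum(header) == ncol(flights))

kable(flights) %>%
  kable_styling() %>%
  add_header_above(header)
```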

 

2. Using flextable package

The function used here for headers, add_header_row(), is quite similar to the one used previously: you specify how many columns each header spans (colwidths) and the label for each (values). For labeling rows, you group the data by the column of your choice with as_grouped_data() and then convert the result with as_flextable().

Note: You can first sort the data by that column so that the rows of each group appear together.

library(flextable)
library(dplyr)  # provides arrange()

df1 <- df %>%
  arrange(Gender)

as_grouped_data(df1, groups = "Gender") %>%
  as_flextable() %>%
  add_header_row(colwidths = c(4,6), values = c("Passenger Details", "Flight Details")) %>%
  bold(i = ~ !is.na(Gender))

Passenger Details (4 columns) | Flight Details (6 columns)
Passenger.ID | Name | Age | Airport.Name | Country.Name | Airport.Country.Code | Departure.Date | Arrival.Airport | Pilot.Name | Flight.Status
Gender: Female
10,856 | Edithe Leggis | 62 | Coldfoot Airport | United States | US | 6/28/2022 | CXF | Edithe Leggis | On Time
39,630 | Lora Durbann | 55 | Coronel Horácio de Mattos Airport | Brazil | BR | 06-10-2022 | LEC | Lora Durbann | On Time
37,434 | Halie Jewar | 53 | Wiley Post Will Rogers Memorial Airport | United States | US | 12-05-2022 | BRW | Halie Jewar | Delayed
Gender: Male
95,434 | Mattias Darrell | 78 | Aldan Airport | Russian Federation | RU | 08-07-2022 | ADH | Mattias Darrell | Cancelled
16,341 | Denys Endricci | 33 | Biju Patnaik Airport | India | IN | 7/22/2022 | BBI | Denys Endricci | Delayed
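
Note that colwidths in the call above adds up to 10 even though df has 11 columns: Gender is shown in the group rows rather than as a column of its own. A minimal sketch, assuming a cut-down stand-in for the data (`flights`, `grouped`, and `n_cols` are illustrative names), that reads the displayed column count from the flextable with ncol_keys() instead of counting by hand:

```r
library(flextable)

# Cut-down stand-in for the airline data (values taken from the snippet above)
flights <- data.frame(
  Passenger.ID  = c(10856, 39630, 37434, 95434, 16341),
  Name          = c("Edithe Leggis", "Lora Durbann", "Halie Jewar",
                    "Mattias Darrell", "Denys Endricci"),
  Gender        = c("Female", "Female", "Female", "Male", "Male"),
  Flight.Status = c("On Time", "On Time", "Delayed", "Cancelled", "Delayed")
)

# Gender is already sorted here, so each group's rows sit together
grouped <- as_grouped_data(flights, groups = "Gender")
ft <- as_flextable(grouped)

# Gender now lives in the group rows, so one column fewer is displayed
n_cols <- ncol_keys(ft)

# First header spans Passenger.ID and Name, the second spans the rest
ft <- add_header_row(
  ft,
  colwidths = c(2, n_cols - 2),
  values = c("Passenger Details", "Flight Details")
)
ft
```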

 

3. Using gt package

tab_spanner() is used for headers; like the previous two functions, it takes a label and the columns to span. For labeling rows, you can do it in two ways:

(a) Specify the column to use for row names in gt(), and then, in each tab_row_group() call, give the group label and a condition identifying which rows to target. Helper functions for this include starts_with(), ends_with(), contains(), matches(), one_of(), num_range(), and everything().

library(gt)

df %>%
  gt(rowname_col = "Flight.Status") %>%
  # label = names the group; newer gt releases deprecate the older group = argument
  tab_row_group(
    label = "On Time",
    rows = matches("On Time")
  ) %>%
  tab_row_group(
    label = "Delayed",
    rows = matches("Delayed")
  ) %>%
  tab_row_group(
    label = "Cancelled",
    rows = matches("Cancelled")
  ) %>%
  row_group_order(
    c("On Time", "Delayed", "Cancelled")
  ) %>%
  tab_spanner(
    label = "Flight Details",
    columns = 5:11
  )

(blank, 5 columns) | Flight Details (6 columns)
(blank) | Passenger.ID | Name | Gender | Age | Airport.Name | Country.Name | Airport.Country.Code | Departure.Date | Arrival.Airport | Pilot.Name
On Time
On Time | 10856 | Edithe Leggis | Female | 62 | Coldfoot Airport | United States | US | 6/28/2022 | CXF | Edithe Leggis
On Time | 39630 | Lora Durbann | Female | 55 | Coronel Horácio de Mattos Airport | Brazil | BR | 06-10-2022 | LEC | Lora Durbann
Delayed
Delayed | 37434 | Halie Jewar | Female | 53 | Wiley Post Will Rogers Memorial Airport | United States | US | 12-05-2022 | BRW | Halie Jewar
Delayed | 16341 | Denys Endricci | Male | 33 | Biju Patnaik Airport | India | IN | 7/22/2022 | BBI | Denys Endricci
Cancelled
Cancelled | 95434 | Mattias Darrell | Male | 78 | Aldan Airport | Russian Federation | RU | 08-07-2022 | ADH | Mattias Darrell

 

(b) Use group_by() from dplyr, then pass the grouped data to gt().

library(dplyr)

df %>% group_by(Flight.Status) %>% gt()

Passenger.ID | Name | Gender | Age | Airport.Name | Country.Name | Airport.Country.Code | Departure.Date | Arrival.Airport | Pilot.Name
On Time
10856 | Edithe Leggis | Female | 62 | Coldfoot Airport | United States | US | 6/28/2022 | CXF | Edithe Leggis
39630 | Lora Durbann | Female | 55 | Coronel Horácio de Mattos Airport | Brazil | BR | 06-10-2022 | LEC | Lora Durbann
Delayed
37434 | Halie Jewar | Female | 53 | Wiley Post Will Rogers Memorial Airport | United States | US | 12-05-2022 | BRW | Halie Jewar
16341 | Denys Endricci | Male | 33 | Biju Patnaik Airport | India | IN | 7/22/2022 | BBI | Denys Endricci
Cancelled
95434 | Mattias Darrell | Male | 78 | Aldan Airport | Russian Federation | RU | 08-07-2022 | ADH | Mattias Darrell
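
Column positions such as 5:11 shift whenever a column is added or removed. A minimal sketch, assuming a cut-down stand-in for the data (`flights` and `flights_tbl` are illustrative names), that names the spanned columns instead and lets gt() build the row groups through its groupname_col argument, without a separate group_by() step:

```r
library(gt)
library(dplyr)  # loaded here only for the %>% pipe

# Cut-down stand-in for the airline data (values taken from the snippet above)
flights <- data.frame(
  Passenger.ID  = c(10856, 39630, 37434, 95434, 16341),
  Name          = c("Edithe Leggis", "Lora Durbann", "Halie Jewar",
                    "Mattias Darrell", "Denys Endricci"),
  Gender        = c("Female", "Female", "Female", "Male", "Male"),
  Flight.Status = c("On Time", "On Time", "Delayed", "Cancelled", "Delayed")
)

flights_tbl <- flights %>%
  # Each distinct Flight.Status value becomes a row group
  gt(groupname_col = "Flight.Status") %>%
  # Columns are selected by name, so reordering the data does not break the span
  tab_spanner(
    label = "Passenger Details",
    columns = c(Passenger.ID, Name, Gender)
  )

flights_tbl
```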

 

Conclusion:

For adding headers above the column names, all three packages provide more or less similar commands: you have to specify the column spans and the label that you want to assign to each span. For grouping rows, kableExtra follows the same pattern as for headers: you state how many rows each group covers. Hence, if your data is large and/or is not sorted by the column that you want to group rows on, specifying row positions for grouping will prove to be a tedious task. On the other hand, flextable and gt provide more flexibility for this same task, because they can build the groups from a column's values. Ultimately, the decision rests with you.

 

Table formatting is crucial in various contexts, such as data analysis, data visualization, report generation, and publication. Well-formatted tables enhance the clarity of data presentation, help readers or analysts quickly grasp key information, and contribute to effective data communication. There are a lot more things that you can do to format your table, but that is a blog for some other time!