
Introduction

Anyone who has worked with SAS is probably familiar with PROC CONTENTS, a quick way to view a dataset's structure and metadata. In R or Python, by contrast, the typical approach is to call several functions such as head(), dim(), class(), or summary() to understand the data. The vtable package in R offers similar convenience: a single function that inspects and summarizes a dataset, much like PROC CONTENTS in SAS.

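Let us begin by installing and loading the required packages. dplyr is included because the examples use mutate(), across(), and the %>% pipe:

# install.packages(c("vtable", "dplyr"))
library(vtable)
library(dplyr)  # provides mutate(), across(), and the %>% pipe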

A glance at the dataset
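
The text does not show how insurance_df is created; it would typically be read from a file with read.csv(). As a minimal sketch, the block below builds a small stand-in data frame with the same column names as the tables in this article and inspects it the base R way, one function at a time. The name insurance_toy and all of its values are hypothetical and are not the real insurance data.

```r
# Hypothetical stand-in with the same column names as the article's tables;
# the values are made up for illustration only
insurance_toy <- data.frame(
  age      = c(19L, 33L, 46L, 62L),
  gender   = c("female", "male", "male", "female"),
  bmi      = c(27.9, 22.7, 33.4, 26.3),
  children = c(0L, 1L, 3L, 0L),
  smoker   = c("yes", "no", "no", "yes"),
  region   = c("southwest", "northwest", "southeast", "northeast"),
  charges  = c(16884.92, 21984.47, 8240.59, 27808.73),
  stringsAsFactors = FALSE
)

dim(insurance_toy)              # number of rows and columns
sapply(insurance_toy, class)    # class of each column
head(insurance_toy, 3)          # first few rows
summary(insurance_toy$charges)  # distribution of one numeric column
```

Each call answers one question about the data, which is the multi-step workflow that a single vtable() call is meant to shorten.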

Let's look at how the vtable package is used.

1. Getting Variable Names, Types, and Value Summaries

Here, we first convert the character variables to factors so that vtable can display their levels:

insurance_df <- insurance_df %>% mutate(across(where(is.character), as.factor))
vtable(insurance_df)

insurance_df
Name	Class	Values
age	integer	Num: 18 to 64
gender	factor	'female' 'male'
bmi	numeric	Num: 15.96 to 53.13
children	integer	Num: 0 to 5
smoker	factor	'no' 'yes'
region	factor	'northeast' 'northwest' 'southeast' 'southwest'
charges	numeric	Num: 1062.385 to 63770.428
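
If converting columns is not desirable, one possible alternative is vtable's char.values argument, which, as far as the package documentation describes it, lists the values of character columns as though they were factors. The sketch below uses a small made-up data frame (toy_df is hypothetical, not the insurance data) and sets out = "return" so that the variable table comes back as a data frame instead of opening in the viewer.

```r
library(vtable)

# Hypothetical stand-in data; not the insurance dataset used above
toy_df <- data.frame(
  age    = c(23L, 45L, 31L, 52L),
  smoker = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# char.values = TRUE asks vtable to list the values of character columns;
# out = "return" returns the variable table as a data frame
vt <- vtable(toy_df, char.values = TRUE, out = "return")
print(vt)
```

Returning a data frame also makes the table easy to filter or save with ordinary R tools.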

2. Adding Descriptions to Understand the Variables Better

# Labels must follow the column order of insurance_df
my_labels <- c("Insured person's age", "Gender", "BMI", "Number of dependents",
               "Smoker status", "Residential region", "Insurance cost")

vtable(insurance_df,
       labels = my_labels,                # adds a Label column describing each variable
       data.title = "Insurance data",     # adds a title
       summ = c("mean(x)", "countNA(x)")  # adds a Summary column with these statistics
       )

Insurance data
Name	Class	Label	Values	Summary
age	integer	Insured person's age	Num: 18 to 64	mean: 39.207, countNA: 0
gender	factor	Gender	'female' 'male'	countNA: 0
bmi	numeric	BMI	Num: 15.96 to 53.13	mean: 30.663, countNA: 0
children	integer	Number of dependents	Num: 0 to 5	mean: 1.095, countNA: 0
smoker	factor	Smoker status	'no' 'yes'	countNA: 0
region	factor	Residential region	'northeast' 'northwest' 'southeast' 'southwest'	countNA: 0
charges	numeric	Insurance cost	Num: 1062.385 to 63770.428	mean: 13293.678, countNA: 0

3. Creating Balance Tables in R with sumtable()

Balance tables are used to check whether groups (e.g., smokers vs. non-smokers) are comparable in terms of key covariates like age, BMI, and number of children. This is especially important before modeling or in observational studies where group equivalence is critical.

# Basic variable summary table
sumtable(insurance_df)

Summary Statistics
Variable	N	Mean	Std. Dev.	Min	Pctl. 25	Pctl. 75	Max
age	1338	39	14	18	27	51	64
gender	1338
... female	662	49%
... male	676	51%
bmi	1338	31	6.1	16	26	35	53
children	1338	1.1	1.2	0	0	2	5
smoker	1338
... no	1064	80%
... yes	274	20%
region	1338
... northeast	324	24%
... northwest	324	24%
... southeast	365	27%
... southwest	325	24%
charges	1338	13294	12121	1062	4747	16747	63770

# Group-wise summary (e.g., by smoker status)
sumtable(insurance_df, group = "smoker", group.test = TRUE)

Summary Statistics
smoker	no			yes
Variable	N	Mean	SD	N	Mean	SD	Test
age	1064	39	14	274	39	14	F=0.837
gender	1064			274			X2=7.393^***
... female	547	51%		115	42%
... male	517	49%		159	58%
bmi	1064	31	6	274	31	6.3	F=0.019
children	1064	1.1	1.2	274	1.1	1.2	F=0.079
region	1064			274			X2=7.157^*
... northeast	257	24%		67	24%
... northwest	266	25%		58	21%
... southeast	274	26%		91	33%
... southwest	267	25%		58	21%
charges	1064	8464	6046	274	32050	11542	F=2153.091^***
Statistical significance markers: * p<0.1; ** p<0.05; *** p<0.01

# Optional: choose the summary statistics shown for each group
# sumtable(insurance_df, group = "smoker", group.test = TRUE, summ = c('notNA(x)', 'mean(x)', 'median(x)', 'propNA(x)'))

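With group.test = TRUE, sumtable() automatically runs a statistical test comparing each variable across the groups:

Numeric variables: F-test of whether the mean differs across groups (the F= entries in the table above; with only two groups this is equivalent to an equal-variance t-test)
Factor variables: Chi-squared test of independence between the variable and the group (the X2= entries)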
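
To make these two tests concrete, here is a rough base R sketch of the same kind of comparison. It assumes that the numeric test is a group-wise F-test, as the F= entries in the table suggest; the data frame toy and its values are made up for illustration, and this is not vtable's internal code.

```r
# Hypothetical toy data; values are made up for illustration
toy <- data.frame(
  smoker = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes")),
  gender = factor(c("female", "male", "female", "male",
                    "male", "male", "female", "male")),
  bmi    = c(24.1, 30.5, 27.3, 33.0, 29.8, 35.2, 26.4, 31.7)
)

# Numeric variable: F-test from a regression of the variable on the group
f_stat <- anova(lm(bmi ~ smoker, data = toy))[["F value"]][1]

# With two groups, F equals the squared equal-variance t statistic
t_stat <- unname(t.test(bmi ~ smoker, data = toy, var.equal = TRUE)$statistic)
all.equal(f_stat, t_stat^2)

# Factor variable: chi-squared test of independence.
# suppressWarnings() hides the small-sample warning triggered by the toy counts.
chi <- suppressWarnings(chisq.test(table(toy$gender, toy$smoker)))
chi$statistic
```

The all.equal() check shows why an F statistic appears even when only two groups are compared: it carries the same information as the t statistic.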

Benefits of Using vtable

Beautiful formatting out-of-the-box (great for articles, reports).
Handles mixed variable types (numeric, factors) automatically.
Quick grouped summaries (like balance tables).
Supports export to Word/HTML, great for sharing.
Minimal code, maximum readability.

Drawbacks

Limited customization for statistical test options (e.g., no non-parametric tests)
For deeper EDA, you may still need dplyr, ggplot2, or summarytools
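
When a non-parametric comparison is needed, it can be run separately in base R. The sketch below applies a Wilcoxon rank-sum test to two made-up vectors; charges_no and charges_yes are hypothetical names and values, not figures from the insurance data.

```r
# Hypothetical toy data; values are made up for illustration
charges_no  <- c(2100.5, 3400.2, 4800.9, 5200.1, 7600.3)
charges_yes <- c(15800.4, 21000.7, 33400.6, 18900.8, 42000.2)

# Wilcoxon rank-sum (Mann-Whitney) test: compares two groups
# using ranks, without assuming normally distributed values
wt <- wilcox.test(charges_no, charges_yes)
wt$p.value
```

Because the test works on ranks, it is less sensitive to the heavy right tail that cost variables such as charges often have.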

Summary statistics made easy with vtable package in R