Data analysis isn’t just about numbers—it’s about stories.
In this report, we use the famous iris dataset and the powerful {report} package in R to narrate a tale hidden within petal lengths, sepal widths, and three unique flower species. The {report} package allows us to go beyond charts and tables, turning statistical summaries into clear, human-readable insights.
We’ll walk through:
Key flower characteristics
How species differ through statistical summaries and visuals
What a linear model can tell us about how petal length varies with species
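library(glue)
df <- iris  # assumption: the setup chunk is not shown; df is taken to be the built-in iris data

# Print one plain-language sentence per numeric column
for (col in names(df)[sapply(df, is.numeric)]) {
  mean_val <- round(mean(df[[col]]), 2)
  min_val <- min(df[[col]])
  max_val <- max(df[[col]])
  cat(glue("**{col}** ranges from {min_val} to {max_val}, with an average of {mean_val}.\n\n"))
}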
**Sepal.Length** ranges from 4.3 to 7.9, with an average of 5.84.
**Sepal.Width** ranges from 2 to 4.4, with an average of 3.06.
**Petal.Length** ranges from 1 to 6.9, with an average of 3.76.
**Petal.Width** ranges from 0.1 to 2.5, with an average of 1.2.
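The same figures can also be gathered into a single table rather than printed one sentence at a time. The sketch below uses base R only and assumes `df` is the built-in `iris` data frame; `num_cols` and `summary_tbl` are names introduced here for illustration.

```r
# Assumption: the report's df is the built-in iris data.
df <- iris
num_cols <- names(df)[sapply(df, is.numeric)]

# One column per variable, one row per statistic
summary_tbl <- sapply(df[num_cols], function(x) {
  c(min = min(x), mean = round(mean(x), 2), max = max(x))
})
print(summary_tbl)
```

A table like this is easier to scan when the number of variables grows, while the sentence form reads better in running text.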
Technical Summary with {report} (Variable Level)
library(report)

for (col in names(df)[sapply(df, is.numeric)]) {
  cat("### Variable:", col, "\n\n")
  # try() keeps the loop running if report() fails on a column
  try(print(report(df[[col]])), silent = TRUE)
  cat("\n\n")
}
### Variable: Sepal.Length
x: n = 150, Mean = 5.84, SD = 0.83, Median = 5.80, MAD = 1.04, range: [4.30,
7.90], Skewness = 0.31, Kurtosis = -0.55, 0% missing
### Variable: Sepal.Width
x: n = 150, Mean = 3.06, SD = 0.44, Median = 3.00, MAD = 0.44, range: [2,
4.40], Skewness = 0.32, Kurtosis = 0.23, 0% missing
### Variable: Petal.Length
x: n = 150, Mean = 3.76, SD = 1.77, Median = 4.35, MAD = 1.85, range: [1,
6.90], Skewness = -0.27, Kurtosis = -1.40, 0% missing
### Variable: Petal.Width
x: n = 150, Mean = 1.20, SD = 0.76, Median = 1.30, MAD = 1.04, range: [0.10,
2.50], Skewness = -0.10, Kurtosis = -1.34, 0% missing
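One detail worth checking in this output is the MAD. The values above match the scaled median absolute deviation that base R's `mad()` returns by default, where the raw MAD is multiplied by 1.4826 so that it is comparable to the SD for roughly normal data. A quick check on one variable, assuming `df` is the built-in `iris` data:

```r
x <- iris$Sepal.Length  # assumption: df in the report is the built-in iris data

raw_mad    <- mad(x, constant = 1)  # median of |x - median(x)|
scaled_mad <- mad(x)                # default constant = 1.4826

cat("raw MAD:   ", round(raw_mad, 2), "\n")
cat("scaled MAD:", round(scaled_mad, 2), "\n")
cat("SD:        ", round(sd(x), 2), "\n")
```

Reading the MAD next to the SD gives a rough sense of how much the spread is driven by the tails of the distribution.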
Model-Based Storytelling with report()
This model gives us a narrative of how petal length differs by species, written in plain English by {report}.
model <- lm(Petal.Length ~ Species, data = df)
report(model)
We fitted a linear model (estimated using OLS) to predict Petal.Length with
Species (formula: Petal.Length ~ Species). The model explains a statistically
significant and substantial proportion of variance (R2 = 0.94, F(2, 147) =
1180.16, p < .001, adj. R2 = 0.94). The model's intercept, corresponding to
Species = setosa, is at 1.46 (95% CI [1.34, 1.58], t(147) = 24.02, p < .001).
Within this model:
- The effect of Species [versicolor] is statistically significant and positive
(beta = 2.80, 95% CI [2.63, 2.97], t(147) = 32.51, p < .001; Std. beta = 1.59,
95% CI [1.49, 1.68])
- The effect of Species [virginica] is statistically significant and positive
(beta = 4.09, 95% CI [3.92, 4.26], t(147) = 47.52, p < .001; Std. beta = 2.32,
95% CI [2.22, 2.41])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
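Because Species is a factor with setosa as the reference level, the coefficients in this narrative are differences between group means. The base R sketch below recomputes them by hand, assuming `df` is the built-in `iris` data; `group_means` and `by_hand` are names introduced here for illustration.

```r
df <- iris  # assumption: the report's df is the built-in iris data
model <- lm(Petal.Length ~ Species, data = df)

# Mean petal length within each species
group_means <- tapply(df$Petal.Length, df$Species, mean)

# Intercept = mean of the reference level (setosa);
# each effect = that species' mean minus the setosa mean
by_hand <- c(
  group_means[["setosa"]],
  group_means[["versicolor"]] - group_means[["setosa"]],
  group_means[["virginica"]] - group_means[["setosa"]]
)

print(round(unname(coef(model)), 2))       # coefficients from lm()
print(round(by_hand, 2))                   # same values from group means
print(round(summary(model)$r.squared, 2))  # share of variance explained
```

Seen this way, the "effect of Species [versicolor]" is simply how much longer the average versicolor petal is than the average setosa petal.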
Boxplots of Flower Features by Species
library(dplyr)
library(ggplot2)

numeric_vars <- df %>% select(where(is.numeric))

# Draw one boxplot per numeric variable, grouped by species
for (col in names(numeric_vars)) {
  print(
    ggplot(df, aes(x = Species, y = .data[[col]], fill = Species)) +
      geom_boxplot() +
      labs(title = paste(col, "by Species"), x = "Species", y = col) +
      theme_minimal()
  )
}
“Do different species have different petal or sepal sizes?”
Boxplots compare species visually:
Setosa has much smaller petals than the other two species
Versicolor and Virginica differ more subtly
This reinforces the idea that species identity influences measurements, which is a key part of the story.
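The centre line of each box is the group median, so the visual comparison can be backed by a small table. A minimal sketch with base R's `aggregate()`, assuming `df` is the built-in `iris` data (`medians` is a name introduced here):

```r
df <- iris  # assumption: the report's df is the built-in iris data

# Median of every numeric column within each species
medians <- aggregate(. ~ Species, data = df, FUN = median)
print(medians)
```

Comparing the rows of this table with the plots is a quick way to confirm that the ordering seen in the boxes is what the numbers say.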
Sepal vs Petal Scatter Plot
ggplot(df, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Sepal vs Petal Length by Species") +
  theme_minimal()
“Can we visually separate the species?”
Scatter plots show relationships between variables, often revealing clusters:
Setosa forms a distinct group
Versicolor and Virginica overlap
These visuals help the reader see why a classifier should separate setosa easily but may struggle to tell versicolor from virginica.
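To make that concrete, here is a deliberately naive classifier that looks at petal length alone. It is a sketch, not part of the original analysis: `classify()` is a hypothetical helper, the cut-offs 2.5 and 4.8 are assumptions picked by eye, and `df` is assumed to be the built-in `iris` data.

```r
df <- iris  # assumption: the report's df is the built-in iris data

# Hypothetical rule on petal length alone; cut-offs chosen by eye
classify <- function(petal_length) {
  ifelse(petal_length < 2.5, "setosa",
         ifelse(petal_length < 4.8, "versicolor", "virginica"))
}

predicted <- factor(classify(df$Petal.Length), levels = levels(df$Species))

# Rows = true species, columns = predicted species
confusion <- table(actual = df$Species, predicted = predicted)
print(confusion)
```

The setosa row should come out clean, while versicolor and virginica trade a few flowers with each other, which is the overlap visible in the scatter plot.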
Conclusion
In this report, we turned a well-known dataset into a compelling narrative—powered by R and the {report} package.
Rather than just plotting and calculating, we let the data speak in plain language. We learned that:
Petal length differs significantly across species (p < .001), and the boxplots point to species differences in the other measurements as well
Setosa is clearly distinct; Virginica and Versicolor subtly overlap
A linear model with species as its only predictor explains about 94% of the variance in petal length (R2 = 0.94)
{report} allowed us to describe data not just with numbers, but with sentences
The {report} package reminds us that data storytelling is not just for analysts—it’s for readers. When we write our results like we speak, we make insights accessible, reproducible, and beautifully human.