Story of the Iris Flowers

Introduction

Data analysis isn’t just about numbers—it’s about stories.

In this report, we use the famous iris dataset and the {report} package in R to narrate the story hidden within petal lengths, sepal widths, and three iris species. The {report} package lets us go beyond charts and tables by turning statistical summaries into clear, human-readable text.

We’ll walk through:

- Key flower characteristics
- How species differ through statistical summaries and visuals
- What a model can “tell” us about how petal length depends on flower type

Let’s begin—petal by petal.

Meet the iris Dataset

# Work on a copy of the built-in iris data frame
df <- iris
head(df)  # first six rows

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

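This dataset contains 150 flower samples, 50 from each of three iris species: setosa, versicolor, and virginica. Each sample has four measurements in centimeters: sepal length, sepal width, petal length, and petal width.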
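To check how the 150 samples are spread across species, one option is to tabulate the Species column. The sketch below uses only base R and the built-in iris data; the name species_counts is illustrative.

```r
# Count how many rows belong to each species (base R, no packages needed)
species_counts <- table(iris$Species)
print(species_counts)

# Total number of samples
sum(species_counts)
```

Each species contributes 50 rows, so the group comparisons later in the report are balanced.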

Structure

str(df)

'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
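The structure shows four numeric columns and one factor. A minimal base R check of the column classes, together with a count of missing values per column, might look like this (col_classes and missing_per_col are illustrative names):

```r
# Class of each column: four numeric measurements and one factor
col_classes <- sapply(iris, class)
print(col_classes)

# Number of missing values in each column of the built-in iris data
missing_per_col <- colSums(is.na(iris))
print(missing_per_col)
```

The missing-value counts are all zero, which agrees with the "0% missing" entries in the report() summaries further down.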

Human-Friendly Descriptions with glue()

library(glue)

# Describe each numeric column in one plain-language sentence
for (col in names(df)[sapply(df, is.numeric)]) {
  mean_val <- round(mean(df[[col]]), 2)
  min_val <- min(df[[col]])
  max_val <- max(df[[col]])
  cat(glue("**{col}** ranges from {min_val} to {max_val}, with an average of {mean_val}.\n\n"))
}

**Sepal.Length** ranges from 4.3 to 7.9, with an average of 5.84.
**Sepal.Width** ranges from 2 to 4.4, with an average of 3.06.
**Petal.Length** ranges from 1 to 6.9, with an average of 3.76.
**Petal.Width** ranges from 0.1 to 2.5, with an average of 1.2.
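The same idea extends to one sentence per species. The sketch below swaps glue() for base R's sprintf() so that it runs without extra packages; petal_means and sentences are illustrative names, not part of the original report.

```r
# Mean petal length within each species
petal_means <- tapply(iris$Petal.Length, iris$Species, mean)

# One sentence per species; sprintf() is vectorised over its arguments
sentences <- sprintf(
  "On average, %s petals are %.2f cm long.",
  names(petal_means),
  as.numeric(petal_means)
)
cat(sentences, sep = "\n")
```

Splitting the average by species already hints at the story: the overall mean of 3.76 hides three quite different groups.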

Technical Summary with {report} (Variable Level)

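library(report)

# Print a {report} summary for each numeric column;
# try() keeps the loop going if one column cannot be summarised
for (col in names(df)[sapply(df, is.numeric)]) {
  cat("### Variable:", col, "\n\n")
  try(print(report(df[[col]])), silent = TRUE)
  cat("\n\n")
}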

### Variable: Sepal.Length 

x: n = 150, Mean = 5.84, SD = 0.83, Median = 5.80, MAD = 1.04, range: [4.30,
7.90], Skewness = 0.31, Kurtosis = -0.55, 0% missing


### Variable: Sepal.Width 

x: n = 150, Mean = 3.06, SD = 0.44, Median = 3.00, MAD = 0.44, range: [2,
4.40], Skewness = 0.32, Kurtosis = 0.23, 0% missing


### Variable: Petal.Length 

x: n = 150, Mean = 3.76, SD = 1.77, Median = 4.35, MAD = 1.85, range: [1,
6.90], Skewness = -0.27, Kurtosis = -1.40, 0% missing


### Variable: Petal.Width 

x: n = 150, Mean = 1.20, SD = 0.76, Median = 1.30, MAD = 1.04, range: [0.10,
2.50], Skewness = -0.10, Kurtosis = -1.34, 0% missing
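Most of the numbers in these summaries can be reproduced with base R, which helps when checking what each label means. The sketch below does so for Sepal.Length. It assumes that the MAD shown above is the scaled median absolute deviation that stats::mad() returns by default (constant 1.4826), which is consistent with the printed value of 1.04. Skewness and kurtosis are left out because base R has no built-in functions for them.

```r
x <- iris$Sepal.Length

# Centre and spread, rounded the way the summary prints them
summary_stats <- c(
  n      = length(x),
  mean   = round(mean(x), 2),
  sd     = round(sd(x), 2),
  median = median(x),
  mad    = round(mad(x), 2),  # scaled MAD: 1.4826 * median(|x - median(x)|)
  min    = min(x),
  max    = max(x)
)
print(summary_stats)
```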

Model-Based Storytelling with report()

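We fit a linear model with petal length as the outcome and species as the only predictor. {report} then describes, in plain English, how petal length differs by species.

# Species is a factor; its first level (setosa) becomes the reference category
model <- lm(Petal.Length ~ Species, data = df)
report(model)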

We fitted a linear model (estimated using OLS) to predict Petal.Length with
Species (formula: Petal.Length ~ Species). The model explains a statistically
significant and substantial proportion of variance (R2 = 0.94, F(2, 147) =
1180.16, p < .001, adj. R2 = 0.94). The model's intercept, corresponding to
Species = setosa, is at 1.46 (95% CI [1.34, 1.58], t(147) = 24.02, p < .001).
Within this model:

  - The effect of Species [versicolor] is statistically significant and positive
(beta = 2.80, 95% CI [2.63, 2.97], t(147) = 32.51, p < .001; Std. beta = 1.59,
95% CI [1.49, 1.68])
  - The effect of Species [virginica] is statistically significant and positive
(beta = 4.09, 95% CI [3.92, 4.26], t(147) = 47.52, p < .001; Std. beta = 2.32,
95% CI [2.22, 2.41])

Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
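Because Species is the only predictor and setosa is the reference level, the coefficients in this narrative are group means in disguise: the intercept is the mean petal length of setosa, and each beta is the difference between a species' mean and the setosa mean. The sketch below checks this in base R, along with the confidence intervals, R2, and standardized coefficients. The names fit, group_means, iris_std, and fit_std are illustrative, and the standardized refit assumes that only the numeric outcome is rescaled, which is consistent with the Std. beta values printed above.

```r
fit <- lm(Petal.Length ~ Species, data = iris)

# Group means of the outcome
group_means <- tapply(iris$Petal.Length, iris$Species, mean)

# Intercept = setosa mean; betas = differences from the setosa mean
coef(fit)
group_means - group_means[["setosa"]]

# t-based 95% confidence intervals for the coefficients
confint(fit, level = 0.95)

# R-squared, as quoted in the first sentences of the narrative
r2 <- summary(fit)$r.squared
r2

# Standardized coefficients: refit with the outcome scaled to mean 0, SD 1
iris_std <- transform(iris, Petal.Length = as.numeric(scale(Petal.Length)))
fit_std <- lm(Petal.Length ~ Species, data = iris_std)
coef(fit_std)
```

With a single factor predictor, rescaling the outcome simply divides each beta by the standard deviation of Petal.Length, which is why the standardized effects keep the same ordering as the raw ones.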

Boxplots of Flower Features by Species

library(dplyr)
library(ggplot2)

# One boxplot per numeric column, split by species
numeric_vars <- df %>% select(where(is.numeric))
for (col in names(numeric_vars)) {
  print(
    ggplot(df, aes(x = Species, y = .data[[col]], fill = Species)) +
      geom_boxplot() +
      labs(title = paste(col, "by Species"), x = "Species", y = col) +
      theme_minimal()
  )
}

“Do different species have different petal or sepal sizes?”

Boxplots compare species visually:

- Setosa has much smaller petals than the other two species
- Versicolor and virginica differ more subtly

This reinforces the idea that measurements vary with species identity, which is a key part of the story.
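The boxplot impressions can be backed by the per-species ranges of petal length. A minimal base R sketch (petal_ranges is an illustrative name):

```r
# Minimum and maximum petal length within each species
petal_ranges <- sapply(split(iris$Petal.Length, iris$Species), range)
rownames(petal_ranges) <- c("min", "max")
print(petal_ranges)

# TRUE if the largest setosa petal is shorter than the smallest versicolor petal
petal_ranges["max", "setosa"] < petal_ranges["min", "versicolor"]

# TRUE if the versicolor and virginica ranges overlap
petal_ranges["max", "versicolor"] > petal_ranges["min", "virginica"]
```

Both comparisons return TRUE: setosa petals top out at 1.9 cm, well below the 3.0 cm minimum for versicolor, while versicolor (up to 5.1 cm) overlaps virginica (from 4.5 cm).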

Sepal vs Petal Scatter Plot

ggplot(df, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Sepal vs Petal Length by Species") +
  theme_minimal()

“Can we visually separate the species?”

Scatter plots show relationships between variables, often revealing clusters:

- Setosa forms a distinct group
- Versicolor and virginica partly overlap

These visuals help the reader see why machine learning might work well for setosa yet struggle to separate versicolor from virginica.
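To make the point about machine learning concrete, here is a deliberately simple rule-based classifier that uses petal length alone. It is a sketch, not a tuned model: the cut-offs 2.5 and 4.8 are hand-picked assumptions read off the plot, and classify_by_petal is a hypothetical helper that does not appear in the original report.

```r
# Hypothetical helper: assign a species from petal length using two cut-offs
classify_by_petal <- function(petal_length, cut_low = 2.5, cut_high = 4.8) {
  ifelse(petal_length < cut_low, "setosa",
         ifelse(petal_length < cut_high, "versicolor", "virginica"))
}

predicted <- factor(classify_by_petal(iris$Petal.Length),
                    levels = levels(iris$Species))

# Confusion matrix: rows are true species, columns are predictions
confusion <- table(actual = iris$Species, predicted = predicted)
print(confusion)

# Share of flowers classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

Setosa is recovered without error because its petal lengths do not overlap with those of the other species. No single cut-off can do the same for versicolor and virginica, since their petal length ranges overlap.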

Conclusion

In this report, we turned a well-known dataset into a compelling narrative—powered by R and the {report} package.

Rather than just plotting and calculating, we let the data speak in plain language. We learned that:

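- Flower measurements differ across species; for petal length, the difference is statistically significant (F(2, 147) = 1180.16, p < .001)
- Setosa is clearly distinct; virginica and versicolor partly overlap
- A linear model with species as its only predictor explains about 94% of the variance in petal length (R2 = 0.94)
- {report} allowed us to describe data not just with numbers, but with sentences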

The {report} package reminds us that data storytelling is not just for analysts—it’s for readers. When we write our results like we speak, we make insights accessible, reproducible, and beautifully human.

This is how we let data tell its story.