A Venn diagram, introduced by John Venn in 1880, is a widely used diagram style that shows the logical relations between sets. It uses circles to show the relationships among things or finite groups of things: circles that overlap share some characteristic, while circles that do not overlap do not share it.
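The regions of a two-circle Venn diagram correspond directly to base R's set operations. Here is a minimal sketch with two made-up sets (the names `x` and `y` and their contents are purely illustrative):

```r
# Two illustrative sets (hypothetical contents)
x <- c("apple", "banana", "cherry")
y <- c("banana", "cherry", "date")

both   <- intersect(x, y)  # overlap of the two circles
only_x <- setdiff(x, y)    # part of circle x outside circle y
only_y <- setdiff(y, x)    # part of circle y outside circle x
either <- union(x, y)      # everything covered by at least one circle

print(both)    # -> "banana" "cherry"
print(only_x)  # -> "apple"
print(only_y)  # -> "date"
print(either)  # -> "apple" "banana" "cherry" "date"
```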

Venn Diagram in R

Let us consider a hypothetical example.

A clinical trial is designed to evaluate the most suitable treatment for patients infected with pulmonary tuberculosis. Two new drugs (say B and C) have been manufactured, and the researchers want to estimate the effectiveness of drugs B and C in comparison with an existing drug (say drug A). The researchers are also interested in the combined effect of drug combinations (AB, BC, AC, and the three-drug combination ABC) on patients with pulmonary tuberculosis. Suppose a total of 24 patients infected with pulmonary tuberculosis are recruited and randomly allocated to the individual drugs A, B, and C or to the combinations AB, BC, AC, and ABC.

We can visualize this design using Venn diagrams.

Let us randomly assign the patients to treatments (the allocation is written out explicitly so that the example is reproducible):

# Subject IDs and the treatment each subject received
subject <- 1:24
trt <- c("C", "AB", "A", "BC", "AC", "ABC", "B", "AB", "A", "A", "BC", "AB",
         "AB", "AC", "B", "B", "BC", "ABC", "C", "BC", "AC", "ABC", "AC", "B")
df <- data.frame(subject, trt)
head(df)
##   subject trt
## 1       1   C
## 2       2  AB
## 3       3   A
## 4       4  BC
## 5       5  AC
## 6       6 ABC
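Before drawing anything, it can help to tabulate how many patients ended up in each arm. Base R's `table()` gives these counts, which the regions of the Venn diagrams should later reproduce. A minimal sketch using the same allocation:

```r
subject <- 1:24
trt <- c("C", "AB", "A", "BC", "AC", "ABC", "B", "AB", "A", "A", "BC", "AB",
         "AB", "AC", "B", "B", "BC", "ABC", "C", "BC", "AC", "ABC", "AC", "B")
df <- data.frame(subject, trt)

# Number of patients allocated to each treatment arm
arm_counts <- table(df$trt)
print(arm_counts)
# A: 3, AB: 4, ABC: 3, AC: 4, B: 4, BC: 4, C: 2 (24 in total)
```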

To use the Venn diagram functions, we first need to install and load the VennDiagram package.

install.packages("VennDiagram")
library(VennDiagram)

Let us visualize the entire data set of 24 subjects using a single Venn diagram.

grid.newpage()
draw.single.venn(area = length(df$trt))


Figure 1 shows the output: a single Venn diagram. First, we create a new plotting page with the grid.newpage function. We should usually do this before drawing each Venn diagram; otherwise, the new diagram is drawn on top of any previously created plots.

Second, we draw the single Venn diagram with the draw.single.venn function. The only thing we specify within the function is the size of the area (i.e., the number of patients enrolled in our study).

Now suppose the researcher first wants to visualize the number of patients who receive the new drug B, the new drug C, or both.

In this situation, we can use a pairwise Venn diagram. Here, we need to specify the sizes of both sets as well as the size of their intersection:
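To see how these three arguments map onto the picture, the sketch below computes the same quantities with base R and derives the count in each region. It uses `grepl()`, the logical counterpart of `grep()`, so the intersection is simply the patients whose treatment contains both letters. The variable names are illustrative and not part of VennDiagram.

```r
trt <- c("C", "AB", "A", "BC", "AC", "ABC", "B", "AB", "A", "A", "BC", "AB",
         "AB", "AC", "B", "B", "BC", "ABC", "C", "BC", "AC", "ABC", "AC", "B")

in_B <- grepl("B", trt)  # treatment includes drug B
in_C <- grepl("C", trt)  # treatment includes drug C

area1 <- sum(in_B)          # 15: received B, alone or in a combination
area2 <- sum(in_C)          # 13: received C, alone or in a combination
cross <- sum(in_B & in_C)   # 7: received both (BC or ABC)

only_B  <- area1 - cross                          # 8: B without C
only_C  <- area2 - cross                          # 6: C without B
neither <- length(trt) - (area1 + area2 - cross)  # 3: drug A alone
print(c(only_B = only_B, both = cross, only_C = only_C, neither = neither))
```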

grid.newpage()
draw.pairwise.venn(area1 = length(df$trt[grep("B",df$trt)]),            
                   area2 = length(df$trt[grep("C",df$trt)]),
                   cross.area = length(df$trt[grep("BC",df$trt)]))


Figure 2 shows the number of patients whose treatment includes the new drug B, the new drug C, or both. As you can see, the sizes of the areas are reflected in the pairwise Venn diagram.
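As an alternative, VennDiagram also provides `venn.diagram()`, which takes a named list of element vectors and works out the overlaps itself, so the areas do not have to be computed by hand. Passing `filename = NULL` returns the grid object instead of writing an image file. This is a sketch, assuming subject IDs are used as the set elements; it only draws if the package is installed, and `venn.diagram()` may write a log file to the working directory.

```r
subject <- 1:24
trt <- c("C", "AB", "A", "BC", "AC", "ABC", "B", "AB", "A", "A", "BC", "AB",
         "AB", "AC", "B", "B", "BC", "ABC", "C", "BC", "AC", "ABC", "AC", "B")

# Each list element holds the IDs of patients whose treatment includes that drug
sets <- list(
  "Drug B" = subject[grepl("B", trt)],
  "Drug C" = subject[grepl("C", trt)]
)

if (requireNamespace("VennDiagram", quietly = TRUE)) {
  # filename = NULL returns a grid object rather than writing a file
  v <- VennDiagram::venn.diagram(sets, filename = NULL,
                                 fill = c("green", "orange"))
  grid::grid.newpage()
  grid::grid.draw(v)
}
```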

Let us now draw a Venn diagram with all three drugs A, B, and C, and learn some formatting options along the way.

To draw a Venn diagram with three sets in R, we can use the draw.triple.venn function. Note that we need to specify the three set sizes, the three pairwise intersections, and the intersection of all three sets.
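These inputs are not independent: by the three-set inclusion-exclusion principle, the set sizes and overlaps must add back up to the number of patients who received at least one drug, which here is all 24. Writing $A$, $B$ and $C$ for the sets of patients whose treatment includes each drug, and counting directly from the treatment list (so the $A \cap C$ overlap includes the three ABC patients as well as the four AC patients):

$$
|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C| = 14 + 15 + 13 - 7 - 7 - 7 + 3 = 24
$$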

grid.newpage()
draw.triple.venn(area1 = length(df$trt[grep("A", df$trt)]),
                 area2 = length(df$trt[grep("B", df$trt)]),
                 area3 = length(df$trt[grep("C", df$trt)]),
                 n12  = length(df$trt[grep("AB", df$trt)]),   # AB and ABC
                 n23  = length(df$trt[grep("BC", df$trt)]),   # BC and ABC
                 # grep("AC") would miss "ABC", so test for A and C separately
                 n13  = sum(grepl("A", df$trt) & grepl("C", df$trt)),
                 n123 = length(df$trt[grep("ABC", df$trt)]),
                 fill = c("pink", "green", "orange"),
                 lty = "blank",
                 category = c("Drug A", "Drug B", "Drug C"))

The number 3 in the part of the first set that does not overlap the others represents the patients who received only Drug A; the number 4 in the region shared only by the first and second sets represents the patients who received the combination AB; and the number 3 in the intersection of all three sets represents the patients who received ABC. The other numbers can be interpreted in the same way.
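To check the numbers printed inside the three circles, the sketch below recomputes every region count with base R. It assumes each treatment string lists the drugs received, so membership in each drug set is tested one letter at a time with `grepl()`. A plain substring search such as `grep("AC", ...)` misses the three "ABC" patients, because A and C are not adjacent in that string. The helper `has_drug()` and the other names are illustrative only.

```r
trt <- c("C", "AB", "A", "BC", "AC", "ABC", "B", "AB", "A", "A", "BC", "AB",
         "AB", "AC", "B", "B", "BC", "ABC", "C", "BC", "AC", "ABC", "AC", "B")

# Hypothetical helper: TRUE for every patient whose treatment includes `drug`
has_drug <- function(drug) grepl(drug, trt, fixed = TRUE)
in_A <- has_drug("A")
in_B <- has_drug("B")
in_C <- has_drug("C")

# Set sizes and overlaps, as passed to draw.triple.venn()
area1 <- sum(in_A); area2 <- sum(in_B); area3 <- sum(in_C)  # 14, 15, 13
n12  <- sum(in_A & in_B)          # 7
n23  <- sum(in_B & in_C)          # 7
n13  <- sum(in_A & in_C)          # 7
n123 <- sum(in_A & in_B & in_C)   # 3

# A substring search for "AC" misses the "ABC" patients
naive_n13 <- sum(grepl("AC", trt, fixed = TRUE))  # 4

# Count inside each of the seven regions (inclusion-exclusion)
regions <- c(
  A_only  = area1 - n12 - n13 + n123,  # 3
  B_only  = area2 - n12 - n23 + n123,  # 4
  C_only  = area3 - n13 - n23 + n123,  # 2
  AB_only = n12 - n123,                # 4
  BC_only = n23 - n123,                # 4
  AC_only = n13 - n123,                # 4
  ABC     = n123                       # 3
)
print(regions)
```

Each region count matches the number of patients allocated to the corresponding treatment arm.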