What is the AUC-ROC Curve?

The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. The ROC is a probability curve and the AUC represents the degree or measure of separability. It tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.

The ROC curve is plotted with the TPR against the FPR, where TPR = TP / (TP + FN) is the true positive rate on the y-axis and FPR = FP / (FP + TN) is the false positive rate on the x-axis.
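
As a quick illustration of how these two rates are computed at a single threshold, here is a minimal R sketch; the labels and thresholded predictions below are made up purely for illustration.

# Hypothetical labels and thresholded predictions (illustration only)
labels <- c(1, 1, 0, 0, 1, 0, 1, 0)
preds  <- c(1, 1, 0, 1, 1, 0, 0, 0)

TP <- sum(preds == 1 & labels == 1)
FP <- sum(preds == 1 & labels == 0)
FN <- sum(preds == 0 & labels == 1)
TN <- sum(preds == 0 & labels == 0)

TPR <- TP / (TP + FN)  # true positive rate (sensitivity)
FPR <- FP / (FP + TN)  # false positive rate (1 - specificity)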

Here we describe how to search CRAN for packages to plot ROC curves, and highlight three useful packages.

We use Gábor Csárdi’s relatively new pkgsearch package to search CRAN. The pkg_search() function takes a text string as input and uses basic text mining techniques to search all of CRAN. The algorithm searches through the package text fields and produces a score for each package it finds, weighted by the number of reverse dependencies and downloads.

Let’s find packages for plotting ROC curves.

install.packages("pkgsearch")
library(pkgsearch)
rocPkg <- pkg_search(query="ROC",size=200)

query = the query string to search for.

size = the number of results to return.
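
The call above returns a data frame with one row per matching package. Before filtering, it can be worth a quick look at what came back (the exact set of columns may vary with the pkgsearch version):

# Inspect the search result before filtering
nrow(rocPkg)      # number of packages matched
colnames(rocPkg)  # available fields, e.g. score, package, downloads_last_month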

Next, we narrow the results down to the best candidates by filtering out orphaned packages and packages with a score below 190, then sort by downloads.

library(dplyr)
rocPkgShort <- rocPkg %>% 
  filter(maintainer_name != "ORPHANED", score > 190) %>%
  select(score, package, downloads_last_month) %>%
  arrange(desc(downloads_last_month))
head(rocPkgShort,3)
## # A tibble: 3 x 3
##   score package downloads_last_month
##   <dbl> <chr>                  <int>
## 1 4207. caTools               112815
## 2 8785. pROC                   84642
## 3  899. ROCR                   54703

1. ROCR Package

Here we’ll use the built-in ROCR.simple data set that comes with the package.

#Install & load ROCR package
install.packages("ROCR")
library(ROCR)
data(ROCR.simple)
df <- data.frame(ROCR.simple)
pred <- prediction(df$predictions, df$labels)  # prediction object from scores and labels
perf <- performance(pred, "tpr", "fpr")        # TPR vs. FPR across all thresholds
plot(perf)
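
As a small extension to the snippet above (not part of the original code), ROCR's performance() function can also compute the AUC itself, and a diagonal reference line can be added to the plot:

auc_perf <- performance(pred, measure = "auc")
auc_perf@y.values[[1]]          # the AUC as a single number
abline(a = 0, b = 1, lty = 2)   # optional: diagonal line for a random classifier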

2. pROC Package

Here, along with the ROC curve, it is easy to obtain confidence intervals for the area under the curve (AUC) and display them on the plot.

#Install & load pROC package
install.packages("pROC")
library(pROC)
pROC_obj <- roc(df$labels,df$predictions,
                smoothed = TRUE,
                # arguments for ci
                ci=TRUE, ci.alpha=0.9, stratified=FALSE,
                # arguments for plot
                plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
                print.auc=TRUE, show.thres=TRUE)

#Compute confidence intervals of sensitivity across specificities
sens.ci <- ci.se(pROC_obj)

#Plot the CI along with the ROC curve
plot(sens.ci, type="shape", col="lightblue")
## Warning in plot.ci.se(sens.ci, type = "shape", col = "lightblue"): Low
## definition shape.
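
If you also want the AUC and its confidence interval as plain numbers rather than annotations on the plot, pROC provides helpers for that:

auc(pROC_obj)      # the AUC value
ci.auc(pROC_obj)   # confidence interval for the AUC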

3. caTools Package

install.packages("caTools")
library(caTools)
colAUC(df$predictions, df$labels, plotROC = TRUE)

##              [,1]
## 0 vs. 1 0.8341875

The colAUC() function returns the AUC score, and the plotROC = TRUE argument additionally draws the ROC curve.
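
Because colAUC() works column-wise, it can also compare several score columns against the same labels in one call. A quick sketch, where the second, random column is made up purely for illustration:

set.seed(42)  # make the illustrative random column reproducible
scores <- cbind(model = df$predictions,
                noise = runif(nrow(df)))
colAUC(scores, df$labels, plotROC = TRUE)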