# ROC and PR Curves in ggplot2

How to make ROC and PR Curves in ggplot2 with Plotly.

New to Plotly?

Plotly is a free and open-source graphing library for R. We recommend you read our Getting Started guide for the latest installation or upgrade instructions, then move on to our Plotly Fundamentals tutorials or dive straight in to some Basic Charts tutorials.

## Roc curve

We create an example data set. There are 2 markers, one that is moderately predictive and one that is not as predictive.

Next use the ggplot function to define the aesthetics, and the geom_roc function to add an ROC curve layer. The geom_roc function requires the aesthetics d for disease status, and m for marker. The disease status need not be coded as 0/1, but if it is not, stat_roc assumes (with a warning) that the lowest value in sort order signifies disease-free status. stat_roc and geom_roc are linked by default, with the stat doing the underlying computation of the empirical ROC curve, and the geom consisting of the ROC curve layer.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

p <- ggplot(test, aes(d = D, m = M1)) +
geom_roc()

ggplotly(p)


The geom_roc layer includes the ROC curve line combined with points and labels to display the values of the biomarker at the different cutpoints. It accepts the argument n.cuts to define the number of cutpoints to display along the curve. Labels can be supressed by using n.cuts = 0 or labels = FALSE. The size of the labels and the number of significant digits can be adjusted with labelsize and labelround, respectively.

## Modify legend

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

p <- ggplot(test, aes(d = D, m = M1)) +
geom_roc(n.cuts = 0)

ggplotly(p)


Change label size and number of labels.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

p <- ggplot(test, aes(d = D, m = M1)) +
geom_roc(n.cuts = 5, labelsize = 5, labelround = 2)

ggplotly(p)


Increase number of labels.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

p <- ggplot(test, aes(d = D, m = M1)) +
geom_roc(n.cuts = 50, labels = FALSE)

ggplotly(p)


style_roc that can be added to a ggplot that contains an ROC curve layer. This adds a diagonal guideline, sets the axis labels, and adjusts the major and minor grid lines. The direct_label function operates on a ggplot object, adding a direct label to the plot. It attempts to intellegently select an appropriate location for the label, but the location can be adjusted with nudge_x, nudge_y and label.angle. If the labels argument is NULL, it will take the name from the mapped aesthetic.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

p <- ggplot(test, aes(d = D, m = M1)) + geom_roc() +
style_roc(theme = theme_grey, xlab = "1 - Specificity")

ggplotly(p)


## Confidence regions

It is common to compute confidence regions for points on the ROC curve using the Clopper and Pearson (1934) exact method. Briefly, exact confidence intervals are calculated for the FPF and TPF separately.

This is implemented in the stat_rocci and displayed as a geom_rocci layer. These both require the same aesthetics as the ROC geom, d for disease status and m for marker. By default, a set of 3 evenly spaced points along the curve are chosed to display confidence regions. You can select points by passing a vector of values in the range of m to the ci.at argument. By default, the significance level α is set to 0.05, this can be changed using the sig.level option.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

p <- ggplot(test, aes(d = D, m = M1)) + geom_roc() +
style_roc(theme = theme_grey, xlab = "1 - Specificity") +
geom_rocci()

ggplotly(p)


## Multiple ROC curves

If you have grouping factors in your dataset, or you have multiple markers measured on the same subjects, you may wish to plot multiple ROC curves on the same plot. plotROC fully supports faceting and grouping done by ggplot2. In out example dataset, we have 2 markers measured in a paired manner. These data are in wide format, with the 2 markers going across 2 columns. ggplot requires long format, with the marker result in a single column, and a third variable identifying the marker. We provide the function melt_roc to perform this transformation. The arguments are the data frame, a name or index identifying the disease status column, and a vector of names or indices identifying the the markers. Optionally, the names argument gives a vector of names to assign to the marker, replacing their column names. The result is a data frame in long format.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

longtest <- melt_roc(test, "D", c("M1", "M2"))

p <- ggplot(longtest, aes(d = D, m = M, color = name)) +
geom_roc() +
style_roc()

ggplotly(p)


Similarly to a single ROC curve, you can add confidence intervals.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

longtest <- melt_roc(test, "D", c("M1", "M2"))

p <- ggplot(longtest, aes(d = D, m = M, linetype = name)) +
geom_roc() +
geom_rocci()

ggplotly(p)


You can create a facet plot for every curve.

library(plotly)
library(ggplot2)
library(plotROC)

D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)

test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1],
M1 = M1, M2 = M2, stringsAsFactors = FALSE)

longtest <- melt_roc(test, "D", c("M1", "M2"))

p <- ggplot(longtest, aes(d = D, m = M, color = name)) +
geom_roc() +
facet_wrap(~ name) +
style_roc()

ggplotly(p)


### What About Dash?

Dash for R is an open-source framework for building analytical applications, with no Javascript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash for R at https://dashr.plot.ly/installation.

Everywhere in this page that you see fig, you can display the same figure in a Dash for R application by passing it to the figure argument of the Graph component from the built-in dashCoreComponents package like this:

library(plotly)

fig <- plot_ly()
# fig <- fig %>% add_trace( ... )
# fig <- fig %>% layout( ... )

library(dash)
library(dashCoreComponents)
library(dashHtmlComponents)

app <- Dash$new() app$layout(
htmlDiv(
list(
dccGraph(figure=fig)
)
)
)