# Calculate the area under the ROC curve of a model

Like the model evaluation measures calculated by the threshMeasures function, the area under the ROC curve (AUC) assesses the discrimination performance of a model; but unlike them, it does not require the choice of a particular threshold above which to consider that a model predicts species presence, but rather averages discrimination performance over all possible thresholds at a given interval. Mind that there has been criticism to the AUC (Lobo et al. 2008, Jiménez-Valverde et al. 2013), although it is still one of the most widely used metrics in model evaluation. It is highly correlated with species prevalence, so this value is also provided by the AUC function (if simplif = FALSE, the default) for reference.

Although there are functions to calculate the AUC in other R packages (such as ROCR, PresenceAbsence and verification), the AUC function below is more compatible with the remaining functions in this blog and in the modEvA package where it is included (Barbosa et al. 2014), and can be applied not only to a set of presence-absence versus predicted values, but also directly to a model object of class glm.

```AUC <- function(obs = NULL, pred = NULL, model = NULL, interval = 0.01,
simplif = FALSE, plot = TRUE, plot.values = TRUE, plot.preds = FALSE,
grid = FALSE, xlab = c("False positive rate", "(1-specificity)"),
ylab = c("True positive rate", "(sensitivity)"), main = "ROC curve", ...) {
# version 1.4 (16 Oct 2014)

if (!is.null(model)) {
if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
obs <- model\$y
pred <- model\$fitted.values
}  # end if model

stopifnot(
length(obs) == length(pred),
obs %in% c(0,1),
pred >= 0,
pred <= 1,
interval > 0,
interval < 1
)

n1 <- sum(obs == 1)
n0 <- sum(obs == 0)

# next 3 lines from Wintle et al 2005 supp mat "roc" function
xy <- c(pred[obs == 0], pred[obs == 1])
rnk <- rank(xy)
AUC <- ((n0 * n1) + ((n0 * (n0 + 1))/2) - sum(rnk[1 : n0])) / (n0 * n1)

if (simplif)  return(AUC)

N <- length(obs)
preval <- prevalence(obs)
thresholds <- seq(0, 1, by = interval)
Nthresh <- length(thresholds)
true.positives <- true.negatives <- sensitivity <- specificity <- false.pos.rate <- n.preds <- prop.preds <- numeric(Nthresh)

for (t in 1 : Nthresh) {
true.positives[t] <- sum(obs == 1 & pred >= thresholds[t])
true.negatives[t] <- sum(obs == 0 & pred < thresholds[t])
sensitivity[t] <- true.positives[t] / n1
specificity[t] <- true.negatives[t] / n0
false.pos.rate[t] <- 1 - specificity[t]
n.preds[t] <- sum(round(pred, nchar(Nthresh) - 1) == thresholds[t])  # inspired by Peterson Papes & Soberon 2008 rethinking ROC analysis
prop.preds[t] <- n.preds[t] / length(pred)
}; rm(t)

if (plot) {
plot(x = c(0, 1), y = c(0, 1), type = "l", xlab = xlab, ylab = ylab, main = main, col = "grey", ...)  # plots the 0.5 diagonal

if (grid) abline(h = thresholds, v = thresholds, col = "lightgrey")

lines(x = false.pos.rate, y = sensitivity, lwd = 2)  # ROC curve

if (plot.values) {
text(1.0, 0.1, adj = 1, substitute(paste(AUC == a), list(a = round(AUC, 3))))
}  # end if plot.values

if (plot.preds) {
#points(x = false.pos.rate, y = sensitivity, cex = rev(sqrt(100 * prop.preds)), col = "red")  # but why do I have to rev?
#points(x = thresholds, y = specificity, cex = sqrt(100 * prop.preds), col = "blue")
#points(x = false.pos.rate, y = sensitivity, cex = sqrt(100 * prop.preds), col = "red")  # ROC curve
points(x = thresholds, y = rep(0, Nthresh), cex = sqrt(100 * prop.preds), col = "red")
}  # end if plot.preds
}  # end if plot

return (list(thresholds = data.frame(thresholds, true.positives, true.negatives, sensitivity, specificity, false.pos.rate, n.preds, prop.preds), N = N, prevalence = preval, AUC = AUC, AUCratio = AUC / 0.5))
}```

[presented with Pretty R]

Usage examples:

AUC(obs = mydata\$myspecies, pred = mydata\$myspecies_P, simplif = TRUE)

AUC(obs = mydata\$myspecies, pred = mydata\$myspecies_P)

AUC(model = mymodel, main = “my species’ ROC plot”)

AUC(model = mymodel, plot.values = FALSE)

If you want the AUC values for a list of models produced e.g. by the multGLM function, you can do:

mymodels.auc <- vector(“numeric”, length(mymodels\$models))

names(mymodels.auc) <- names(mymodels\$models)

for (i in 1:length(mymodels\$models)) {

mymodels.auc[i] <- AUC(model = mymodels\$models[[i]], simplif = TRUE)

}

mymodels.auc

References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Lobo, J.M., Jiménez-Valverde, A. & Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17: 145-151

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography 22: 508–516