Model evaluation with presence points and raster predictions

The Boyce index (Boyce et al. 2002) is often described as a presence-only metric for evaluating the predictions of species distribution (or ecological niche, or habitat suitability) models (e.g. Hirzel et al. 2006, Cianfrani et al. 2010, Bellard et al. 2013, Valavi et al. 2022). It measures the predicted-to-expected ratio of presences in each class of predicted values, i.e., whether classes / regions with higher predicted values indeed have higher proportions of presences than expected by chance. But if there's a proportion of presences in each class, then the rest of the class (i.e. the complementary proportion) must be... non-presences, which are equally necessary for computing this index (as for virtually any other model evaluation metric). We need both pixels with and pixels without presence records, i.e. both presences and (pseudo)absences, or the Boyce index cannot be computed: try using a raster map with values only in the pixels with presences (e.g. my_raster <- mask(my_raster, my_presences)) and see what happens. The same applies to presence-background modelling methods like Maxent and ENFA, which use both the presence pixels and the remaining (i.e. "non-presence") pixels, and can't produce proper predictions without non-presence pixels (you can also try this for yourself). So, they need the same data that presence-absence methods need. Whether the computation requires an explicit identification of the absences, or just presence points and a map of the entire study area, simply depends on how each method or metric is implemented in each particular software package.

The modEvA R package computes the Boyce index and a range of other metrics that evaluate model predictions against presence/absence or presence/background data, including the area under the ROC curve (AUC), the area under the precision-recall curve, sensitivity, specificity, TSS, Cohen's kappa, Hosmer-Lemeshow goodness-of-fit, Miller calibration statistics, and several others. For all these metrics, the input can be either 1) a model object of one of several implemented classes; or 2) a pair of numeric vectors with the observed presence/absence records and the corresponding predicted values; or 3) a set of presence point coordinates and a raster map of the predictions across the model evaluation region. Here are some usage examples:

install.packages("modEvA", repos = "http://R-Forge.R-project.org")
library(modEvA)
library(terra)

# import a raster map with a predictor variable:
elev <- rast(system.file("ex/elev.tif", package = "terra"))

# import a species' occurrence data for this area:
gbif <- geodata::sp_occurrence(genus = "Dendrocopos", species = "major", ext = elev)
head(gbif)
unique(gbif$occurrenceStatus)  # remove absences if any!
pres_coords <- gbif[ , c("lon", "lat")]
plot(elev)
points(pres_coords, pch = 20, cex = 0.2)

# get a data frame of the pixels with and without presences:
dat <- fuzzySim::gridRecords(elev, pres_coords)
head(dat)
points(dat[dat$presence == 0, c("x", "y")], col = "red", cex = 0.5)
points(dat[dat$presence == 1, c("x", "y")], col = "blue", cex = 0.5)

# compute a species distribution model (e.g. a GLM) from these data:
mod <- glm(presence ~ elevation, family = binomial, data = dat)
summary(mod)

# get a raster map with the predictions from this model:
pred <- predict(elev, mod, type = "response")
plot(pred, main = "Woodpecker presences and predictions")
points(pres_coords, pch = 20, cex = 0.2)
# compute some model evaluation metrics
# using just these presence coordinates and raster predictions:

par(mfrow = c(2, 2), mar = c(5, 5, 2, 1))

Boyce(obs = pres_coords, pred = pred, main = "Boyce index")

AUC(obs = pres_coords, pred = pred, main = "ROC curve")

threshMeasures(obs = pres_coords, pred = pred, thresh = "maxTSS", measures = c("Sensitivity", "Specificity", "Precision", "TSS", "kappa"), standardize = FALSE, main = "Threshold-based")

AUC(obs = pres_coords, pred = pred, curve = "PR", main = "PR curve")

So, any evaluation metric can be computed using “only presence points”, as long as the raster map of predictions includes both pixels that do and pixels that don’t overlap these points, the latter of which are necessarily taken as available unoccupied pixels by all of these metrics (Boyce index included).
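Following the suggestion at the start of this post, here is a quick sketch of that experiment, using the objects created in the example above (the try() calls just let the script continue past any errors):

# keep predicted values only in the pixels that overlap presence points:
pres_vect <- vect(as.matrix(pres_coords), crs = crs(pred))
pred_pres_only <- mask(pred, pres_vect)
plot(pred_pres_only)
# with no non-presence pixels left, see what happens to the metrics:
try(Boyce(obs = pres_coords, pred = pred_pres_only))
try(AUC(obs = pres_coords, pred = pred_pres_only))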

Note that you may get slightly different results with modEvA::Boyce() and with ecospat::ecospat.boyce(), because ecospat.boyce removes duplicated classes by default; both functions allow controlling this behaviour, through the arguments 'rm.dup.classes' and 'rm.duplicate' respectively (see the help file of either function for details). Bug reports and other feedback are welcome on this version of the 'modEvA' package!

REFERENCES

Boyce, M.S., P.R. Vernier, S.E. Nielsen & F.K.A. Schmiegelow (2002) Evaluating resource selection functions. Ecological Modelling 157: 281-300

Bellard, C., Thuiller, W., Leroy, B., Genovesi, P., Bakkenes, M., & Courchamp, F. (2013) Will climate change promote future invasions? Global Change Biology, 19(12), 3740. https://doi.org/10.1111/GCB.12344

Cianfrani, C., Le Lay, G., Hirzel, A. H., & Loy, A. (2010) Do habitat suitability models reliably predict the recovery areas of threatened species? Journal of Applied Ecology, 47(2), 421–430. https://doi.org/10.1111/J.1365-2664.2010.01781.X

Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006) Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142-152. http://www.sciencedirect.com/science/article/pii/S0304380006002468

Valavi, R., Guillera-Arroita, G., Lahoz-Monfort, J. J., & Elith, J. (2022) Predictive performance of presence-only species distribution models: a benchmark study with reproducible code. Ecological Monographs, 92(1), e01486. https://doi.org/10.1002/ECM.1486

Miller calibration statistics

The MillerCalib function, now included in the modEvA package (version 0.7.2 just released), calculates Miller et al.'s (1991) calibration statistics for a logistic regression model, namely the intercept and slope of the logistic regression of the binary response variable on the logit of the predicted probabilities. Optionally (and by default), it also plots the corresponding regression line (black) over the reference diagonal (grey dashed line).

Calibration (or reliability) measures how a model's predicted probabilities relate to the observed prevalence (proportion of presences) in the modelled data. If predictions are perfectly calibrated, the slope equals 1 and the intercept equals 0, so the model's calibration line perfectly overlaps the reference diagonal. Note that Miller's statistics assess the model globally: a model is well calibrated if the average of all predicted probabilities equals the proportion of presences in the modelled data. Therefore, good calibration is always attained on the same data used for building the model (Miller et al. 1991). The function is thus most useful for checking whether a model remains well calibrated when extrapolated to a new dataset.

MillerCalib <- function(model = NULL, obs = NULL, pred = NULL, plot = TRUE, plot.values = TRUE, digits = 4, xlab = "", ylab = "", main = "Miller calibration", ...) {
  # version 1.3 (24 Jun 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
  }
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0,1),
    pred >= 0,
    pred <= 1
  )
 
  logit <- log(pred / (1 - pred))
  mod <- glm(obs ~ logit, family = binomial)
  intercept <- mod$coef[[1]]
  slope <- mod$coef[[2]]
  std.err <- summary(mod)$coefficients["logit", "Std. Error"]
  slope.z <- abs((slope - 1) / std.err)  # Paternoster et al. 1998; http://stats.stackexchange.com/questions/55501/test-a-significant-difference-between-two-slope-values
  slope.p <- 2 * pnorm(-slope.z)  # two-tailed p-value for H0: slope = 1
 
  if (plot) {
    ymin <- min(0, intercept)
    ymax <- max(1, intercept + 0.1)
    plot(c(0, 1), c(ymin, ymax), type = "n", xlab = xlab, ylab = ylab, main = main)
    abline(0, 1, col = "lightgrey", lty = 2)
    abline(intercept, slope)
    if (plot.values) {
      plotext <- paste("intercept =" , round(intercept, digits), "\nslope =", round(slope, digits), "\nslope p-value =", round(slope.p, digits))
      text(x = 0, y = ymax - 0.25, adj = 0, labels = plotext)
    }
  }
  return(list(intercept = intercept, slope = slope, slope.pvalue = slope.p))
}  # end MillerCalib function


Input data can be either a glm model object with binomial family and logit link, or two vectors of observed binary (0 or 1) values and the corresponding predicted probabilities. The output is a named list with the calibration intercept, the calibration slope, and the p-value for the difference between the slope and 1.
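A minimal sketch with simulated data, illustrating the point above that calibration is virtually guaranteed on the training data but can degrade on new data (the predictor 'x' and the shifted "new region" are hypothetical):

set.seed(1)
x <- rnorm(500)
obs <- rbinom(500, 1, plogis(-1 + 2 * x))
mod <- glm(obs ~ x, family = binomial)
MillerCalib(model = mod)  # slope ~ 1, intercept ~ 0 on the training data
# a "new region" with a lower baseline probability but the same effect of x:
x.new <- rnorm(500)
obs.new <- rbinom(500, 1, plogis(-2 + 2 * x.new))
pred.new <- predict(mod, data.frame(x = x.new), type = "response")
MillerCalib(obs = obs.new, pred = pred.new)  # intercept drifts away from 0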

REFERENCES

Miller M.E., Hui S.L. & Tierney W.M. (1991) Validation techniques for logistic regression models. Statistics in Medicine, 10: 1213-1226

Wintle B.A., Elith J. & Potts J.M. (2005) Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW. Austral Ecology, 30: 719-738

R-squared measures for generalized linear models

There are several ways of calculating (pseudo) R-squared values for logistic regression models, with no consensus about which is best. The RsqGLM function, now included in the modEvA package, calculates those of McFadden (1974), Cox & Snell (1989), Nagelkerke (1991), Tjur (2009), and the squared Pearson correlation between observed and predicted values:

RsqGLM <- function(obs = NULL, pred = NULL, model = NULL) {
  # version 1.2 (3 Jan 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
 
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
    if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).")
    if (any(!(obs %in% c(0, 1))) || any(pred < 0) || any(pred > 1)) stop ("Sorry, 'obs' and 'pred' options currently only implemented for binomial GLMs (binary response variable with values 0 or 1) with logit link.")
    logit <- log(pred / (1 - pred))
    model <- glm(obs ~ logit, family = "binomial")
  }
 
  null.mod <- glm(obs ~ 1, family = family(model))
  loglike.M <- as.numeric(logLik(model))
  loglike.0 <- as.numeric(logLik(null.mod))
  N <- length(obs)
 
  # based on Nagelkerke 1991:
  CoxSnell <- 1 - exp(-(2 / N) * (loglike.M - loglike.0))
  Nagelkerke <- CoxSnell / (1 - exp((2 * N ^ (-1)) * loglike.0))
 
  # based on Allison 2014:
  McFadden <- 1 - (loglike.M / loglike.0)
  Tjur <- mean(pred[obs == 1]) - mean(pred[obs == 0])
  sqPearson <- cor(obs, pred) ^ 2
 
  return(list(CoxSnell = CoxSnell, Nagelkerke = Nagelkerke, McFadden = McFadden, Tjur = Tjur, sqPearson = sqPearson))
}


Input data can be either a glm model object or two vectors of observed binary (0 or 1) values and the corresponding predicted probabilities (only binomial-logit GLMs admitted in this case). The output is a named list of the calculated R-squared values. See also the Dsquared function.
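A minimal sketch with simulated data (the predictor 'x' is hypothetical), showing both input options:

set.seed(2)
x <- rnorm(300)
obs <- rbinom(300, 1, plogis(x))
mod <- glm(obs ~ x, family = binomial)
RsqGLM(model = mod)
RsqGLM(obs = obs, pred = fitted(mod))  # (approximately) the same values from vectors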

 

REFERENCES

Cox, D.R. & Snell E.J. (1989) The Analysis of Binary Data, 2nd ed. Chapman and Hall, London

McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P. (ed.) Frontiers in Economics. Academic Press, New York

Nagelkerke, N.J.D. (1991) A note on a general definition of the coefficient of determination. Biometrika, 78: 691-692

Tjur T. (2009) Coefficients of determination in logistic regression models – a new proposal: the coefficient of discrimination. The American Statistician, 63: 366-372.

Calculate evaluation measures for multiple models

If you have a list of models created e.g. with the multGLM function, or a data frame with presence-absence data and the corresponding predicted values for a set of species, you can use the multModEv function to get a set of evaluation measures for all models simultaneously. Note that all models must have the same sample size. This function is included in the modEvA package (Barbosa et al. 2014).

multModEv.methods <- c('Prevalence', 'AUC', 'CCR', 'Misclass', 'Sensitivity', 'Specificity', 'Omission', 'Commission', 'PPP', 'NPP', 'UPR', 'OPR', 'PPI', 'PAI', 'kappa', 'TSS', 'NMI', 'OddsRatio', 'HL', 'HL.p', 'RMSE', 'Miller.int', 'Miller.slope', 'Miller.p')
 
multModEv <-
function(models = NULL, obs.data = NULL, pred.data = NULL, measures = multModEv.methods, standardize = FALSE, thresh = 0.5, bin.method = "quantiles", quiet = TRUE) {
  # version 2.0 (24 Jun 2015)
 
  for (m in measures) {
    if (!(m %in% modEvAmethods("multModEv"))) stop(m, " is not a valid measure; type modEvAmethods('multModEv') for available options.")
  }
 
  if (!bin.method %in% modEvAmethods("getBins")) stop("Invalid bin.method; type modEvAmethods('getBins') for available options.")
 
  if (is.null(models)) {
    if (is.null(obs.data) | is.null(pred.data)) stop("You must provide either a list of model object(s) of class 'glm', or a set of obs.data + pred.data with matching dimensions.")
  }  # end if !models
 
  else {
    if (!is.null(obs.data)) message("Argument 'obs.data' ignored in favour of 'models'.")
    if (!is.null(pred.data)) message("Argument 'pred.data' ignored in favour of 'models'.")
 
    n <- sapply(models, function(x) length(resid(x)))
    if(!(all(n == n[1]))) stop("Models must all be based on the same sample size.")
 
    obs.data <- pred.data <- fav.data <- matrix(data = NA, nrow = length(resid(models[[1]])), ncol = length(models), dimnames = list(NULL, names(models)))
    for (m in 1 : length(models)) {
      obs.data[ , m] <- models[[m]]$y
      pred.data[ , m] <- models[[m]]$fitted.values
    }  # end for m
  }  # end if models
 
  n.models <- ncol(obs.data)
  n.measures <- length(measures)
  results <- matrix(NA, nrow = n.models, ncol = n.measures)
  rownames(results) <- colnames(obs.data)
  colnames(results) <- measures
 
  message(n.models, " models to evaluate; ", n.measures,  " measures to calculate for each.")
 
  if (any(measures %in% modEvAmethods("threshMeasures"))) {
    thresh.measures <- measures[measures %in% modEvAmethods("threshMeasures")]
    cat("- using '", thresh, "' as the threshold value to calculate ", paste(thresh.measures, collapse = ", "), "\n", sep = "")  # doesn't quite work with 'message' or 'paste' or with a 'sep'
  }  # this needs to be outside the 'for' loop
 
  bin.measures <- c("HL", "HL.p", "RMSE")  # ABC, unityRsq
  if (any(measures %in% bin.measures)) {
    bin.measures <- measures[measures %in% bin.measures]
    cat("- using '", bin.method, "' as the bin method to calculate ", paste(bin.measures, collapse = ", "), "\n", sep = "")  # doesn't quite work with 'message' or 'paste' or with a 'sep'
  }  # this needs to be outside the 'for' loop
 
for (m in 1:n.models) {
 
    message("Evaluating model ", m, "...")
 
    if ("Prevalence" %in% measures)
      results[m, "Prevalence"] <- prevalence(obs.data[ , m])
 
    if ("Evenness" %in% measures)
      results[m, "Evenness"] <- evenness(obs.data[ , m])
 
    if ("AUC" %in% measures)
      results[m, "AUC"] <- AUC(obs = obs.data[ , m], pred = pred.data[ , m], simplif = TRUE)
 
    if (any(measures %in% modEvAmethods("threshMeasures"))) {
      for (tm in thresh.measures) {
        results[m, tm] <- threshMeasures(obs = obs.data[ , m], pred = pred.data[ , m], measures = tm, thresh = thresh, standardize = standardize, simplif = TRUE)
      }  # end for tm
    }  # end if measures in modEvAmethods("threshMeasures")
    # thresh.measures <- optiThresh(obs = obs.data[ , m], pred = pred.data[ , m], plot = FALSE, optimize = "each")
 
    if (any(measures %in% c("HL", "HL.p"))) {
      for (m in 1:n.models) {
        HL <- HLfit(obs = obs.data[ , m], pred = pred.data[ , m], bin.method = bin.method, simplif = TRUE)
        if ("HL" %in% measures)  results[m, "HL"] <- HL$chi.sq
        if ("HL.p" %in% measures)  results[m, "HL.p"] <- HL$p.value
        if ("RMSE" %in% measures)  results[m, "RMSE"] <- HL$RMSE
      }; rm(m)
    }  # end if HL
 
    if (any(measures %in% c("Miller.int", "Miller.slope", "Miller.p"))) {
      for (m in 1:n.models) {
        Miller <- MillerCalib(obs = obs.data[ , m], pred = pred.data[ , m], plot = FALSE)
        if ("Miller.int" %in% measures)
          results[m, "Miller.int"] <- Miller$intercept
        if ("Miller.slope" %in% measures)
          results[m, "Miller.slope"] <- Miller$slope
        if ("Miller.p" %in% measures)
          results[m, "Miller.p"] <- Miller$slope.pvalue
      }; rm(m)
    } # end if Miller
  }  # end for m
 
  if (standardize) {
    colnames(results)[colnames(results) == "kappa"] <- "skappa"
    colnames(results)[colnames(results) == "TSS"] <- "sTSS"
  }  # end if standardize
 
  results <- data.frame(Model = rownames(results), results, row.names = NULL)
  message("Finished!")
  return(results)
}

Usage examples:

multModEv(obs.data = mydata[ , 2:8], pred.data = mydata[ , 9:15], thresh = "preval")

multModEv(models = mymodels$models)

multModEv(models = mymodels$models, thresh = 0.5, measures = c("AUC", "CCR", "Sensitivity", "TSS"))

REFERENCES

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Calculate the area under the ROC curve of a model

Like the model evaluation measures calculated by the threshMeasures function, the area under the ROC curve (AUC) assesses the discrimination performance of a model; but unlike them, it does not require the choice of a particular threshold above which to consider that a model predicts species presence, but rather averages discrimination performance over all possible thresholds at a given interval. Note that the AUC has been criticized (Lobo et al. 2008, Jiménez-Valverde et al. 2013), although it remains one of the most widely used metrics in model evaluation. It is highly correlated with species prevalence, so the AUC function also reports prevalence (when simplif = FALSE, the default) for reference.

Although there are functions to calculate the AUC in other R packages (such as ROCR, PresenceAbsence and verification), the AUC function below is more compatible with the remaining functions in this blog and in the modEvA package where it is included (Barbosa et al. 2014), and can be applied not only to a set of presence-absence versus predicted values, but also directly to a model object of class glm.
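For reference, the default "rank" method (the lines credited to Wintle et al. 2005 in the code below) computes the AUC through its relation with the Mann-Whitney U statistic:

$$\mathrm{AUC} = \frac{n_0 n_1 + \frac{n_0 (n_0 + 1)}{2} - \sum_{i=1}^{n_0} R_i}{n_0 n_1}$$

where $n_1$ and $n_0$ are the numbers of presences and absences, and $R_i$ are the ranks of the absences' predicted values within the pooled sample.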

AUC <- function(model = NULL, obs = NULL, pred = NULL, simplif = FALSE, interval = 0.01, FPR.limits = c(0, 1), curve = "ROC",  method = "rank", plot = TRUE, diag = TRUE, diag.col = "grey", diag.lty = 1, curve.col = "black", curve.lty = 1, curve.lwd = 2, plot.values = TRUE, plot.digits = 3, plot.preds = FALSE, grid = FALSE, xlab = "auto", ylab = "auto", ...) {
  # version 2.2 (17 Jan 2020)
  
  if (all.equal(FPR.limits, c(0, 1)) != TRUE) stop ("Sorry, 'FPR.limits' not yet implemented. Please use default values.")
  
  if (length(obs) != length(pred))  stop ("'obs' and 'pred' must be of the same length (and in the same order).")
  
  if (!is.null(model)) {
    if(!("glm" %in% class(model) && model$family$family == "binomial" && model$family$link == "logit")) stop ("'model' must be an object of class 'glm' with 'binomial' family and 'logit' link.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
  
  dat <- data.frame(obs, pred)
  n.in <- nrow(dat)
  dat <- na.omit(dat)
  n.out <- nrow(dat)
  if (n.out < n.in)  warning (n.in - n.out, " observations removed due to missing data; ", n.out, " observations actually evaluated.")
  obs <- dat$obs
  pred <- dat$pred
  
  stopifnot(
    obs %in% c(0,1),
    #pred >= 0,
    #pred <= 1,
    interval > 0,
    interval < 1,
    curve %in% c("ROC", "PR"),
    method %in% c("rank", "trapezoid", "integrate")
  )
  
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  
  if (curve != "ROC" && method == "rank") {
    method <- "trapezoid"
    #message("'rank' method not applicable to the specified 'curve'; using 'trapezoid' instead.")
  }
  
  if (method == "rank") {
    # next 3 lines from Wintle et al 2005 supp mat "roc" function
    xy <- c(pred[obs == 0], pred[obs == 1])
    rnk <- rank(xy)
    AUC <- ((n0 * n1) + ((n0 * (n0 + 1))/2) - sum(rnk[1 : n0])) / (n0 * n1)
    if (simplif && !plot) return(AUC)
  }

  if (any(pred < 0) | any(pred > 1)) warning("Some of your predicted values are outside the [0, 1] interval within which thresholds are calculated.")
  
  N <- length(obs)
  preval <- prevalence(obs)
  thresholds <- seq(0, 1, by = interval)
  Nthresh <- length(thresholds)
  true.positives <- true.negatives <- sensitivity <- specificity <- precision <- false.pos.rate <- n.preds <- prop.preds <- numeric(Nthresh)
  
  for (t in 1 : Nthresh) {
    true.positives[t] <- sum(obs == 1 & pred >= thresholds[t])
    true.negatives[t] <- sum(obs == 0 & pred < thresholds[t])
    sensitivity[t] <- true.positives[t] / n1
    specificity[t] <- true.negatives[t] / n0
    precision[t] <- true.positives[t] / sum(pred >= thresholds[t], na.rm = TRUE)
    #if (true.positives[t] == 0 && sum(pred >= thresholds[t], na.rm = TRUE) == 0)  precision[t] <- 0  # to avoid NaN?
    false.pos.rate[t] <- 1 - specificity[t]
    n.preds[t] <- sum(round(pred, nchar(Nthresh) - 1) == thresholds[t])
    prop.preds[t] <- n.preds[t] / length(pred)
  }
  
  if (curve == "ROC") {
    xx <- false.pos.rate
    yy <- sensitivity
  } else {
    if (curve == "PR") {
      xx <- sensitivity
      yy <- precision
    } else {
      stop ("'curve' must be either 'ROC' or 'PR'.")
    }
  }
  
  if (method == "trapezoid") {
    xy <- na.omit(data.frame(xx, yy))
    #if (length(xx) != nrow(xy)) warning("Some non-finite values omitted from area calculation.")
    xx <- xy$xx
    yy <- xy$yy
    # next line adapted from https://stackoverflow.com/a/22418496:
    AUC <- sum(diff(xx) * (yy[-1] + yy[-length(yy)]) / 2)
    AUC <- -AUC  # thresholds run high-to-low along the curve, so the trapezoid sum comes out negative
  }
  
    if (method == "integrate") {
    xx.interp <- stats::approx(x = thresholds, y = xx, n = length(thresholds))
    yy.interp <- stats::approx(x = thresholds, y = yy, n = length(thresholds))
    f <- approxfun(x = xx.interp$y, y = yy.interp$y)
    AUC <- integrate(f, lower = min(thresholds), upper = max(thresholds))$value
    }
  
  if (plot) {
    if (curve == "ROC") {
      if (xlab == "auto") xlab <- c("False positive rate", "(1-specificity)")
      if (ylab == "auto") ylab <- c("True positive rate", "(sensitivity)")
    }
    if (curve == "PR") {
      if (xlab == "auto") xlab <- c("Recall", "(sensitivity)")
      if (ylab == "auto") ylab <- c("Precision", "(positive predictive value)")
    }
    
    d <- ifelse(diag, "l", "n")  # to plot the 0.5 diagonal (or not if diag=FALSE)
    if (curve == "ROC") plot(x = c(0, 1), y = c(0, 1), type = d, xlab = xlab, ylab = ylab, col = diag.col, lty = diag.lty, ...)
    if (curve == "PR") plot(x = c(0, 1), y = c(1, 0), type = d, xlab = xlab, ylab = ylab, col = diag.col, lty = diag.lty, ...)
    
    if (grid) abline(h = thresholds, v = thresholds, col = "lightgrey")
    
    lines(x = xx, y = yy, col = curve.col, lty = curve.lty, lwd = curve.lwd)
    
    if (plot.preds == TRUE) plot.preds <- c("curve", "bottom")  # for back-compatibility
    if ("bottom" %in% plot.preds) {
      points(x = thresholds, y = rep(0, Nthresh), cex = 100 * prop.preds, col = "darkgrey")  # 20 * sqrt(prop.preds)
    }
    if ("curve" %in% plot.preds) {
      points(x = xx, y = yy, cex = 100 * prop.preds, col = "darkgrey")
    }
    
    if (plot.values) {
      if (curve == "ROC") place <- 0.4
      if (curve == "PR") place <- 1
      text(1, place, adj = 1, substitute(paste(AUC == a), list(a = round(AUC, plot.digits))))
    }  # end if plot.values
    
  }  # end if plot
  
  if (simplif)  return (AUC)
  
  thresholds.df <- data.frame(thresholds, true.positives, true.negatives, sensitivity, specificity, precision, false.pos.rate, n.preds, prop.preds)
  rownames(thresholds.df) <- thresholds
  
  return (list(thresholds = thresholds.df, N = N, prevalence = preval, AUC = AUC, AUCratio = AUC / 0.5))
}

EDIT: since modEvA version 1.7 (and as per updated code above), the AUC function can also compute the precision-recall curve.

Usage examples:

AUC(obs = mydata$myspecies, pred = mydata$myspecies_P, simplif = TRUE)
AUC(obs = mydata$myspecies, pred = mydata$myspecies_P)
AUC(model = mymodel, curve = "PR")

If you want the AUC values for a list of models produced e.g. by the multGLM function, you can do:

mymodels.auc <- vector("numeric", length(mymodels$models))
names(mymodels.auc) <- names(mymodels$models)
for (i in 1:length(mymodels$models)) {
  mymodels.auc[i] <- AUC(model = mymodels$models[[i]], simplif = TRUE)
}

mymodels.auc
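Or, more compactly (turning off the per-model plots; sapply also keeps the model names):

mymodels.auc <- sapply(mymodels$models, AUC, simplif = TRUE, plot = FALSE)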

References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Lobo, J.M., Jiménez-Valverde, A. & Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17: 145-151

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography 22: 508–516

Calculate the amount of deviance explained by a GLM

Linear models come with an R-squared value that measures the proportion of variation that the model accounts for. The R-squared is provided with summary(model) in R. For generalized linear models (GLMs), the equivalent is the amount of deviance accounted for (D-squared; Guisan & Zimmermann 2000), but this value is not normally provided with the model summary. The Dsquared function, now included in the modEvA package (Barbosa et al. 2014), calculates it. There is also an option to calculate the adjusted D-squared, which takes into account the number of observations and the number of model parameters, thus allowing direct comparison among different models (Weisberg 1980, Guisan & Zimmermann 2000). UPDATE: see also further measures in the new RsqGLM function.
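In formulas (as implemented in the function below), with $D_0$ the null deviance, $D_M$ the model's residual deviance, $n$ the number of observations and $p$ the number of model parameters:

$$D^2 = \frac{D_0 - D_M}{D_0}, \qquad D^2_{adj} = 1 - \frac{n - 1}{n - p}\,(1 - D^2)$$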

Dsquared <- function(model = NULL, 
                     obs = NULL, 
                     pred = NULL, 
                     family = NULL, # needed only when 'model' not provided
                     adjust = FALSE, 
                     npar = NULL) { # needed only when 'model' not provided
  # version 1.4 (31 Aug 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
 
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'.")
    if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).")
    if (is.null(family)) stop ("With 'obs' and 'pred' arguments (rather than a model object), you must also specify one of two model family options: 'binomial' or 'poisson' (in quotes).")
    else if (!is.character(family)) stop ("Argument 'family' must be provided as character (i.e. in quotes: 'binomial' or 'poisson').")
    else if (length(family) != 1 | !(family %in% c("binomial", "poisson"))) stop ("'family' must be either 'binomial' or 'poisson' (in quotes).")
 
    if (family == "binomial") {
      if (any(!(obs %in% c(0, 1)) | pred < 0 | pred > 1)) stop ("'binomial' family implies that 'obs' data should be binary (with values 0 or 1) and 'pred' data should be bounded between 0 and 1.")
      link <- log(pred / (1 - pred))  # logit
    }  # end if binomial
 
    else if (family == "poisson") {
      if (any(obs %%1 != 0)) stop ("'poisson' family implies that 'obs' data should consist of whole numbers.")
      link <- log(pred)
    }  # end if poisson
 
    model <- glm(obs ~ link, family = family)
  }  # end if model not provided
 
  D2 <- (model$null.deviance - model$deviance) / model$null.deviance
 
  if (adjust) {
    if (model.provided) {
      n <- length(model$y)
      #p <- length(model$coefficients)
      p <- attributes(logLik(model))$df
    } else {
      if (is.null(npar)) stop ("Adjusted D-squared from 'obs' and 'pred' values (rather than a model object) requires specifying the number of parameters in the underlying model ('npar').")
      n <- length(na.omit(obs))
      p <- npar
    }  # end if model.provided else
 
    D2 <- 1 - ((n - 1) / (n - p)) * (1 - D2)
  }  # end if adjust
 
  return (D2)
}


Usage examples:

Dsquared(model = mymodel)

Dsquared(model = mymodel, adjust = TRUE)

Dsquared(obs = myobs, pred = mypred, family = "poisson")

Dsquared(obs = myobs, pred = mypred, family = "poisson", adjust = TRUE, npar = 5)

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Guisan, A. & Zimmermann, N.E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling 135: 147-186

Weisberg, S. (1980). Applied Linear Regression. Wiley, New York

Model calibration with the Hosmer & Lemeshow goodness-of-fit statistic

The model evaluation functions posted here so far calculate only measures of discrimination capacity, i.e., how well the model is capable of distinguishing presences from absences (after the model's continuous predictions of presence probability or the like are converted to binary predictions of presence or absence). However, there is another important facet of model evaluation: calibration or reliability, i.e., the relationship between predicted probability and observed prevalence (Pearce & Ferrier 2000; Jiménez-Valverde et al. 2013). The HLfit function, now included in the modEvA package (Barbosa et al. 2014), measures model reliability with the Hosmer & Lemeshow goodness-of-fit statistic (Hosmer & Lemeshow 1980). It also calculates the root mean square error (RMSE).

Note that these statistics have some limitations (see e.g. this post at Statistical Horizons), mainly due to the need to group the values into bins within which to compare probability and prevalence, and to the influence of the binning method on the result. The HLfit function can use several binning methods, which are implemented in the getBins function provided with it (and also included in modEvA). You should therefore evaluate your model with different binning methods, to see if the results change significantly (see the sketch after the usage examples below). The HLfit function also uses the prevalence function.
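For reference, the statistic computed by HLfit is the usual Hosmer-Lemeshow chi-square over the $G$ bins, compared against a chi-square distribution with $G - 2$ degrees of freedom:

$$\chi^2_{HL} = \sum_{g=1}^{G} \sum_{y \in \{0,1\}} \frac{(O_{yg} - E_{yg})^2}{E_{yg}}$$

where $O_{yg}$ and $E_{yg}$ are the observed and expected numbers of (non-)presences in bin $g$.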

bin.methods <- c("round.prob", "prob.bins", "size.bins", "n.bins", "quantiles")
 
getBins <- function(obs = NULL, pred = NULL, model = NULL, id = NULL, bin.method = "quantiles", n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE) {
 
  # version 3.4 (11 Sep 2014)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, you can provide a model object of class "glm"
  # id: optional vector of row identifiers; must be of the same length and in the same order of 'obs' and 'pred' (or of the cases used to build 'model')
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    !(NA %in% obs),
    !(NA %in% pred),
    obs %in% c(0, 1),
    pred >= 0,
    pred <= 1,
    is.null(id) | length(id) == length(pred),
    n.bins >= 2,
    min.bin.size >= 0,
    min.prob.interval > 0,
    min.prob.interval < 1
  )
 
  if(!(bin.method %in% modEvAmethods("getBins"))) stop("Invalid bin.method; type modEvAmethods('getBins') for available options.")
 
  N <- length(obs)
 
  # now get prob.bin (probability bin to which each point belongs):
  if (bin.method == "round.prob") {  # flaw: asymmetric (up to 11 bins, the first and last with half the prob range)
    message("Arguments n.bins and min.bin.size are ignored by this bin.method")
    prob.bin <- round(pred, digits = nchar(min.prob.interval) - 2)
  }  # end method round.prob
 
  if (bin.method == "prob.bins") {  # flaw: may generate empty or very small bins
    message("Arguments n.bins and min.bin.size are ignored by this bin.method")
    bin.cuts <- seq(from = 0, to = 1, by = min.prob.interval)
    prob.bin <- findInterval(pred, bin.cuts)
  }  # end method prob.bins
 
  if (bin.method == "size.bins") {
    message("Arguments n.bins and min.prob.interval are ignored by this bin.method")
    bin.method <- "n.bins"
    n.bins <- floor(N / min.bin.size)
    fixed.bin.size <- TRUE
  }  # end method size.bins
 
  if (bin.method == "n.bins") {   # note: bins do not have constant size (and some get less than min size)
    message("Argument min.prob.interval is ignored by this bin.method")
    if (fixed.bin.size) {
      prob.bin <- findInterval(pred, quantile(pred, probs = seq(from = 0, to = 1, by = 1 / n.bins)))
    } else {
      prob.bin <- cut(pred, n.bins)
    }
  }  # end method n.bins
 
  if (bin.method == "quantiles") {
    # quantile binning based on 'hosmerlem' function by Peter D. M. Macdonald (http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt)
    cutpoints <- quantile(pred, probs = seq(0, 1, 1/n.bins))
    prob.bin <- findInterval(pred, cutpoints)
  }
 
  prob.bins <- sort(unique(prob.bin))  # lists the probability bins in the model
 
  bins.table <- t(as.data.frame(unclass(table(obs, prob.bin))))
  bins.table <- data.frame(rowSums(bins.table), bins.table[ , c(2, 1)])
  colnames(bins.table) <- c("N", "N.pres", "N.abs")
  bins.table$prevalence <- with(bins.table, N.pres / N)
  bins.table$mean.prob <- tapply(pred, prob.bin, mean)
  bins.table$median.prob <- tapply(pred, prob.bin, median)
  bins.table$difference <- with(bins.table, mean.prob - prevalence)
  bins.table$max.poss.diff <- ifelse(bins.table$mean.prob < 0.5, 1 - bins.table$mean.prob, bins.table$mean.prob)
  bins.table$adj.diff <- with(bins.table, abs(difference - max.poss.diff))
  bins.table$overpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, max)
  bins.table$underpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, min)
 
  bins.table <- bins.table[bins.table$N > 0, ]  # eliminates empty bins (which impede calculations)
 
  if (min(bins.table$N) < 15) warning("There is at least one bin with less than 15 values, for which comparisons may not be meaningful; consider using a bin.method that allows defining a minimum bin size")
  n.bins <- nrow(bins.table)
 
  return(list(prob.bin = prob.bin, bins.table = bins.table, N = N, n.bins = n.bins))
 
}
HLfit <- function (model = NULL, obs = NULL, pred = NULL, bin.method, n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE, alpha = 0.05, plot = TRUE, plot.values = TRUE, plot.bin.size = TRUE, xlab = "Predicted probability", ylab = "Observed prevalence", ...) {
  # version 1.5 (24 Jun 2015)
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0, 1),
    pred >= 0,
    pred <= 1
  )
 
  bins <- getBins(obs = obs, pred = pred, bin.method = bin.method, n.bins = n.bins, fixed.bin.size = fixed.bin.size, min.bin.size = min.bin.size, min.prob.interval = min.prob.interval)
  n.bins <- nrow(bins$bins.table)
 
  # next 4 lines: adapted from hosmerlem function in http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt
  observed <- xtabs(cbind(1 - obs, obs) ~ bins$prob.bin)
  expected <- xtabs(cbind(1 - pred, pred) ~ bins$prob.bin)
  chi.sq <- sum((observed - expected) ^ 2 / expected)
  p.value <- 1 - pchisq(chi.sq, df = n.bins - 2)
  rmse <- sqrt(mean((observed - expected) ^ 2))
 
  if (simplif) return(list(chi.sq = chi.sq, p.value = p.value, RMSE = rmse))
 
  # next block loosely based on the calibration.plot function in package PresenceAbsence;
  # these quantities are needed both for the plot and for the output bins.table,
  # so they are computed regardless of 'plot':
  N.total <- tapply(obs, bins$prob.bin, length)  # N cases in each bin
  N.presence <- tapply(obs, bins$prob.bin, sum)  # N presences in each bin
  Empty <- is.na(N.total)
  N.total[is.na(N.total)] <- 0
  N.presence[is.na(N.presence)] <- 0
  OBS.proportion <- N.presence / N.total
  OBS.proportion[Empty] <- NA
  df1.low <- 2 * (N.total - N.presence + 1)
  df2.low <- 2 * N.presence
  df1.up <- 2 * (N.presence + 1)
  df2.up <- 2 * (N.total - N.presence)
  N.bins <- length(unique(bins$prob.bin))
  Lower <- rep(0, N.bins)
  Upper <- rep(1, N.bins)
  TF <- N.presence != 0
  Lower[TF] <- N.presence[TF]/(N.presence[TF] + ((N.total[TF] - N.presence[TF] + 1) * qf(1 - alpha/2, df1.low[TF], df2.low[TF])))
  Lower[Empty] <- NA
  TF <- N.presence < N.total
  Upper[TF] <- ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF]))/(N.total[TF] - N.presence[TF] + ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])))
  Upper[Empty] <- NA
  bin.centers <- bins$bins.table$median.prob
 
  if (plot) {
    plot(c(-0.05, 1.05), c(-0.05, 1.05), type = "n", xlab = xlab, ylab = ylab, ...)
 
    if (plot.bin.size) {
      text(bin.centers, Upper + 0.07, labels = N.total)
    }
 
    abline(a = 0, b = 1, lty = 2)
    for (i in 1:N.bins) {
      lines(rep(bin.centers[i], 2), c(Lower[i], Upper[i]))
    }
    points(bin.centers, OBS.proportion, pch = 20)
 
    if (plot.values) {
      text(1, 0.2, adj = 1, substitute(paste(HL == a), list(a = round(chi.sq, 1))))
      if (p.value < 0.001) {
        text(1, 0.1, adj = 1, substitute(paste(italic(p) < a), list(a = 0.001)))
      } else {
        text(1, 0.1, adj = 1, substitute(paste(italic(p) == a), list(a = round(p.value, 3))))
      }
      text(1, 0.0, adj = 1, substitute(paste(RMSE == a), list(a = round(rmse, 1))))
    }  # end if plot values
  }  # end if plot
 
  BinPred <- tapply(pred, bins$prob.bin, mean)
  bins.table <- data.frame(BinCenter = bin.centers, NBin = N.total, BinObs = OBS.proportion, BinPred = BinPred, BinObsCIlower = Lower, BinObsCIupper = Upper)
 
  return(list(bins.table = bins.table, chi.sq = chi.sq, DF = n.bins - 2, p.value = p.value, RMSE = rmse))
}


HLfit still needs some code simplification, but fixes have recently been applied to the getBins function so that it doesn’t fail when probability breakpoints are not unique. Additional model calibration measures are soon to be added to the package.

Usage examples:

HLfit(model = mymodel, bin.method = "prob.bins", min.prob.interval = 0.1)

HLfit(obs = myobs, pred = mypred, bin.method = "n.bins", n.bins = 10)

HLfit(obs = myobs, pred = mypred, bin.method = "quantiles")
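And, as suggested above, a quick sketch for checking how sensitive the results are to the binning method (assuming your own 'myobs' and 'mypred' vectors):

for (bm in c("round.prob", "prob.bins", "n.bins", "quantiles")) {
  h <- HLfit(obs = myobs, pred = mypred, bin.method = bm, simplif = TRUE)
  cat(bm, ": chi.sq =", round(h$chi.sq, 2), "; p =", round(h$p.value, 3), "\n")
}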

References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Hosmer, D.W. & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics, A10: 1043–1069

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography, 22: 508–516

Pearce, J. & Ferrier, S. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling, 133: 225-245

Optimization of model thresholds based on a pair of measures

The optiPair function, now included in the modEvA package (Barbosa et al. 2014), follows the optiThresh, threshMeasures and evaluate functions, which must all be loaded for it to work. It extends (and thus replaces) the now-deprecated SensSpecCurve function, in that it can be used with any pair of model evaluation measures that balance each other, such as sensitivity-specificity, omission-commission, or underprediction-overprediction (featured in Barbosa et al. 2013). The function plots both measures in the given pair against all thresholds, and calculates the optimal (minimum) difference, (maximum) sum and mean of the two measures, along with the corresponding thresholds.


optiPair <-
function (obs = NULL, pred = NULL, model = NULL, 
          measures = c("Sensitivity", "Specificity"), 
          interval = 0.01, plot = TRUE, plot.sum = FALSE, 
          plot.diff = FALSE, ylim = NULL, ...) {
  # version 1.5 (3 Apr 2014)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, a model object of class "glm"
  # measures: the pair of measures whose curves to plot, e.g. c("Sensitivity", "Specificity")
  # interval: the interval of thresholds at which to calculate sensitivity and specificity
  # plot: logical indicating whether or not to plot the measures
  # plot.sum and plot.diff will optionally plot the sum (+) and/or the difference (-) between both measures in the pair
  # ... : additional arguments to be passed to the "plot" function
 
  if(length(measures) != 2) stop ("'measures' must contain two elements.")
 
  if(is.null(model)) {
    if (is.null(obs) | is.null(pred)) stop("You must provide either the 'obs' and 'pred' vectors, or a 'model' object of class 'glm'")
  }  # end if null model
  else {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
    model <- NULL  # so the message is not repeated for each threshold
  }  # end if model
 
  if (NA %in% obs | NA %in% pred) stop("Please remove (rows with) NA from your data")
  if (length(obs) != length(pred)) stop("'obs' and 'pred' must have the same number of values (and in the same order)")
  if (!all(obs %in% c(0, 1))) stop ("'obs' must consist of binary observations of 0 or 1")
  if (any(pred < 0 | pred > 1)) stop ("'pred' must range between 0 and 1")
 
  measures.values <- optiThresh(obs = obs, pred = pred, interval = interval, measures = measures, optimize = "each", simplif = TRUE)
  measures.values$Difference <- abs(measures.values[ , 1] - measures.values[ , 2])
  measures.values$Sum <- rowSums(measures.values[ , 1:2])
  measures.values$Mean <- rowMeans(measures.values[ , 1:2])
  measures.values$Threshold <- as.numeric(rownames(measures.values))
 
  MinDiff <- min(measures.values$Difference)
  MaxSum <- max(measures.values$Sum)
  MaxMean <- max(measures.values$Mean)
 
  ThreshDiff <- with(measures.values, Threshold[which.min(Difference)])
  ThreshSum <- with(measures.values, Threshold[which.max(Sum)])
  ThreshMean <- with(measures.values, Threshold[which.max(Mean)])
 
  if (plot) {
 
    finite <- as.matrix(measures.values[ , 1:2])
    finite <- finite[is.finite(finite)]
    if(is.null(ylim)) {
      if(plot.sum) ylim <- c(min(finite), MaxSum)
      else ylim <- c(min(finite), max(finite))
    }  # end if null ylim
 
    plot(measures.values[ , 1] ~ measures.values$Threshold, pch = 20, xlab = "Threshold", ylab = measures, ylim = ylim, ...)
    # title(ylab = measures, col.lab = c("black", "grey"))  # error: "col.lab" has the wrong length
    mtext(measures[1], side = 2, line = 3)
    points(measures.values[ , 2] ~ measures.values$Threshold, pch = 20, col = "darkgrey")
    mtext(measures[2], side = 2, line = 2, col = "darkgrey")
 
    abline(h = measures.values[which.min(measures.values$Difference), 1], col = "lightgrey", lty = 2)
    abline(v = ThreshDiff, col = "lightgrey", lty = 2)
 
    if (plot.sum) {
      with(measures.values, points(Sum ~ Threshold, pch = '+'))
      abline(v = ThreshSum, col = "lightgrey", lty = 2)
      abline(h = MaxSum, col = "lightgrey", lty = 2)
    }    # end if plot sum
 
    if (plot.diff) {
      with(measures.values, points(Difference ~ Threshold, pch = '-', col = "grey"))
      abline(v = ThreshDiff, col = "grey", lty = 2)
      abline(h = MinDiff, col = "grey", lty = 2)
    }  # end if plot diff
  }  # end if plot
 
  return(list(measures.values = measures.values, MinDiff = MinDiff, ThreshDiff = ThreshDiff, MaxSum = MaxSum, ThreshSum = ThreshSum, MaxMean = MaxMean, ThreshMean = ThreshMean))
 
}


To use it, just load the required functions and replace the names of your data frame, presence-absence data and predicted values (or else use a 'glm' model object directly) in either of the following commands:

optiPair(obs = mydata$myspecies, pred = mydata$myspecies_P, measures = c("Sensitivity", "Specificity"))

optiPair(model = mymodel, measures = c("UPR", "OPR"))

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Barbosa, A.M., Real, R., Muñoz, A.-R. & Brown, J.A. (2013) New measures for assessing model equilibrium and prediction mismatch in species distribution models. Diversity and Distributions, online early (DOI: 10.1111/ddi.12100).

Sensitivity-specificity curves

**NOTE: this function is now deprecated; please use the optiPair function instead**

The SensSpecCurve function calculates and plots sensitivity (black circles) and specificity (white circles) for all thresholds at a given interval. The plot marks the threshold and the sensitivity value at which sensitivity and specificity are closest to each other. Optionally, you can plot also the difference (minuses), sum (pluses), and/or mean (grey line) of sensitivity and specificity, together with their optimal values and respective thresholds. All these values are also provided with the text output of the function.

SensSpecCurve <- function (obs, pred, interval = 0.01, plot = TRUE, plot.diff = FALSE, plot.sum = FALSE, plot.mean = FALSE, ...) {
# obs: a vector of observed presences (1) and absences (0) or another binary response variable
# pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
# interval: the interval of thresholds at which to calculate sensitivity and specificity
# plot: logical indicating whether or not to plot the curves
# plot.sum and plot.diff will optionally plot the sum (+) and difference (-) between sensitivity and specificity
# ... : arguments to be passed to the "plot" function

  SensSpec <- optiThresh(obs = obs, pred = pred, interval = interval, measures = c("Sensitivity", "Specificity"), plot = FALSE, simplif = TRUE)
  SensSpec$Difference <- abs(SensSpec$Sensitivity - SensSpec$Specificity)
  SensSpec$Sum <- rowSums(SensSpec[,1:2])
  SensSpec$Mean <- rowMeans(SensSpec[ , 1:2])
  SensSpec$Threshold <- as.numeric(rownames(SensSpec))

  MinDiff <- min(SensSpec$Difference)
  MaxSum <- max(SensSpec$Sum)
  MaxMean <- max(SensSpec$Mean)

  ThreshDiff <- with(SensSpec, Threshold[which.min(Difference)])
  ThreshSum <- with(SensSpec, Threshold[which.max(Sum)])
  ThreshMean <- with(SensSpec, Threshold[which.max(Mean)])

  if(plot){
    ymax <- ifelse(plot.sum, max(SensSpec$Sum), 1)

    with(SensSpec, plot(Sensitivity ~ Threshold, pch = 20, ylim = c(0, ymax), ylab = "Sensitivity and specificity", ...))
    with(SensSpec, points(Specificity ~ Threshold, pch = 1))
    abline(h = SensSpec[which.min(SensSpec$Difference), "Sensitivity"], col = "grey", lty = 2)
    abline(v = ThreshDiff, col = "grey", lty = 2)

    if(plot.mean) {
      with(SensSpec, lines(Mean ~ Threshold, col = "grey"))
      abline(v = ThreshMean, col = "grey", lty = 2)
      abline(h = MaxMean, col = "grey", lty = 2)
    }  # end if plot mean

    if(plot.sum) {
      with(SensSpec, points(Sum ~ Threshold, pch = '+'))
      abline(v = ThreshSum, col = "grey", lty = 2)
      abline(h = MaxSum, col = "grey", lty = 2)
    }    # end if plot sum

    if(plot.diff) {
      with(SensSpec, points(Difference ~ Threshold, pch = '-', col = "grey"))
      abline(v = ThreshDiff, col = "grey", lty = 2)
      abline(h = MinDiff, col = "grey", lty = 2)
    }  # end if plot diff
  }  # end if plot

  return(list(SensSpec = SensSpec, MinDiff = MinDiff, ThreshDiff = ThreshDiff, MaxSum = MaxSum, ThreshSum = ThreshSum, MaxMean = MaxMean, ThreshMean = ThreshMean))

}  # end SensSpecCurve function

[presented with Pretty R]

This function uses the optiThresh function, which in turn uses threshMeasures and evaluate, so all of these must be loaded along with SensSpecCurve for it to work. Then just load your observed and predicted data (see the Beginner's guide for more information) and type or paste:

with(mydata, SensSpecCurve(obs = myspecies, pred = myspecies_P))

Optimised thresholds for model evaluation

The optiThresh function, now included in the modEvA package (Barbosa et al. 2014), calculates optimal thresholds for a number of model evaluation measures (see "Threshold-based measures of model evaluation"). Optimal thresholds are given for each measure individually, and/or for all measures according to particular criteria (e.g. Jiménez-Valverde & Lobo 2007, Nenzén & Araújo 2011). Results are given numerically and in plots.

The optiThresh function depends on the functions provided in the previous post, so all must be loaded for this one to work.

thresh.criteria <- c("each", "preval", "0.5", "maxKappa", "minSensSpecDiff", "maxSensSpecSum", "maxTSS")  # the last two are equivalent

optiThresh <- function(obs = NULL, pred = NULL, model = NULL, interval = 0.01, measures = threshMeasures.methods[-1], optimize = thresh.criteria, simplif = FALSE,  plot = TRUE, sep.plots = FALSE, xlab = "Threshold", ...) {
  # version 2.6 (20 Jan 2013)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, you can provide a model object of class "glm"
  # optimize: the threshold optimization criteria to use; "each" optimizes the threshold for each model evaluation measure, while the remaining options optimize all measures according to the specified criterion
  # interval: the interval of thresholds at which to calculate evaluation measures
  # measures: model evaluation measures for which to calculate the optimal thresholds
  # plot: logical indicating whether or not to plot the values of each evaluation measure at all thresholds
  # sep.plots: logical. If TRUE, each plot is presented separately (you need to be recording R plot history to be able to browse through them all); if FALSE (the default), all plots are presented together in the same window
  # ...: additional arguments to be passed to the "plot" function
  # Note: some measures cannot be calculated for thresholds at which there are zeros in the confusion matrix, hence the eventual 'NaN' or 'Inf' in results. Also, optimization may be deceiving for some measures; use plot = TRUE and inspect the plot(s).

  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
    model <- NULL  # so the message is not repeated for each threshold
  }  # end if model

  input.measures <- measures

  if ("minSensSpecDiff" %in% optimize | "maxSensSpecSum" %in% optimize) {
    if (!("Sensitivity" %in% measures)) {
      measures <- c(measures, "Sensitivity")
    }  # end if !Sensitivity
    if (!("Specificity" %in% measures)) {
      measures <- c(measures, "Specificity")
    }  # end if !Specificity
  }  # end if minSensSpecDiff

  if("maxKappa" %in% optimize & !("kappa" %in% measures)) {
    measures <- c(measures, "kappa")
  }

  if("maxTSS" %in% optimize & !("TSS" %in% measures)) {
    measures <- c(measures, "TSS")
  }

  thresholds <- seq(from = 0, to = 1, by = interval)
  Nthresholds <- length(thresholds)
  Nmeasures <- length(measures)
  all.thresholds <- data.frame(matrix(data = NA, nrow = Nthresholds, ncol = Nmeasures), row.names = thresholds)
  colnames(all.thresholds) = measures

  for (t in 1 : Nthresholds) for (m in 1 : Nmeasures) {
    all.thresholds[t, m] <- threshMeasures(obs = obs, pred = pred, thresh = thresholds[t], measures = measures[m], standardize = FALSE, simplif = TRUE)
  }  # end for t for m

  if (simplif) {  # shorter version for use with e.g. the optiPair function
    return(all.thresholds)
  }  # end if simplif
  else {
    results <- list(all.thresholds = all.thresholds)  # start a list of results

    input.optimize <- optimize
    if (plot == TRUE & !("each" %in% optimize)) optimize <- c("each", optimize)

    if ("each" %in% optimize) {
      optimals.each <- data.frame(matrix(data = NA, nrow = Nmeasures, ncol = 4))
      colnames(optimals.each) <- c("measure", "threshold", "value", "type")
      optimals.each[1] <- measures
      goodness.measures <- c("CCR", "Sensitivity", "Specificity", "PPP", "NPP", "kappa", "TSS", "NMI", "OddsRatio")
      badness.measures <- c("Omission", "Commission", "Misclass", "UPR", "OPR")
      change.measures <- c("PPI", "PAI")

      for (m in 1 : Nmeasures) {
        if (measures[m] %in% (goodness.measures)) {  # optimal is maximum
          optimals.each[m, "threshold"] <- as.numeric(rownames(all.thresholds)[which.max(all.thresholds[ , m])])
          optimals.each[m, "value"] <- max(all.thresholds[ , m], na.rm = TRUE)
          optimals.each[m, "type"] <- "maximum"
        }  # end if measure in goodness
        else {
          if (measures[m] %in% (badness.measures)) {  # optimal is minimum
            optimals.each[m, "threshold"] <- as.numeric(rownames(all.thresholds)[which.min(all.thresholds[, m])])
            optimals.each[m, "value"] <- min(all.thresholds[ ,m], na.rm = TRUE)
            optimals.each[m, "type"] <- "minimum"
          }  # end if measure in badness
          else {
            if (measures[m] %in% (change.measures)) {  # optimal is closest to zero
              optimals.each[m, "threshold"] <- as.numeric(rownames(all.thresholds)[which.min(abs(all.thresholds[ , m]))])
              optimals.each[m, "value"] <- min(abs(all.thresholds[ , m]), na.rm = TRUE)
              optimals.each[m, "type"] <- "closest to zero"
            }  # end if measure in change
          }  # end 2nd else
        }  # end 1st else
      }  # end for m
      if ("each" %in% input.optimize)  results <- c(results, optimals.each = list(optimals.each))  # add this to results
    }  # end if each

    criteria <- optimize[optimize != "each"]
    Ncriteria <- length(criteria)

    if (Ncriteria > 0) {

      optimals.criteria <- data.frame(matrix(data = NA, nrow = Nmeasures, ncol = Ncriteria))
      rownames(optimals.criteria) <- measures
      colnames(optimals.criteria) <- criteria

      if ("preval" %in% criteria) {
        for (m in 1 : Nmeasures) {
          optimals.criteria[m, "preval"] <- threshMeasures(obs = obs, pred = pred, thresh = "preval", measures = measures[m], standardize = FALSE, simplif = TRUE)
        }
      }  # end if preval

      if ("minSensSpecDiff" %in% criteria) {
        all.thresholds$SensSpecDiff <- with(all.thresholds, abs(Sensitivity - Specificity))
        minSensSpecDiff <- thresholds[which.min(all.thresholds$SensSpecDiff)]
        for (m in 1:Nmeasures) {
          optimals.criteria[m, "minSensSpecDiff"] <- threshMeasures(obs = obs, pred = pred, thresh = minSensSpecDiff, measures = measures[m], standardize = FALSE, simplif = TRUE)
        }
      }

      if ("maxSensSpecSum" %in% criteria) {
        all.thresholds$SensSpecSum <- with(all.thresholds, Sensitivity + Specificity)
        maxSensSpecSum <- thresholds[which.max(all.thresholds$SensSpecSum)]
        for (m in 1 : Nmeasures) {
          optimals.criteria[m, "maxSensSpecSum"] <- threshMeasures(obs = obs, pred = pred, thresh = maxSensSpecSum, measures = measures[m], standardize = FALSE, simplif = TRUE)
        }
      }

      if ("maxKappa" %in% criteria) {
        if (!("kappa" %in% measures)) {
          for (t in 1 : Nthresholds) {
            all.thresholds$kappa[t] <- threshMeasures(obs = obs, pred = pred, thresh = thresholds[t], measures = "kappa", standardize = FALSE, simplif = TRUE)
          }
        }
        maxKappa <- thresholds[which.max(all.thresholds$kappa)]
        for (m in 1 : Nmeasures) {
          optimals.criteria[m, "maxKappa"] <- threshMeasures(obs = obs, pred = pred, thresh = maxKappa, measures = measures[m], standardize = FALSE, simplif = TRUE)
        }
      }

      if ("maxTSS" %in% criteria) {
        if (!("TSS" %in% measures)) {
          for (t in 1 : Nthresholds) {
            all.thresholds$TSS[t] <- threshMeasures(obs = obs, pred = pred, thresh = thresholds[t], measures = "TSS", standardize = FALSE, simplif = TRUE)
          }
        }
        maxTSS <- thresholds[which.max(all.thresholds$TSS)]
        for (m in 1 : Nmeasures) {
          optimals.criteria[m, "maxTSS"] <- threshMeasures(obs = obs, pred = pred, thresh = maxTSS, measures = measures[m], standardize = FALSE, simplif = TRUE)
        }
      }

      if ("0.5" %in% criteria) {
        for (m in 1 : Nmeasures) {
          optimals.criteria[m, "0.5"] <- all.thresholds[rownames(all.thresholds) == 0.5, m]
        }
      }

      results <- c(results, optimals.criteria = list(optimals.criteria))  # add this to results
    }  # end if Ncriteria > 0

    if (plot) {
      opar <- par(no.readonly = TRUE)
      on.exit(par(opar))
      n.input.measures <- length(input.measures)
      if (sep.plots) {
        par(mfrow = c(1,1))
      } else {
        if (n.input.measures > 4)  par(mar = c(1, 4.5, 1, 1))
        root <- sqrt(n.input.measures)
        par(mfrow = c(ceiling(root), round(root)))
      }  # end if sep.plots else
      for (m in 1 : n.input.measures) {
        plot(all.thresholds[ , m] ~ thresholds, ylab = input.measures[m], ...)
        if ("each" %in% input.optimize) {
          abline(v = optimals.each[m, "threshold"], col = "grey", lty = 2)  # vertical line on optimal threshold
          abline(h = optimals.each[m, "value"], col = "grey", lty = 2)  # horiz line on optimal value
        }
     }  # end for m
    }  # end if plot

    return(results)

  }  # end if simplif else
}  # end optiThresh function
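Usage examples (with hypothetical 'mymodel' and 'mydata' objects, following the pattern of the other posts):

optiThresh(model = mymodel)

optiThresh(obs = mydata$myspecies, pred = mydata$myspecies_P, measures = c("Sensitivity", "Specificity", "kappa", "TSS"), optimize = c("each", "maxTSS"))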


References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Jiménez-Valverde A. & Lobo J.M. (2007) Threshold criteria for conversion of probability of species presence to either–or presence–absence. Acta Oecologica, 31: 361–369.

Nenzén H.K. & Araújo M.B. (2011) Choice of threshold alters projections of species range shifts under climate change. Ecological Modelling, 222: 3346–3354.