Threshold-based measures of model evaluation

A number of measures may be used to evaluate the predictions of species distribution (or ecological niche, or bioclimatic envelope…) models against observed presence-absence data (Fielding & Bell 1997; Liu et al. 2011; Barbosa et al. 2013). The threshMeasures function, now included in the modEvA package (Barbosa et al. 2014), calculates a few of these measures. Input data are either a model object of class “glm”, or a vector of the observed the presences-absences (1-0) of your species plus another with the corresponding model predictions, and the threshold value to separate predicted presences from predicted absences in the model predictions. By default this threshold is 0.5, but you may want to change it depending on a number of criteria (see e.g. Jiménez-Valverde & Lobo 2007, Nenzén & Araújo 2011). You can set thresh to “preval” (species’ prevalence or proportion of presences in the input “obs” data) or to any numerical value between 0 and 1. You can calculate optimal threshold values according to several criteria with the optiThresh function provided in the next post. The results of threshMeasures are given numerically and in a bar plot:


# version 2.5 (20 Jan 2013)

threshMeasures.methods = c("AUC", "CCR", "Misclass", "Sensitivity", "Specificity", "Omission", "Commission", "PPP", "NPP", "UPR", "OPR", "PPI", "PAI", "kappa", "TSS", "NMI", "OddsRatio")

evaluate <- function(a, b, c, d, N = NULL, measure = "CCR"){
  # version 1.1 (24 Jul 2013)
  # internal function to be used by others in the package
  # a, b, c, d: elements of the confusion matrix (TP, FP, FN, TN)
  # N: sample size (total number of observations); calculated automatically if NULL
  # measure: evaluation measure to use
  # note: TSS and NMI are not symmetrical ("obs" vs "pred" != "pred" vs "obs")
  # note that some measures (e.g. NMI, odds ratio) don't work with zeros in (some parts of) the confusion matrix
  if(is.null(N))  N <- a + b + c + d
  stopifnot(N == a + b + c + d)
  if(measure == "CCR") { value <- (a+d)/N
  } else if(measure == "Sensitivity") { value <- a/(a+c)
  } else if(measure == "Specificity") { value <- d/(b+d)
  } else if(measure == "Omission") { value <- c/(a+c)
  } else if(measure == "Commission") { value <- b/(b+d)
  } else if(measure == "PPP") { value <- a/(a+b)  # also called "Precision"
  } else if(measure == "NPP") { value <- d/(c+d)
  } else if(measure == "Misclass") { value <- (b+c)/N
  } else if(measure == "UPR") { value <- c/(c+d)  # this and next 3: Barbosa et al. 2013 Diversity and Distributions
  } else if(measure == "OPR") { value <- b/(a+b)
  } else if(measure == "PPI") { value <- ((a+b)/(a+c))-1
  } else if(measure == "PAI") { value <- ((c+d)/(b+d))-1
  } else if(measure == "kappa") { value <- ((a+d)-(((a+c)*(a+b)+(b+d)*(c+d))/N))/(N-(((a+c)*(a+b)+(b+d)*(c+d))/N))
  } else if(measure == "TSS") { value <- (a*d - b*c) / ((a+c) * (b+d))
  } else if(measure == "NMI") { value <- 1-((-a*log(a)-b*log(b)-c*log(c)-d*log(d)+(a+b)*log(a+b)+(c+d)*log(c+d))/(N*log(N)-((a+c)*log(a+c)+(b+d)*log(b+d))))
# NMI by Forbes (1995); "1-" missing in Fielding & Bell 1997 (and Manel et al 2001) but Fielding confirmed it was a typo
  } else if(measure == "OddsRatio") { value <- (a*d)/(c*b)
# (c*b)/(a*d)  # inverse, would give a (more expectable) unimodal plot of odds against thresholds
  } else stop("Invalid measure; type 'threshMeasures.methods' for available options.")
} # end evaluate function

threshMeasures <- function(obs = NULL, pred = NULL, model = NULL, thresh = 0.5, measures = threshMeasures.methods, simplif = FALSE, plot = TRUE, plot.ordered = FALSE, standardize = TRUE, messages = TRUE, ...) {
  # version 2.5 (20 Jan 2013)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, you can provide a model object of class "glm"
  # thresh: threshold value to separate predicted presences from predicted absences in 'pred'; may be "preval" or any real number between 0 and 1
  # measures: character vector of the evaluation measures to use
  # plot: logical, whether or not to produce a barplot of the calculated measures
  # plot.ordered: logical, whether to plot the measures in decreasing order rather than in input order
  # standardize: logical, whether to change measures that may range between -1 and +1 (namely kappa and TSS) to their corresponding value in the 0-to-1 scale, so thet they can compare directly to other measures
  # ...: arguments to be passed to the barplot function (if plot = TRUE)

  if(is.null(model)) {
    if (is.null(obs) | is.null(pred)) stop("You must provide either the 'obs' and 'pred' vectors, or a 'model' object of class 'glm'")
  }  # end if null model
  else {
    if (!all(class(model) %in% c("glm", "lm"))) stop("'model' must be a model object of class 'glm'")
    if (messages) {
      if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
      if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if !null model

  if (NA %in% obs | NA %in% pred) stop("Please remove (rows with) NA from your data")
  if (length(obs) != length(pred)) stop("'obs' and 'pred' must have the same number of values (and in the same order)")
  if (!all(obs %in% c(0, 1))) stop ("'obs' must consist of binary observations of 0 or 1")
  if (standardize & !exists("standard01")) stop ("'standardize' requires the standard01 function; please load that function or use 'standardize = FALSE'")
  if (!(thresh == "preval" | is.numeric(thresh)))  stop("'thresh' must be either 'preval' or a numeric value between 0 and 1")
  if (thresh == "preval")  thresh <- prevalence(obs)

  obs0 <- obs == 0
  obs1 <- obs == 1
  pred0 <- pred < thresh
  pred1 <- pred >= thresh
  a <- sum(obs1 & pred1)
  b <- sum(obs0 & pred1)
  c <- sum(obs1 & pred0)
  d <- sum(obs0 & pred0)
  N <- a + b + c + d

  Nmeasures <- length(measures)
  measureValues <- as.vector(rep(NA, Nmeasures), mode = "numeric")

  for (i in 1:Nmeasures) {
    if (measures[i] == "AUC") measureValues[i] <- AUC(obs = obs, pred = pred, simplif = TRUE)
    else if (measures[i] %in% threshMeasures.methods) {
      measureValues[i] <- evaluate(a, b, c, d, N, measure = measures[i])
      if (standardize == TRUE  &  measures[i] %in% c("TSS", "kappa")) {
        measureValues[i] <- standard01(measureValues[i])
        measures[i] <- paste("s", measures[i], sep = "")
        message(measures[i], " standardized to the 0-1 scale for direct comparability with other measures; use 'standardize = FALSE' if this is not what you wish")
      }  # end if standardize
    }  # end if measures[i] in threshMeasures.methods
    else {
      warning("'",measures[i],"' is not a valid measure; type 'threshMeasures.methods' for available options\n\n")
    }  # end else
  }  # end for i

  Measures <- matrix(data = measureValues, nrow = Nmeasures, ncol = 1, dimnames = list(measures, "Value"))
  if (simplif) {  # shorter version for use with e.g. optiThresh function
  } else {
    prev <- prevalence(obs)
    conf.matrix <- matrix(c(a, b, c, d), nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(c("pred1", "pred0"), c("obs1", "obs0")))
    if (plot) {
      names(measureValues) <- measures
      measures.plot <- measureValues
      if (plot.ordered) {
        measures.plot <- sort(measures.plot, decreasing = TRUE)
      barplot(measures.plot, las = 3, ...)
    }  # end if plot
    return(list(N = N, Prevalence = prev, Threshold = thresh, ConfusionMatrix = conf.matrix, ThreshMeasures = Measures))
  }  # end else
}  # end threshMeasures function

[presented with Pretty R]

Note that some of these measures (like NMI, UPR, OPR, PPP, NPP) cannot be calculated for thresholds at which there are zeros in the confusion matrix. Note also that, while most of these measures range from 0 to 1, some of them — such as kappa and TSS — may range from -1 to 1 (Allouche et al. 2006), so their raw scores are not exactly comparable. The threshMeasures function includes an option to standardize these measures to 0-1, for which you need the standard01 function, so that you obtain the standardized versions skappa and sTSS.

By default, the threshMeasures function calculates all the measures listed in threshMeasures.methods, but you may specify only a few measures if you wish, as in these examples:

# create some random test data:
myobs <- sample(c(0, 1), 150, replace = TRUE)
mypred <- runif(150, min = 0, max = 1)
# note: you can pratice with real sample data too

# calculate some threshold-based measures:
threshMeasures(obs = myobs, pred = mypred, thresh = 0.75, measures = c("Sensitivity", "Specificity", "kappa", "TSS", "UPR", "OPR"), standardize = TRUE, ylim = c(0, 1))

# instead of obs and pred, you can provide a model object of class "glm":
# mymodel = glm(myspecies ~ var1 + var2 ... ...)
threshMeasures(model = mymodel, thresh = "preval", measures = c("Sensitivity", "Specificity", "UPR", "OPR"), ylim = c(0, 1))

The threshMeasures function can also be used to calculate the agreement between different presence-absence (or other types of binary) data, as Barbosa et al. (2012) did for comparing mammal distribution data from atlas and range maps. Notice, however, that some of these measures, such as TSS or NMI, are not symmetrical (obs vs pred is different from pred vs obs).


Allouche, O., Tsoar, A. & Kadmon, R. (2006) Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43, 1223–1232.

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Barbosa A.M., Estrada A., Márquez A.L., Purvis A. & Orme C.D.L. (2012) Atlas versus range maps: robustness of chorological relationships to distribution data types in European mammals. Journal of Biogeography 39: 1391-1400

Barbosa A.M., Real R., Muñoz A.R. & Brown J.A. (2013) New measures for assessing model equilibrium and prediction mismatch in species distribution models. Diversity and Distributions 19: 1333-1338 (DOI: 10.1111/ddi.12100)

Fielding A.H. & Bell J.F. (1997). A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24: 38-49

Jiménez-Valverde A. & Lobo J.M. (2007) Threshold criteria for conversion of probability of species presence to either–or presence–absence. Acta Oecologica 31: 361–369

Liu C., White M., & Newell G. (2011) Measuring and comparing the accuracy of species distribution models with presence-absence data. Ecography, 34, 232–243.

Nenzén H.K. & Araújo M.B. (2011) Choice of threshold alters projections of species range shifts under climate change. Ecological Modelling 222: 3346-3354



Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s