Miller calibration statistics

The MillerCalib function, now included in the modEvA package (version 0.7.2 just released), calculates Miller’s calibration statistics (Miller et al. 1991) for a logistic regression model, namely the intercept and slope of a logistic regression of the binary response variable on the logit of the predicted probabilities. Optionally and by default, it also plots the corresponding regression line (black) over the reference diagonal (grey dashed line).

Calibration (or reliability) measures how a model’s predicted probabilities relate to the observed prevalence (proportion of presences) in the modelled data. If predictions are perfectly calibrated, the slope equals 1 and the intercept equals 0, so the model’s calibration line overlaps perfectly with the reference diagonal. Note that Miller’s statistics assess the model globally: a model is well calibrated if the average of all predicted probabilities equals the proportion of presences in the modelled data. Good calibration is therefore always attained on the same data used for building the model (Miller et al. 1991), so the function is most useful for checking whether a model remains well calibrated when extrapolated to a new dataset.
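
As a minimal sanity check of this last point (simulated data; run after loading the MillerCalib function below, as all names here are made up), a model evaluated on its own training data is calibrated by construction:

set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(x))
mod <- glm(y ~ x, family = binomial)
MillerCalib(model = mod)  # intercept = 0 and slope = 1, up to numerical precision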

MillerCalib <- function(model = NULL, obs = NULL, pred = NULL, plot = TRUE, plot.values = TRUE, digits = 4, xlab = "", ylab = "", main = "Miller calibration", ...) {
  # version 1.3 (24 Jun 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
  }
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0,1),
    pred >= 0,
    pred <= 1
  )
 
  logit <- log(pred / (1 - pred))
  mod <- glm(obs ~ logit, family = binomial)
  intercept <- mod$coef[[1]]
  slope <- mod$coef[[2]]
  std.err <- summary(mod)$coefficients["logit", "Std. Error"]
  z <- (slope - 1) / std.err  # test of the slope against 1 (Paternoster et al. 1998; http://stats.stackexchange.com/questions/55501/test-a-significant-difference-between-two-slope-values)
  slope.p <- 2 * pnorm(-abs(z))  # two-tailed p-value for H0: slope = 1
 
  if (plot) {
    ymin <- min(0, intercept)
    ymax <- max(1, intercept + 0.1)
    plot(c(0, 1), c(ymin, ymax), type = "n", xlab = xlab, ylab = ylab, main = main)
    abline(0, 1, col = "lightgrey", lty = 2)
    abline(intercept, slope)
    if (plot.values) {
      plotext <- paste("intercept =" , round(intercept, digits), "\nslope =", round(slope, digits), "\nslope p-value =", round(slope.p, digits))
      text(x = 0, y = ymax - 0.25, adj = 0, labels = plotext)
    }
  }
  return(list(intercept = intercept, slope = slope, slope.pvalue = slope.p))
}  # end MillerCalib function


Input data can be either a glm model object with binomial family and logit link, or two vectors of observed binary (0 or 1) values and the corresponding predicted probabilities. The output is a named list with the calibration intercept and slope, plus a p-value for the difference between the slope and the ideal value of 1.
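
Usage examples (names hypothetical, following the conventions of the other posts here):

MillerCalib(model = mymodel)

MillerCalib(obs = mydata$myspecies, pred = mydata$myspecies_P)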

REFERENCES

Miller M.E., Hui S.L. & Tierney W.M. (1991) Validation techniques for logistic regression models. Statistics in Medicine, 10: 1213-1226

Wintle B.A., Elith J. & Potts J.M. (2005) Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW. Austral Ecology, 30: 719-738


R-squared measures for generalized linear models

There are several ways of calculating (pseudo) R-squared values for logistic regression models, with no consensus about which is best. The RsqGLM function, now included in the modEvA package, calculates those of McFadden (1974), Cox & Snell (1989), Nagelkerke (1991), Tjur (2009), and the squared Pearson correlation between observed and predicted values:

RsqGLM <- function(obs = NULL, pred = NULL, model = NULL) {
  # version 1.2 (3 Jan 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
 
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
    if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).")
    if (any(!(obs %in% c(0, 1))) | any(pred < 0) | any(pred > 1)) stop ("Sorry, 'obs' and 'pred' options currently only implemented for binomial GLMs (binary response variable with values 0 or 1) with logit link.")
    logit <- log(pred / (1 - pred))
    model <- glm(obs ~ logit, family = "binomial")
  }
 
  null.mod <- glm(obs ~ 1, family = family(model))
  loglike.M <- as.numeric(logLik(model))
  loglike.0 <- as.numeric(logLik(null.mod))
  N <- length(obs)
 
  # based on Nagelkerke 1991:
  CoxSnell <- 1 - exp(-(2 / N) * (loglike.M - loglike.0))
  Nagelkerke <- CoxSnell / (1 - exp((2 * N ^ (-1)) * loglike.0))
 
  # based on Allison 2014:
  McFadden <- 1 - (loglike.M / loglike.0)
  Tjur <- mean(pred[obs == 1]) - mean(pred[obs == 0])
  sqPearson <- cor(obs, pred) ^ 2
 
  return(list(CoxSnell = CoxSnell, Nagelkerke = Nagelkerke, McFadden = McFadden, Tjur = Tjur, sqPearson = sqPearson))
}


Input data can be either a glm model object or two vectors of observed binary (0 or 1) values and the corresponding predicted probabilities (only binomial-logit GLMs admitted in this case). The output is a named list of the calculated R-squared values. See also the Dsquared function.
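Usage examples (with a hypothetical fitted binomial GLM 'mymodel', or observed and predicted vectors as in the other posts here):

RsqGLM(model = mymodel)

RsqGLM(obs = mydata$myspecies, pred = mydata$myspecies_P)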

 

REFERENCES

Cox, D.R. & Snell E.J. (1989) The Analysis of Binary Data, 2nd ed. Chapman and Hall, London

McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka P. (ed.) Frontiers in Economics. Academic Press, New York

Nagelkerke, N.J.D. (1991) A note on a general definition of the coefficient of determination. Biometrika, 78: 691-692

Tjur T. (2009) Coefficients of determination in logistic regression models – a new proposal: the coefficient of discrimination. The American Statistician, 63: 366-372.

Calculate evaluation measures for multiple models

If you have a list of models created e.g. with the multGLM function, or a data frame with presence-absence data and the corresponding predicted values for a set of species, you can use the multModEv function to get a set of evaluation measures for all models simultaneously. Note that all models must have the same sample size. This function is included in the modEvA package (Barbosa et al. 2014).

multModEv.methods <- c("Prevalence", "AUC", "CCR", "Misclass", "Sensitivity", "Specificity", "Omission", "Commission", "PPP", "NPP", "UPR", "OPR", "PPI", "PAI", "kappa", "TSS", "NMI", "OddsRatio", "HL", "HL.p", "RMSE", "Miller.int", "Miller.slope", "Miller.p")
 
multModEv <-
function(models = NULL, obs.data = NULL, pred.data = NULL, measures = multModEv.methods, standardize = FALSE, thresh = 0.5, bin.method = "quantiles", quiet = TRUE) {
  # version 2.0 (24 Jun 2015)
 
  for (m in measures) {
    if (!(m %in% modEvAmethods("multModEv"))) stop(m, " is not a valid measure; type modEvAmethods('multModEv') for available options.")
  }
 
  if (!bin.method %in% modEvAmethods("getBins")) stop("Invalid bin.method; type modEvAmethods('getBins') for available options.")
 
  if (is.null(models)) {
    if (is.null(obs.data) | is.null(pred.data)) stop("You must provide either a list of model object(s) of class 'glm', or a set of obs.data + pred.data with matching dimensions.")
  }  # end if !models
 
  else {
    if (!is.null(obs.data)) message("Argument 'obs.data' ignored in favour of 'models'.")
    if (!is.null(pred.data)) message("Argument 'pred.data' ignored in favour of 'models'.")
 
    n <- sapply(models, function(x) length(resid(x)))
    if(!(all(n == n[1]))) stop("Models must all be based on the same sample size.")
 
    obs.data <- pred.data <- fav.data <- matrix(data = NA, nrow = length(resid(models[[1]])), ncol = length(models), dimnames = list(NULL, names(models)))
    for (m in 1 : length(models)) {
      obs.data[ , m] <- models[[m]]$y
      pred.data[ , m] <- models[[m]]$fitted.values
    }  # end for m
  }  # end if models
 
  n.models <- ncol(obs.data)
  n.measures <- length(measures)
  results <- matrix(NA, nrow = n.models, ncol = n.measures)
  rownames(results) <- colnames(obs.data)
  colnames(results) <- measures
 
  message(n.models, " models to evaluate; ", n.measures,  " measures to calculate for each.")
 
  if (any(measures %in% modEvAmethods("threshMeasures"))) {
    thresh.measures <- measures[measures %in% modEvAmethods("threshMeasures")]
    cat("- using '", thresh, "' as the threshold value to calculate ", paste(thresh.measures, collapse = ", "), "\n", sep = "")  # doesn't quite work with 'message' or 'paste' or with a 'sep'
  }  # this needs to be outside the 'for' loop
 
  bin.measures <- c("HL", "HL.p", "RMSE")  # ABC, unityRsq
  if (any(measures %in% bin.measures)) {
    bin.measures <- measures[measures %in% bin.measures]
    cat("- using '", bin.method, "' as the bin method to calculate ", paste(bin.measures, collapse = ", "), "\n", sep = "")  # doesn't quite work with 'message' or 'paste' or with a 'sep'
  }  # this needs to be outside the 'for' loop
 
for (m in 1:n.models) {
 
    message("Evaluating model ", m, "...")
 
    if ("Prevalence" %in% measures)
      results[m, "Prevalence"] <- prevalence(obs.data[ , m])
 
    if ("Evenness" %in% measures)
      results[m, "Evenness"] <- evenness(obs.data[ , m])
 
    if ("AUC" %in% measures)
      results[m, "AUC"] <- AUC(obs = obs.data[ , m], pred = pred.data[ , m], simplif = TRUE)
 
    if (any(measures %in% modEvAmethods("threshMeasures"))) {
      for (tm in thresh.measures) {  # no inner model loop needed: we are already inside the per-model loop
        results[m, tm] <- threshMeasures(obs = obs.data[ , m], pred = pred.data[ , m], measures = tm, thresh = thresh, standardize = standardize, simplif = TRUE)
      }  # end for tm
    }  # end if measures in modEvAmethods("threshMeasures")
    # thresh.measures <- optiThresh(obs = obs.data[ , m], pred = pred.data[ , m], plot = FALSE, optimize = "each")
 
    if (any(measures %in% c("HL", "HL.p"))) {
      for (m in 1:n.models) {
        HL <- HLfit(obs = obs.data[ , m], pred = pred.data[ , m], bin.method = bin.method, simplif = TRUE)
        if ("HL" %in% measures)  results[m, "HL"] <- HL$chi.sq
        if ("HL.p" %in% measures)  results[m, "HL.p"] <- HL$p.value
        if ("RMSE" %in% measures)  results[m, "RMSE"] <- HL$RMSE
      }; rm(m)
    }  # end if HL
 
    if (any(measures %in% c("Miller.int", "Miller.slope", "Miller.p"))) {
      for (m in 1:n.models) {
        Miller <- MillerCalib(obs = obs.data[ , m], pred = pred.data[ , m], plot = FALSE)
        if ("Miller.int" %in% measures)
          results[m, "Miller.int"] <- Miller$intercept
        if ("Miller.slope" %in% measures)
          results[m, "Miller.slope"] <- Miller$slope
        if ("Miller.p" %in% measures)
          results[m, "Miller.p"] <- Miller$slope.pvalue
      }; rm(m)
    } # end if Miller
  }  # end for m
 
  if (standardize) {
    colnames(results)[colnames(results) == "kappa"] <- "skappa"
    colnames(results)[colnames(results) == "TSS"] <- "sTSS"
  }  # end if standardize
 
  results <- data.frame(Model = rownames(results), results, row.names = NULL)
  message("Finished!")
  return(results)
}

Usage examples:

multModEv(obs.data = mydata[ , 2:8], pred.data = mydata[ , 9:15], thresh = "preval")

multModEv(models = mymodels$models)

multModEv(models = mymodels$models, thresh = 0.5, measures = c("AUC", "CCR", "Sensitivity", "TSS"))
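
For a self-contained illustration with simulated data (a sketch assuming the modEvA functions used internally are loaded; the species names and values are made up):

set.seed(1)
obs <- data.frame(sp1 = rbinom(100, 1, 0.3), sp2 = rbinom(100, 1, 0.5))
pred <- data.frame(sp1 = runif(100), sp2 = runif(100))
multModEv(obs.data = obs, pred.data = pred, thresh = 0.5, measures = c("AUC", "CCR", "TSS"))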

REFERENCES

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Calculate the area under the ROC curve of a model

Like the model evaluation measures calculated by the threshMeasures function, the area under the ROC curve (AUC) assesses the discrimination performance of a model; but unlike them, it does not require choosing a particular threshold above which the model is taken to predict species presence: rather, it averages discrimination performance over all possible thresholds at a given interval. Note that the AUC has received criticism (Lobo et al. 2008, Jiménez-Valverde et al. 2013), although it is still one of the most widely used metrics in model evaluation. It is highly correlated with species prevalence, so this value is also provided by the AUC function (if simplif = FALSE, the default) for reference.

Although there are functions to calculate the AUC in other R packages (such as ROCR, PresenceAbsence and verification), the AUC function below is more compatible with the remaining functions in this blog and in the modEvA package where it is included (Barbosa et al. 2014), and can be applied not only to a set of presence-absence versus predicted values, but also directly to a model object of class glm.

AUC <- function(obs = NULL, pred = NULL, model = NULL, interval = 0.01, 
       simplif = FALSE, plot = TRUE, plot.values = TRUE, plot.preds = FALSE, 
       grid = FALSE, xlab = c("False positive rate", "(1-specificity)"), 
       ylab = c("True positive rate", "(sensitivity)"), main = "ROC curve", ...) {
  # version 1.4 (16 Oct 2014)
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0,1),
    pred >= 0,
    pred <= 1,
    interval > 0,
    interval < 1
  )
 
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0) 
 
  # next 3 lines from Wintle et al 2005 supp mat "roc" function
  xy <- c(pred[obs == 0], pred[obs == 1])
  rnk <- rank(xy)
  AUC <- ((n0 * n1) + ((n0 * (n0 + 1))/2) - sum(rnk[1 : n0])) / (n0 * n1)
 
  if (simplif)  return(AUC)
 
  N <- length(obs)
  preval <- prevalence(obs)
  thresholds <- seq(0, 1, by = interval)
  Nthresh <- length(thresholds)
  true.positives <- true.negatives <- sensitivity <- specificity <- false.pos.rate <- n.preds <- prop.preds <- numeric(Nthresh)
 
  for (t in 1 : Nthresh) {
    true.positives[t] <- sum(obs == 1 & pred >= thresholds[t])
    true.negatives[t] <- sum(obs == 0 & pred < thresholds[t])
    sensitivity[t] <- true.positives[t] / n1
    specificity[t] <- true.negatives[t] / n0
    false.pos.rate[t] <- 1 - specificity[t]
    n.preds[t] <- sum(round(pred, nchar(Nthresh) - 1) == thresholds[t])  # inspired by Peterson Papes & Soberon 2008 rethinking ROC analysis
    prop.preds[t] <- n.preds[t] / length(pred)
  }; rm(t)
 
  if (plot) {
    plot(x = c(0, 1), y = c(0, 1), type = "l", xlab = xlab, ylab = ylab, main = main, col = "grey", ...)  # plots the 0.5 diagonal
 
    if (grid) abline(h = thresholds, v = thresholds, col = "lightgrey")
 
    lines(x = false.pos.rate, y = sensitivity, lwd = 2)  # ROC curve
 
    if (plot.values) {
      text(1.0, 0.1, adj = 1, substitute(paste(AUC == a), list(a = round(AUC, 3))))
    }  # end if plot.values
 
    if (plot.preds) {
      #points(x = false.pos.rate, y = sensitivity, cex = rev(sqrt(100 * prop.preds)), col = "red")  # but why do I have to rev?
      #points(x = thresholds, y = specificity, cex = sqrt(100 * prop.preds), col = "blue")
      #points(x = false.pos.rate, y = sensitivity, cex = sqrt(100 * prop.preds), col = "red")  # ROC curve
      points(x = thresholds, y = rep(0, Nthresh), cex = sqrt(100 * prop.preds), col = "red")
    }  # end if plot.preds
  }  # end if plot
 
  return (list(thresholds = data.frame(thresholds, true.positives, true.negatives, sensitivity, specificity, false.pos.rate, n.preds, prop.preds), N = N, prevalence = preval, AUC = AUC, AUCratio = AUC / 0.5))
}


Usage examples:

AUC(obs = mydata$myspecies, pred = mydata$myspecies_P, simplif = TRUE)

AUC(obs = mydata$myspecies, pred = mydata$myspecies_P)

AUC(model = mymodel, main = "my species' ROC plot")

AUC(model = mymodel, plot.values = FALSE)

If you want the AUC values for a list of models produced e.g. by the multGLM function, you can do:

mymodels.auc <- vector("numeric", length(mymodels$models))
names(mymodels.auc) <- names(mymodels$models)
for (i in 1:length(mymodels$models)) {
  mymodels.auc[i] <- AUC(model = mymodels$models[[i]], simplif = TRUE)
}
mymodels.auc
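
The same can be written more compactly with sapply:

mymodels.auc <- sapply(mymodels$models, function(m) AUC(model = m, simplif = TRUE))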

References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Lobo, J.M., Jiménez-Valverde, A. & Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17: 145-151

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography 22: 508–516

Calculate the amount of deviance explained by a GLM

Linear models come with an R-squared value that measures the proportion of variation that the model accounts for. The R-squared is provided with summary(model) in R. For generalized linear models (GLMs), the equivalent is the amount of deviance accounted for (D-squared; Guisan & Zimmermann 2000), but this value is not normally provided with the model summary. The Dsquared function, now included in the modEvA package (Barbosa et al. 2014), calculates it. There is also an option to calculate the adjusted D-squared, which takes into account the number of observations and the number of model parameters, thus allowing direct comparison among different models (Weisberg 1980, Guisan & Zimmermann 2000). UPDATE: see also further measures in the new RsqGLM function.

Dsquared <- function(model = NULL, 
                     obs = NULL, 
                     pred = NULL, 
                     family = NULL, # needed only when 'model' not provided
                     adjust = FALSE, 
                     npar = NULL) { # needed only when 'model' not provided
  # version 1.4 (31 Aug 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
 
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'.")
    if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).")
    if (is.null(family)) stop ("With 'obs' and 'pred' arguments (rather than a model object), you must also specify one of two model family options: 'binomial' or 'poisson' (in quotes).")
    else if (!is.character(family)) stop ("Argument 'family' must be provided as character (i.e. in quotes: 'binomial' or 'poisson').")
    else if (length(family) != 1 | !(family %in% c("binomial", "poisson"))) stop ("'family' must be either 'binomial' or 'poisson' (in quotes).")
 
    if (family == "binomial") {
      if (any(!(obs %in% c(0, 1)) | pred < 0 | pred > 1)) stop ("'binomial' family implies that 'obs' data should be binary (with values 0 or 1) and 'pred' data should be bounded between 0 and 1.")
      link <- log(pred / (1 - pred))  # logit
    }  # end if binomial
 
    else if (family == "poisson") {
      if (any(obs %%1 != 0)) stop ("'poisson' family implies that 'obs' data should consist of whole numbers.")
      link <- log(pred)
    }  # end if poisson
 
    model <- glm(obs ~ link, family = family)
  }  # end if model not provided
 
  D2 <- (model$null.deviance - model$deviance) / model$null.deviance
 
  if (adjust) {
    if (model.provided) {
      n <- length(model$y)
      #p <- length(model$coefficients)
      p <- attributes(logLik(model))$df
    } else {
      if (is.null(npar)) stop ("Adjusted D-squared from 'obs' and 'pred' values (rather than a model object) requires specifying the number of parameters in the underlying model ('npar').")
      n <- length(na.omit(obs))
      p <- npar
    }  # end if model.provided else
 
    D2 <- 1 - ((n - 1) / (n - p)) * (1 - D2)
  }  # end if adjust
 
  return (D2)
}


Usage examples:

Dsquared(model = mymodel)

Dsquared(model = mymodel, adjust = TRUE)

Dsquared(obs = myobs, pred = mypred, family = "poisson")

Dsquared(obs = myobs, pred = mypred, family = "poisson", adjust = TRUE, npar = 5)
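
As a quick check with simulated data (illustrative only; any binomial GLM will do), the unadjusted D-squared matches the deviance-based formula directly:

set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(x))
mod <- glm(y ~ x, family = binomial)
Dsquared(model = mod)  # same as (mod$null.deviance - mod$deviance) / mod$null.deviance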

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Guisan, A. & Zimmermann, N.E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling 135: 147-186

Weisberg, S. (1980). Applied Linear Regression. Wiley, New York

Model calibration with the Hosmer & Lemeshow goodness-of-fit statistic

The model evaluation functions posted here so far calculate only measures of discrimination capacity, i.e., how well the model distinguishes presences from absences (after the model’s continuous predictions of presence probability or the like are converted to binary predictions of presence or absence). However, there is another important facet of model evaluation: calibration or reliability, i.e., the relationship between predicted probability and observed prevalence (Pearce & Ferrier 2000; Jiménez-Valverde et al. 2013). The HLfit function, now included in the modEvA package (Barbosa et al. 2014), measures model reliability with the Hosmer & Lemeshow goodness-of-fit statistic (Hosmer & Lemeshow 1980). It also calculates the root mean square error (RMSE).

Note that these statistics have some limitations (see e.g. this post at Statistical Horizons), mainly due to the need to group the values into bins within which to compare probability and prevalence, and to the influence of the binning method on the result. The HLfit function can use several binning methods, which are implemented in the getBins function provided with it (and also included in modEvA). You should therefore evaluate your model with different binning methods, to see if the results change significantly. The HLfit function also uses the prevalence function.

bin.methods <- c("round.prob", "prob.bins", "size.bins", "n.bins", "quantiles")
 
getBins <- function(obs = NULL, pred = NULL, model = NULL, id = NULL, bin.method = "quantiles", n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE) {
 
  # version 3.4 (11 Sep 2014)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, you can provide a model object of class "glm"
  # id: optional vector of row identifiers; must be of the same length and in the same order of 'obs' and 'pred' (or of the cases used to build 'model')
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    !(NA %in% obs),
    !(NA %in% pred),
    obs %in% c(0, 1),
    pred >= 0,
    pred <= 1,
    is.null(id) | length(id) == length(pred),
    n.bins >= 2,
    min.bin.size >= 0,
    min.prob.interval > 0,
    min.prob.interval < 1
  )
 
  if(!(bin.method %in% modEvAmethods("getBins"))) stop("Invalid bin.method; type modEvAmethods('getBins') for available options.")
 
  N <- length(obs)
 
  # now get prob.bin (probability bin to which each point belongs):
  if (bin.method == "round.prob") {  # flaw: assymmetric (up to 11 bins, the first and last with half the prob range)
    message("Arguments n.bins and min.bin.size are ignored by this bin.method")
    prob.bin round(pred, digits = nchar(min.prob.interval) - 2)
  }  # end method round.prob
 
  if (bin.method == "prob.bins") {  # flaw: may generate empty or very small bins
    message("Arguments n.bins and min.bin.size are ignored by this bin.method")
    bin.cuts <- seq(from = 0, to = 1, by = min.prob.interval)
    prob.bin <- findInterval(pred, bin.cuts)
  }  # end method prob.bins
 
  if (bin.method == "size.bins") {
    message("Arguments n.bins and min.prob.interval are ignored by this bin.method")
    bin.method "n.bins"
    n.bins floor(N / min.bin.size)
    fixed.bin.size TRUE
  }  # end method size.bins
 
  if (bin.method == "n.bins") {   # note: bins do not have constant size (and some get less than min size)
    message("Argument min.prob.interval is ignored by this bin.method")
    if (fixed.bin.size) {
      prob.bin <- findInterval(pred, quantile(pred, probs = seq(from = 0, to = 1, by = 1 / n.bins)))
    } else {
      prob.bin <- cut(pred, n.bins)
    }
  }  # end method n.bins
 
  if (bin.method == "quantiles") {
    # quantile binning based on 'hosmerlem' function by Peter D. M. Macdonald (http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt)
    cutpoints <- quantile(pred, probs = seq(0, 1, 1/n.bins))
    prob.bin <- findInterval(pred, cutpoints)
  }
 
  prob.bins <- sort(unique(prob.bin))  # lists the probability bins in the model
 
  bins.table <- t(as.data.frame(unclass(table(obs, prob.bin))))
  bins.table <- data.frame(rowSums(bins.table), bins.table[ , c(2, 1)])
  colnames(bins.table) <- c("N", "N.pres", "N.abs")
  bins.table$prevalence <- with(bins.table, N.pres / N)
  bins.table$mean.prob <- tapply(pred, prob.bin, mean)
  bins.table$median.prob <- tapply(pred, prob.bin, median)
  bins.table$difference <- with(bins.table, mean.prob - prevalence)
  bins.table$max.poss.diff <- ifelse(bins.table$mean.prob < 0.5, 1 - bins.table$mean.prob, bins.table$mean.prob)
  bins.table$adj.diff <- with(bins.table, abs(difference - max.poss.diff))
  bins.table$overpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, max)
  bins.table$underpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, min)
 
  bins.table <- bins.table[bins.table$N > 0, ]  # eliminates empty bins (which impede calculations)
 
  if(min(as.data.frame(bins.table)$N) < 15) warning("There is at least one bin with less than 15 values, for which comparisons may not be meaningful; consider using a bin.method that allows defining a minimum bin size")
  n.bins <- nrow(bins.table)
 
  return(list(prob.bin = prob.bin, bins.table = bins.table, N = N, n.bins = n.bins))
 
}
HLfit <- function (model = NULL, obs = NULL, pred = NULL, bin.method, n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE, alpha = 0.05, plot = TRUE, plot.values = TRUE, plot.bin.size = TRUE, xlab = "Predicted probability", ylab = "Observed prevalence", ...) {
  # version 1.5 (24 Jun 2015)
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0, 1),
    pred >= 0,
    pred <= 1
  )
 
  bins <- getBins(obs = obs, pred = pred, bin.method = bin.method, n.bins = n.bins, fixed.bin.size = fixed.bin.size, min.bin.size = min.bin.size, min.prob.interval = min.prob.interval)
  n.bins <- nrow(bins$bins.table)
 
  # next 4 lines: adapted from hosmerlem function in http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt
  observed <- xtabs(cbind(1 - obs, obs) ~ bins$prob.bin)
  expected <- xtabs(cbind(1 - pred, pred) ~ bins$prob.bin)
  chi.sq <- sum((observed - expected) ^ 2 / expected)
  p.value <- 1 - pchisq(chi.sq, df = n.bins - 2)
  rmse <- sqrt(mean((observed - expected) ^ 2))
 
  if (simplif) return(list(chi.sq = chi.sq, p.value = p.value, RMSE = rmse))
 
  # bin-wise stats (loosely based on the calibration.plot function in package
  # PresenceAbsence), computed outside 'if (plot)' because the output
  # 'bins.table' below also needs them:
  N.total <- tapply(obs, bins$prob.bin, length)  # N cases in each bin
  N.presence <- tapply(obs, bins$prob.bin, sum)  # N presences in each bin
  Empty <- is.na(N.total)
  N.total[is.na(N.total)] <- 0
  N.presence[is.na(N.presence)] <- 0
  OBS.proportion <- N.presence / N.total
  OBS.proportion[Empty] <- NA
  df1.low <- 2 * (N.total - N.presence + 1)
  df2.low <- 2 * N.presence
  df1.up <- 2 * (N.presence + 1)
  df2.up <- 2 * (N.total - N.presence)
  N.bins <- length(unique(bins$prob.bin))
  Lower <- rep(0, N.bins)
  Upper <- rep(1, N.bins)
  TF <- N.presence != 0
  Lower[TF] <- N.presence[TF] / (N.presence[TF] + ((N.total[TF] - N.presence[TF] + 1) * qf(1 - alpha/2, df1.low[TF], df2.low[TF])))
  Lower[Empty] <- NA
  TF <- N.presence < N.total
  Upper[TF] <- ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])) / (N.total[TF] - N.presence[TF] + ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])))
  Upper[Empty] <- NA
  bin.centers <- bins$bins.table$median.prob

  if (plot) {
    plot(c(-0.05, 1.05), c(-0.05, 1.05), type = "n", xlab = xlab, ylab = ylab, ...)
 
    if (plot.bin.size) {
      text(bin.centers, Upper + 0.07, labels = N.total)
    }
 
    abline(a = 0, b = 1, lty = 2)
    for (i in 1:N.bins) {
      lines(rep(bin.centers[i], 2), c(Lower[i], Upper[i]))
    }
    points(bin.centers, OBS.proportion, pch = 20)
 
    if (plot.values) {
      text(1, 0.2, adj = 1, substitute(paste(HL == a), list(a = round(chi.sq, 1))))
      if (p.value < 0.001) {
        text(1, 0.1, adj = 1, substitute(paste(italic(p) < a), list(a = 0.001)))
      } else {
        text(1, 0.1, adj = 1, substitute(paste(italic(p) == a), list(a = round(p.value, 3))))
      }
      text(1, 0.0, adj = 1, substitute(paste(RMSE == a), list(a = round(rmse, 1))))
    }  # end if plot values
  }
 
  BinPred <- tapply(pred, bins$prob.bin, mean)
  bins.table <- data.frame(BinCenter = bin.centers, NBin = N.total, BinObs = OBS.proportion, BinPred = BinPred, BinObsCIlower = Lower, BinObsCIupper = Upper)
 
  return(list(bins.table = bins.table, chi.sq = chi.sq, DF = n.bins - 2, p.value = p.value, RMSE = rmse))
}


HLfit still needs some code simplification, but fixes have recently been applied to the getBins function so that it doesn’t fail when probability breakpoints are not unique. Additional model calibration measures are soon to be added to the package.

Usage examples:

HLfit(model = mymodel, bin.method = "prob.bins", min.prob.interval = 0.1)

HLfit(obs = myobs, pred = mypred, bin.method = "n.bins", n.bins = 10)

HLfit(obs = myobs, pred = mypred, bin.method = "quantiles")
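
To compare the influence of the binning method, as suggested above, you can loop over the available methods (a sketch, using the same hypothetical 'myobs' and 'mypred'):

for (bm in c("round.prob", "prob.bins", "n.bins", "quantiles")) {
  HL <- HLfit(obs = myobs, pred = mypred, bin.method = bm, plot = FALSE)
  cat(bm, ": chi.sq =", round(HL$chi.sq, 2), "; p =", round(HL$p.value, 3), "\n")
}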

References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Hosmer, D.W. & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics, A10: 1043–1069

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography, 22: 508–516

Pearce, J. & Ferrier, S. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling, 133: 225-245

Optimization of model thresholds based on a pair of measures

The optiPair function, now included in the modEvA package (Barbosa et al. 2014), follows the optiThresh, threshMeasures and evaluate functions, which must also be loaded, and extends (thus replacing) the now-deprecated SensSpecCurve function: optiPair can be used with any pair of model evaluation measures that balance each other, such as sensitivity-specificity, omission-commission, or underprediction-overprediction (featured in Barbosa et al. 2013). The function plots both measures in the given pair against all thresholds, and calculates the thresholds that minimize the difference and maximize the sum and the mean of the two measures.

[Figure: example optiPair plot, with sensitivity and specificity plotted against all thresholds]

optiPair <-
function (obs = NULL, pred = NULL, model = NULL, 
          measures = c("Sensitivity", "Specificity"), 
          interval = 0.01, plot = TRUE, plot.sum = FALSE, 
          plot.diff = FALSE, ylim = NULL, ...) {
  # version 1.5 (3 Apr 2014)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, a model object of class "glm"
  # measures: the pair of measures whose curves to plot, e.g. c("Sensitivity", "Specificity")
  # interval: the interval of thresholds at which to calculate sensitivity and specificity
  # plot: logical indicating whether or not to plot the measures
  # plot.sum and plot.diff will optionally plot the sum (+) and/or the difference (-) between both measures in the pair
  # ... : additional arguments to be passed to the "plot" function
 
  if(length(measures) != 2) stop ("'measures' must contain two elements.")
 
  if(is.null(model)) {
    if (is.null(obs) | is.null(pred)) stop("You must provide either the 'obs' 
and 'pred' vectors, or a 'model' object of class 'glm'")
  }  # end if null model
  else {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
    model <- NULL  # so the message is not repeated for each threshold
  }  # end if model
 
  if (NA %in% obs | NA %in% pred) stop("Please remove (rows with) NA from your data")
  if (length(obs) != length(pred)) stop("'obs' and 'pred' must have the same number of values (and in the same order)")
  if (!all(obs %in% c(0, 1))) stop ("'obs' must consist of binary observations of 0 or 1")
  if (any(pred < 0 | pred > 1)) stop ("'pred' must range between 0 and 1")
 
  measures.values <- optiThresh(obs = obs, pred = pred, interval = interval, measures = measures, optimize = "each", simplif = TRUE)
  measures.values$Difference <- abs(measures.values[ , 1] - measures.values[ , 2])
  measures.values$Sum <- rowSums(measures.values[ , 1:2])
  measures.values$Mean <- rowMeans(measures.values[ , 1:2])
  measures.values$Threshold <- as.numeric(rownames(measures.values))
 
  MinDiff <- min(measures.values$Difference)
  MaxSum <- max(measures.values$Sum)
  MaxMean <- max(measures.values$Mean)
 
  ThreshDiff <- with(measures.values, Threshold[which.min(Difference)])
  ThreshSum <- with(measures.values, Threshold[which.max(Sum)])
  ThreshMean <- with(measures.values, Threshold[which.max(Mean)])
 
  if (plot) {
 
    finite <- as.matrix(measures.values[ , 1:2])
    finite <- finite[is.finite(finite)]
    if(is.null(ylim)) {
      if(plot.sum) ylim <- c(min(finite), MaxSum)
      else ylim <- c(min(finite), max(finite))
    }  # end if null ylim
 
    plot(measures.values[ , 1] ~ measures.values$Threshold, pch = 20, xlab = "Threshold", ylab = "", ylim = ylim, ...)  # the two measure names are added to the y axis with 'mtext' below
    mtext(measures[1], side = 2, line = 3)
    points(measures.values[ , 2] ~ measures.values$Threshold, pch = 20, col = "darkgrey")
    mtext(measures[2], side = 2, line = 2, col = "darkgrey")
 
    abline(h = measures.values[which.min(measures.values$Difference), 1], col = "lightgrey", lty = 2)
    abline(v = ThreshDiff, col = "lightgrey", lty = 2)
 
    if (plot.sum) {
      with(measures.values, points(Sum ~ Threshold, pch = '+'))
      abline(v = ThreshSum, col = "lightgrey", lty = 2)
      abline(h = MaxSum, col = "lightgrey", lty = 2)
    }    # end if plot sum
 
    if (plot.diff) {
      with(measures.values, points(Difference ~ Threshold, pch = '-', col = "grey"))
      abline(v = ThreshDiff, col = "grey", lty = 2)
      abline(h = MinDiff, col = "grey", lty = 2)
    }  # end if plot diff
  }  # end if plot
 
  return(list(measures.values = measures.values, MinDiff = MinDiff, ThreshDiff = ThreshDiff, MaxSum = MaxSum, ThreshSum = ThreshSum, MaxMean = MaxMean, ThreshMean = ThreshMean))
 
}


To use it, just load the required functions and replace the names of your data frame, presence-absence data and predicted values (or else use a 'glm' model object directly) in either of the following commands:

optiPair(obs = mydata$myspecies, pred = mydata$myspecies_P, measures = c("Sensitivity", "Specificity"))

optiPair(model = mymodel, measures = c("UPR", "OPR"))
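
The returned list includes the optimal thresholds, which can then be extracted directly (a sketch; 'mymodel' is hypothetical as above, and the element names are as returned by the function):

op <- optiPair(model = mymodel, measures = c("Sensitivity", "Specificity"), plot = FALSE)
op$ThreshSum   # threshold maximising the sum of the two measures
op$ThreshDiff  # threshold at which the two measures are closest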

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Barbosa, A.M., Real, R., Muñoz, A.-R. & Brown, J.A. (2013) New measures for assessing model equilibrium and prediction mismatch in species distribution models. Diversity and Distributions, online early (DOI: 10.1111/ddi.12100).