Miller calibration statistics

The MillerCalib function, now included in the modEvA package (version 0.7.2 just released), calculates Miller et al.'s (1991) calibration statistics for a logistic regression model, namely the intercept and slope of the regression of the binary response variable on the logit of the predicted probabilities. By default (this can be turned off with plot = FALSE), it also plots the corresponding regression line (black) over the reference diagonal (dashed grey line).

Calibration (or reliability) measures how closely a model's predicted probabilities match the observed frequency of presences in the modelled data. If predictions are perfectly calibrated, the slope equals 1 and the intercept equals 0, so the model's calibration line overlaps exactly with the reference diagonal. Note that Miller's statistics assess the model globally: a model is well calibrated if the average of all predicted probabilities equals the proportion of presences in the modelled data. Therefore, good calibration is always attained on the same data used for building the model (Miller et al. 1991). The function is thus more useful for checking whether a model remains well calibrated when extrapolated to a new dataset.
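To illustrate that last point, here is a minimal sketch with simulated data (the object names below are made up for the example, not part of modEvA): for a binomial GLM fitted with an intercept, the mean fitted probability equals the observed proportion of presences in the training data, so calibration on the training data is trivially good.

set.seed(1)
x <- rnorm(200)
presence <- rbinom(200, size = 1, prob = plogis(-0.5 + x))
mod <- glm(presence ~ x, family = binomial)
mean(fitted(mod))  # average predicted probability on the training data
mean(presence)     # observed proportion of presences; the two coincide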

MillerCalib <- function(model = NULL, obs = NULL, pred = NULL, plot = TRUE, plot.values = TRUE, digits = 4, xlab = "", ylab = "", main = "Miller calibration", ...) {
  # version 1.3 (24 Jun 2015)
 
  model.provided <- !is.null(model)
 
  if (model.provided) {
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
  }
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0,1),
    pred >= 0,
    pred <= 1
  )
 
  logit <- log(pred / (1 - pred))  # logit of the predicted probabilities
  mod <- glm(obs ~ logit, family = binomial)  # Miller's calibration regression
  intercept <- mod$coef[[1]]
  slope <- mod$coef[[2]]
  std.err <- summary(mod)$coefficients["logit", "Std. Error"]
  slope.z <- (slope - 1) / std.err  # Wald test of the slope against 1 (Paternoster et al. 1998; http://stats.stackexchange.com/questions/55501/test-a-significant-difference-between-two-slope-values)
  slope.p <- 2 * pnorm(-abs(slope.z))  # two-sided p-value for slope = 1
 
  if (plot) {
    ymin <- min(0, intercept)
    ymax <- max(1, intercept + 0.1)
    plot(c(0, 1), c(ymin, ymax), type = "n", xlab = xlab, ylab = ylab, main = main, ...)
    abline(0, 1, col = "lightgrey", lty = 2)
    abline(intercept, slope)
    if (plot.values) {
      plotext <- paste("intercept =" , round(intercept, digits), "\nslope =", round(slope, digits), "\nslope p-value =", round(slope.p, digits))
      text(x = 0, y = ymax - 0.25, adj = 0, labels = plotext)
    }
  }
  return(list(intercept = intercept, slope = slope, slope.pvalue = slope.p))
}  # end MillerCalib function


Input data can be either a glm model object with binomial family and logit link, or two vectors containing the observed binary (0 or 1) values and the corresponding predicted probabilities. The output is a named list with the calibration intercept, the calibration slope, and the p-value of a Wald test of the slope against 1.
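Here is a quick usage sketch with simulated data (the 'train' and 'test' data frames and the 'presence' and 'x' columns are made-up names for this example, not part of the package):

set.seed(1)
train <- data.frame(x = rnorm(300))
train$presence <- rbinom(300, size = 1, prob = plogis(train$x))
test <- data.frame(x = rnorm(300))
test$presence <- rbinom(300, size = 1, prob = plogis(0.3 + 0.8 * test$x))

mod <- glm(presence ~ x, family = binomial, data = train)

MillerCalib(model = mod)  # calibration on the model's own training data

# calibration of the same model when projected onto the independent test data:
test.pred <- predict(mod, newdata = test, type = "response")
MillerCalib(obs = test$presence, pred = test.pred)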

REFERENCES

Miller M.E., Hui S.L. & Tierney W.M. (1991) Validation techniques for logistic regression models. Statistics in Medicine, 10: 1213-1226

Wintle B.A., Elith J. & Potts J.M. (2005) Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW. Austral Ecology, 30: 719-738
