# Miller calibration statistics

The MillerCalib function, now included in the modEvA package (version 0.7.2 just released), calculates Miller’s (1991) calibration statistics for a logistic regression model, namely the intercept and slope of the regression of the binary response variable on the logit of predicted probabilities. Optionally and by default, it also plots the corresponding regression line (black) over the reference diagonal (grey dashed line).

Calibration (or reliability) measures how a model’s predicted probabilities relate to observed species prevalence or proportion of presences in the modelled data. If predictions are perfectly calibrated, the slope will equal 1 and the intercept will equal 0, so the model’s calibation line will perfectly overlap with the reference diagonal. Note that Miller’s statistics assess the model globally: a model is well calibrated if the average of all predicted probabilities equals the proportion of presences in the modelled data. Therefore, good calibration is always attained on the same data used for building the model (Miller 1991). The function is more useful to see if a model stays well calibrated when extrapolated to a new dataset.

```MillerCalib <- function(model = NULL, obs = NULL, pred = NULL, plot = TRUE, plot.values = TRUE, digits = 4, xlab = "", ylab = "", main = "Miller calibration", ...) {
# version 1.3 (24 Jun 2015)

model.provided <- ifelse(is.null(model), FALSE, TRUE)

if (model.provided) {
if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
obs <- model\$y
pred <- model\$fitted.values
} else { # if model not provided
if (is.null(obs) | is.null(pred)) stop("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
}

stopifnot(
length(obs) == length(pred),
obs %in% c(0,1),
pred >= 0,
pred <= 1
)

logit <- log(pred / (1 - pred))
mod <- glm(obs ~ logit, family = binomial)
intercept <- mod\$coef[[1]]
slope <- mod\$coef[[2]]
std.err <- summary(mod)\$coefficients["logit", "Std. Error"]
slope.p <- abs((slope - 1) / sqrt(std.err^2 + 0))  # Paternoster 98; http://stats.stackexchange.com/questions/55501/test-a-significant-difference-between-two-slope-values

if (plot) {
ymin <- min(0, intercept)
ymax <- max(1, intercept + 0.1)
plot(c(0, 1), c(ymin, ymax), type = "n", xlab = xlab, ylab = ylab, main = main)
abline(0, 1, col = "lightgrey", lty = 2)
abline(intercept, slope)
if (plot.values) {
plotext <- paste("intercept =" , round(intercept, digits), "\nslope =", round(slope, digits), "\nslope p-value =", round(slope.p, digits))
text(x = 0, y = ymax - 0.25, adj = 0, labels = plotext)
}
}
return(list(intercept = intercept, slope = slope, slope.pvalue = slope.p))
}  # end MillerCalib function```

[presented with Pretty R]

Input data can be either a glm model object with binomial family and logit link, or two vectors of observed binary (0 or 1) values and the corresponding predicted probabilities. The output is a named list with the calibration intercept and slope.

REFERENCES

Miller M.E., Hui S.L. & Tierney W.M. (1991) Validation techniques for logistic regression models. Statistics in Medicine, 10: 1213-1226

Wintle B.A., Elith J. & Potts J.M. (2005) Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW. Austral Ecology, 30: 719-738