# Model calibration with the Hosmer & Lemeshow goodness-of-fit statistic

The model evaluation functions posted here so far calculate only measures of discrimination capacity, i.e., how well the model is capable of distinguishing presences from absences (after the model’s continuous predictions of presence probability or the like are converted to binary predictions of presence or absence). However, there is another important facet of model evaluation: calibration or reliability, i.e., the relationship between predicted probability and observed prevalence (Pearce & Ferrier 2000; Jiménez-Valverde et al. 2013). The HLfit function, now included in the modEvA package (Barbosa et al. 2014), measures model reliability with the Hosmer & Lemeshow goodness-of-fit statistic (Hosmer & Lemeshow 1980). It also calculates the root mean square error (RMSE).
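As a rough illustration of what the statistic measures, here is a hand computation on simulated data (all object names are hypothetical, and this is a simplified sketch rather than modEvA's own code): predictions are grouped into bins, and the squared differences between observed and expected presences (and absences) are summed across bins.

```r
# Hand computation of the Hosmer-Lemeshow statistic on simulated data
set.seed(1)
pred <- runif(200)                         # simulated predicted probabilities
obs <- rbinom(200, size = 1, prob = pred)  # simulated 0/1 observations
g <- 10                                    # number of bins (deciles of 'pred')
bin <- cut(pred, quantile(pred, probs = seq(0, 1, 1/g)), include.lowest = TRUE)
n  <- tapply(obs, bin, length)  # bin sizes
O1 <- tapply(obs, bin, sum)     # observed presences per bin
E1 <- tapply(pred, bin, sum)    # expected presences per bin
O0 <- n - O1                    # observed absences per bin
E0 <- n - E1                    # expected absences per bin
chi.sq <- sum((O1 - E1)^2 / E1 + (O0 - E0)^2 / E0)
p.value <- 1 - pchisq(chi.sq, df = g - 2)  # g - 2 degrees of freedom
```

A small chi-squared (large p-value) indicates that predicted probabilities track observed prevalence closely within bins, i.e., good calibration.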

Note that these statistics have some limitations (see e.g. this post at Statistical Horizons), mainly due to the need to group the values into bins within which to compare probability and prevalence, and the influence of the binning method on the result. The HLfit function can use several binning methods, which are implemented in the getBins function provided with it (and also included in modEvA). You should therefore evaluate your model with different binning methods, to see if the results change significantly. The HLfit function also uses the prevalence function.
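To see how much the binning choice can matter, here is a self-contained sketch (simulated data, base R only, not modEvA's code) that computes the Hosmer & Lemeshow chi-squared twice on the same data, once with fixed-width bins and once with quantile bins:

```r
# Same simulated data, two binning schemes, two (generally different)
# Hosmer-Lemeshow chi-squared values:
set.seed(42)
pred <- runif(300)
obs <- rbinom(300, size = 1, prob = pred)
hl.chisq <- function(bin) {  # HL chi-squared for a given bin assignment
  n  <- tapply(obs, bin, length)
  O1 <- tapply(obs, bin, sum)   # observed presences
  E1 <- tapply(pred, bin, sum)  # expected presences
  sum((O1 - E1)^2 / E1 + ((n - O1) - (n - E1))^2 / (n - E1))
}
width.bins <- cut(pred, seq(0, 1, 0.1), include.lowest = TRUE)                 # fixed-width bins
quant.bins <- cut(pred, quantile(pred, seq(0, 1, 0.1)), include.lowest = TRUE) # quantile bins
c(fixed.width = hl.chisq(width.bins), quantiles = hl.chisq(quant.bins))
```

With real data the discrepancy between binning schemes can be much larger, which is why comparing several bin.method options is worthwhile.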

```r
bin.methods <- c("round.prob", "prob.bins", "size.bins", "n.bins", "quantiles")

getBins <- function(obs = NULL, pred = NULL, model = NULL, id = NULL, bin.method = "quantiles", n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE) {

# version 3.4 (11 Sep 2014)
# obs: a vector of observed presences (1) and absences (0) or another binary response variable
# pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
# model: instead of (and overriding) obs and pred, you can provide a model object of class "glm"
# id: optional vector of row identifiers; must be of the same length and in the same order of 'obs' and 'pred' (or of the cases used to build 'model')

if (!is.null(model)) {
if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
obs <- model$y
pred <- model$fitted.values
}  # end if model

stopifnot(
length(obs) == length(pred),
!(NA %in% obs),
!(NA %in% pred),
obs %in% c(0, 1),
pred >= 0,
pred <= 1,
is.null(id) | length(id) == length(pred),
n.bins >= 2,
min.bin.size >= 0,
min.prob.interval > 0,
min.prob.interval < 1
)

if(!(bin.method %in% modEvAmethods("getBins"))) stop("Invalid bin.method; type modEvAmethods('getBins') for available options.")

N <- length(obs)

# now get prob.bin (probability bin to which each point belongs):
if (bin.method == "round.prob") {  # flaw: asymmetric (up to 11 bins, the first and last with half the prob range)
message("Arguments n.bins and min.bin.size are ignored by this bin.method")
prob.bin <- round(pred, digits = nchar(min.prob.interval) - 2)
}  # end method round.prob

if (bin.method == "prob.bins") {  # flaw: may generate empty or very small bins
message("Arguments n.bins and min.bin.size are ignored by this bin.method")
bin.cuts <- seq(from = 0, to = 1, by = min.prob.interval)
prob.bin <- findInterval(pred, bin.cuts)
}  # end method prob.bins

if (bin.method == "size.bins") {
message("Arguments n.bins and min.prob.interval are ignored by this bin.method")
bin.method <- "n.bins"
n.bins <- floor(N / min.bin.size)
fixed.bin.size <- TRUE
}  # end method size.bins

if (bin.method == "n.bins") {   # note: bins do not have constant size (and some get less than min size)
message("Argument min.prob.interval is ignored by this bin.method")
if (fixed.bin.size) {
prob.bin <- findInterval(pred, quantile(pred, probs = seq(from = 0, to = 1, by = 1 / n.bins)))
} else {
prob.bin <- cut(pred, n.bins)
}
}  # end method n.bins

if (bin.method == "quantiles") {
# quantile binning based on 'hosmerlem' function by Peter D. M. Macdonald (http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt)
cutpoints <- quantile(pred, probs = seq(0, 1, 1/n.bins))
prob.bin <- findInterval(pred, cutpoints)
}

prob.bins <- sort(unique(prob.bin))  # lists the probability bins in the model

bins.table <- t(as.data.frame(unclass(table(obs, prob.bin))))
bins.table <- data.frame(rowSums(bins.table), bins.table[ , c(2, 1)])
colnames(bins.table) <- c("N", "N.pres", "N.abs")
bins.table$prevalence <- with(bins.table, N.pres / N)
bins.table$mean.prob <- tapply(pred, prob.bin, mean)
bins.table$median.prob <- tapply(pred, prob.bin, median)
bins.table$difference <- with(bins.table, mean.prob - prevalence)
bins.table$max.poss.diff <- ifelse(bins.table$mean.prob < 0.5, 1 - bins.table$mean.prob, bins.table$mean.prob)
bins.table$overpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, max)
bins.table$underpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, min)

bins.table <- bins.table[bins.table$N > 0, ]  # eliminates empty bins (which impede calculations)

if (min(bins.table$N) < 15) warning("There is at least one bin with less than 15 values, for which comparisons may not be meaningful; consider using a bin.method that allows defining a minimum bin size")
n.bins <- nrow(bins.table)

return(list(prob.bin = prob.bin, bins.table = bins.table, N = N, n.bins = n.bins))

}
```
```r
HLfit <- function(model = NULL, obs = NULL, pred = NULL, bin.method, n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE, alpha = 0.05, plot = TRUE, plot.values = TRUE, plot.bin.size = TRUE, xlab = "Predicted probability", ylab = "Observed prevalence", ...) {
# version 1.5 (24 Jun 2015)

if (!is.null(model)) {
if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
obs <- model$y
pred <- model$fitted.values
}  # end if model

stopifnot(
length(obs) == length(pred),
obs %in% c(0, 1),
pred >= 0,
pred <= 1
)

bins <- getBins(obs = obs, pred = pred, bin.method = bin.method, n.bins = n.bins, fixed.bin.size = fixed.bin.size, min.bin.size = min.bin.size, min.prob.interval = min.prob.interval)
n.bins <- nrow(bins$bins.table)

# next 4 lines: adapted from hosmerlem function in http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt
observed <- xtabs(cbind(1 - obs, obs) ~ bins$prob.bin)
expected <- xtabs(cbind(1 - pred, pred) ~ bins$prob.bin)
chi.sq <- sum((observed - expected) ^ 2 / expected)
p.value <- 1 - pchisq(chi.sq, df = n.bins - 2)
rmse <- sqrt(mean((observed - expected) ^ 2))

if (simplif) return(list(chi.sq = chi.sq, p.value = p.value, RMSE = rmse))

# summary statistics per bin (computed even when plot = FALSE,
# as the output bins.table needs these values):
N.total <- tapply(obs, bins$prob.bin, length)  # N cases in each bin
N.presence <- tapply(obs, bins$prob.bin, sum)  # N presences in each bin
Empty <- is.na(N.total)
N.total[is.na(N.total)] <- 0
N.presence[is.na(N.presence)] <- 0
OBS.proportion <- N.presence / N.total
OBS.proportion[Empty] <- NA
df1.low <- 2 * (N.total - N.presence + 1)
df2.low <- 2 * N.presence
df1.up <- 2 * (N.presence + 1)
df2.up <- 2 * (N.total - N.presence)
N.bins <- length(unique(bins$prob.bin))
Lower <- rep(0, N.bins)
Upper <- rep(1, N.bins)
TF <- N.presence != 0
Lower[TF] <- N.presence[TF] / (N.presence[TF] + ((N.total[TF] - N.presence[TF] + 1) * qf(1 - alpha/2, df1.low[TF], df2.low[TF])))
Lower[Empty] <- NA
TF <- N.presence < N.total
Upper[TF] <- ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])) / (N.total[TF] - N.presence[TF] + ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])))
Upper[Empty] <- NA
bin.centers <- bins$bins.table$median.prob

# plotting loosely based on the calibration.plot function in package PresenceAbsence
if (plot) {
plot(c(-0.05, 1.05), c(-0.05, 1.05), type = "n", xlab = xlab, ylab = ylab, ...)

if (plot.bin.size) {
text(bin.centers, Upper + 0.07, labels = N.total)
}

abline(a = 0, b = 1, lty = 2)
for (i in 1:N.bins) {
lines(rep(bin.centers[i], 2), c(Lower[i], Upper[i]))
}
points(bin.centers, OBS.proportion, pch = 20)

if (plot.values) {
text(1, 0.2, adj = 1, substitute(paste(HL == a), list(a = round(chi.sq, 1))))
if (p.value < 0.001) {
text(1, 0.1, adj = 1, substitute(paste(italic(p) < a), list(a = 0.001)))
} else {
text(1, 0.1, adj = 1, substitute(paste(italic(p) == a), list(a = round(p.value, 3))))
}
text(1, 0.0, adj = 1, substitute(paste(RMSE == a), list(a = round(rmse, 1))))
}  # end if plot values
}

BinPred <- tapply(pred, bins$prob.bin, mean)
bins.table <- data.frame(BinCenter = bin.centers, NBin = N.total, BinObs = OBS.proportion, BinPred = BinPred, BinObsCIlower = Lower, BinObsCIupper = Upper)

return(list(bins.table = bins.table, chi.sq = chi.sq, DF = n.bins - 2, p.value = p.value, RMSE = rmse))
}
```


HLfit still needs some code simplification, but fixes have recently been applied to the getBins function so that it doesn’t fail when probability breakpoints are not unique. Additional model calibration measures will soon be added to the package.

Usage examples:

```r
HLfit(model = mymodel, bin.method = "prob.bins", min.prob.interval = 0.1)

HLfit(obs = myobs, pred = mypred, bin.method = "n.bins", n.bins = 10)

HLfit(obs = myobs, pred = mypred, bin.method = "quantiles")
```
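For readers who want to see the kind of plot HLfit produces, here is a minimal, self-contained base-R sketch on simulated data (illustrative only, not modEvA's code, and without the confidence intervals and annotations that HLfit adds): observed prevalence against mean predicted probability per quantile bin, with the diagonal marking perfect calibration.

```r
# Minimal base-R calibration plot on simulated data
set.seed(7)
pred <- runif(500)
obs <- rbinom(500, size = 1, prob = pred)
bin <- cut(pred, quantile(pred, seq(0, 1, 0.1)), include.lowest = TRUE)
mean.prob  <- tapply(pred, bin, mean)  # bin "centers"
prevalence <- tapply(obs, bin, mean)   # observed prevalence per bin
plot(mean.prob, prevalence, pch = 20, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Predicted probability", ylab = "Observed prevalence")
abline(0, 1, lty = 2)  # perfect-calibration diagonal
```

Points falling close to the diagonal indicate a well-calibrated model; systematic departures above or below it indicate under- or over-prediction.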

References:

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Hosmer, D.W. & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics, A10: 1043–1069.

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography, 22: 508–516.

Pearce, J. & Ferrier, S. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling, 133: 225–245.