Model calibration with the Hosmer & Lemeshow goodness-of-fit statistic

The model evaluation functions posted here so far calculate only measures of discrimination capacity, i.e., how well the model distinguishes presences from absences (after its continuous predictions of presence probability, or the like, are converted to binary predictions of presence or absence). However, there is another important facet of model evaluation: calibration or reliability, i.e., the agreement between predicted probability and observed prevalence (Pearce & Ferrier 2000; Jiménez-Valverde et al. 2013). The HLfit function, now included in the modEvA package (Barbosa et al. 2014), measures model reliability with the Hosmer & Lemeshow goodness-of-fit statistic (Hosmer & Lemeshow 1980). It also calculates the root mean square error (RMSE).
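To make the statistic concrete, here is a minimal standalone sketch of the Hosmer & Lemeshow computation with quantile bins, on simulated data (all variable names here are illustrative, not part of the package):

```r
# Hosmer-Lemeshow statistic by hand, on simulated, well-calibrated data:
set.seed(1)
pred <- runif(200)                         # simulated presence probabilities
obs <- rbinom(200, size = 1, prob = pred)  # observations consistent with them

g <- 10  # number of bins
cutpoints <- quantile(pred, probs = seq(0, 1, by = 1 / g))
bin <- cut(pred, breaks = cutpoints, include.lowest = TRUE)

obs.pres <- tapply(obs, bin, sum)   # observed presences per bin
exp.pres <- tapply(pred, bin, sum)  # expected presences per bin
n <- tapply(obs, bin, length)       # bin sizes

# chi-square summed over both outcome classes (presences and absences):
chi.sq <- sum((obs.pres - exp.pres) ^ 2 / exp.pres +
              ((n - obs.pres) - (n - exp.pres)) ^ 2 / (n - exp.pres))
p.value <- 1 - pchisq(chi.sq, df = g - 2)
```

A large p-value here means no significant lack of calibration was detected.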

Note that these statistics have some limitations (see e.g. this post at Statistical Horizons), mainly due to the need to group the values into bins within which probability and prevalence are compared, and to the influence of the binning method on the result. The HLfit function can use several binning methods, implemented in the getBins function provided with it (and also included in modEvA). You should therefore evaluate your model with different binning methods, to see whether the results change significantly. The HLfit function also uses the prevalence function.
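The influence of the binning method is easy to see on skewed predictions: equal-width bins can end up with very uneven (or empty) bins, while quantile bins have roughly equal size. A standalone illustration (simulated data, not using the package functions):

```r
set.seed(0)
pred <- rbeta(200, 1, 4)  # predictions skewed towards low probabilities

# equal-width bins of 0.1 (the idea behind bin.method = "prob.bins"):
width.bins <- table(findInterval(pred, seq(0, 1, by = 0.1)))

# quantile-based bins (the idea behind bin.method = "quantiles"), here quartiles:
cuts <- quantile(pred, probs = seq(0, 1, by = 0.25))
quant.bins <- table(findInterval(pred, cuts, rightmost.closed = TRUE))
```

With these data, the equal-width bins differ widely in size, whereas the quartile bins are nearly equal; the resulting goodness-of-fit values can differ accordingly.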

bin.methods <- c("round.prob", "prob.bins", "size.bins", "n.bins", "quantiles")
 
getBins <- function(obs = NULL, pred = NULL, model = NULL, id = NULL, bin.method = "quantiles", n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE) {
 
  # version 3.4 (11 Sep 2014)
  # obs: a vector of observed presences (1) and absences (0) or another binary response variable
  # pred: a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or alike
  # model: instead of (and overriding) obs and pred, you can provide a model object of class "glm"
  # id: optional vector of row identifiers; must be of the same length and in the same order as 'obs' and 'pred' (or as the cases used to build 'model')
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    !(NA %in% obs),
    !(NA %in% pred),
    obs %in% c(0, 1),
    pred >= 0,
    pred <= 1,
    is.null(id) | length(id) == length(pred),
    n.bins >= 2,
    min.bin.size >= 0,
    min.prob.interval > 0,
    min.prob.interval < 1
  )
 
  if(!(bin.method %in% modEvAmethods("getBins"))) stop("Invalid bin.method; type modEvAmethods('getBins') for available options.")
 
  N <- length(obs)
 
  # now get prob.bin (probability bin to which each point belongs):
  if (bin.method == "round.prob") {  # flaw: asymmetric (up to 11 bins, the first and last covering half the prob range)
    message("Arguments n.bins and min.bin.size are ignored by this bin.method")
    prob.bin <- round(pred, digits = nchar(min.prob.interval) - 2)
  }  # end method round.prob
 
  if (bin.method == "prob.bins") {  # flaw: may generate empty or very small bins
    message("Arguments n.bins and min.bin.size are ignored by this bin.method")
    bin.cuts <- seq(from = 0, to = 1, by = min.prob.interval)
    prob.bin <- findInterval(pred, bin.cuts)
  }  # end method prob.bins
 
  if (bin.method == "size.bins") {
    message("Arguments n.bins and min.prob.interval are ignored by this bin.method")
    bin.method <- "n.bins"
    n.bins <- floor(N / min.bin.size)
    fixed.bin.size <- TRUE
  }  # end method size.bins
 
  if (bin.method == "n.bins") {   # note: bins do not have constant size (and some may get fewer than min.bin.size values)
    message("Argument min.prob.interval is ignored by this bin.method")
    if (fixed.bin.size) {
      prob.bin <- findInterval(pred, quantile(pred, probs = seq(from = 0, to = 1, by = 1 / n.bins)))
    } else {
      prob.bin <- cut(pred, n.bins)
    }
  }  # end method n.bins
 
  if (bin.method == "quantiles") {
    # quantile binning based on 'hosmerlem' function by Peter D. M. Macdonald (http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt)
    cutpoints <- quantile(pred, probs = seq(0, 1, by = 1 / n.bins))
    prob.bin <- findInterval(pred, cutpoints)
  }
 
  prob.bins <- sort(unique(prob.bin))  # lists the probability bins in the model
 
  bins.table <- t(as.data.frame(unclass(table(obs, prob.bin))))
  bins.table <- data.frame(rowSums(bins.table), bins.table[ , c(2, 1)])
  colnames(bins.table) <- c("N", "N.pres", "N.abs")
  bins.table$prevalence <- with(bins.table, N.pres / N)
  bins.table$mean.prob <- tapply(pred, prob.bin, mean)
  bins.table$median.prob <- tapply(pred, prob.bin, median)
  bins.table$difference <- with(bins.table, mean.prob - prevalence)
  bins.table$max.poss.diff <- ifelse(bins.table$mean.prob < 0.5, 1 - bins.table$mean.prob, bins.table$mean.prob)
  bins.table$adj.diff <- with(bins.table, abs(difference - max.poss.diff))
  bins.table$overpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, max)
  bins.table$underpred <- apply(bins.table[ , c("prevalence", "mean.prob")], 1, min)
 
  bins.table <- bins.table[bins.table$N > 0, ]  # eliminates empty bins (which impede calculations)
 
  if (min(as.data.frame(bins.table)$N) < 15) warning("There is at least one bin with fewer than 15 values, for which comparisons may not be meaningful; consider using a bin.method that allows defining a minimum bin size")
  n.bins <- nrow(bins.table)
 
  return(list(prob.bin = prob.bin, bins.table = bins.table, N = N, n.bins = n.bins))
 
}
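The core of what getBins assembles in bins.table (observed prevalence versus mean predicted probability per bin) can be sketched standalone, without the function itself; column names below mirror those in bins.table, and the data are simulated for illustration:

```r
set.seed(42)
pred <- runif(100)
obs <- rbinom(100, size = 1, prob = pred)
prob.bin <- findInterval(pred, seq(0, 1, by = 0.1))  # equal-width 0.1 bins

bins.table <- data.frame(
  N = as.vector(table(prob.bin)),                  # bin size
  N.pres = as.vector(tapply(obs, prob.bin, sum))   # presences per bin
)
bins.table$prevalence <- bins.table$N.pres / bins.table$N
bins.table$mean.prob <- as.vector(tapply(pred, prob.bin, mean))
# in a well-calibrated model, this difference should be near zero in every bin:
bins.table$difference <- bins.table$mean.prob - bins.table$prevalence
```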
HLfit <- function (model = NULL, obs = NULL, pred = NULL, bin.method, n.bins = 10, fixed.bin.size = FALSE, min.bin.size = 15, min.prob.interval = 0.1, simplif = FALSE, alpha = 0.05, plot = TRUE, plot.values = TRUE, plot.bin.size = TRUE, xlab = "Predicted probability", ylab = "Observed prevalence", ...) {
  # version 1.5 (24 Jun 2015)
 
  if (!is.null(model)) {
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0, 1),
    pred >= 0,
    pred <= 1
  )
 
  bins <- getBins(obs = obs, pred = pred, bin.method = bin.method, n.bins = n.bins, fixed.bin.size = fixed.bin.size, min.bin.size = min.bin.size, min.prob.interval = min.prob.interval)
  n.bins <- nrow(bins$bins.table)
 
  # next 4 lines: adapted from hosmerlem function in http://www.stat.sc.edu/~hitchcock/diseaseoutbreakRexample704.txt
  observed <- xtabs(cbind(1 - obs, obs) ~ bins$prob.bin)
  expected <- xtabs(cbind(1 - pred, pred) ~ bins$prob.bin)
  chi.sq <- sum((observed - expected) ^ 2 / expected)
  p.value <- 1 - pchisq(chi.sq, df = n.bins - 2)
  rmse <- sqrt(mean((observed - expected) ^ 2))
 
  if (simplif) return(list(chi.sq = chi.sq, p.value = p.value, RMSE = rmse))
 
  # per-bin statistics, computed before the 'if (plot)' block because they are also needed for the output table when plot = FALSE:
  N.total <- tapply(obs, bins$prob.bin, length)  # N cases in each bin
  N.presence <- tapply(obs, bins$prob.bin, sum)  # N presences in each bin
  Empty <- is.na(N.total)
  N.total[is.na(N.total)] <- 0
  N.presence[is.na(N.presence)] <- 0
  OBS.proportion <- N.presence / N.total
  OBS.proportion[Empty] <- NA
  df1.low <- 2 * (N.total - N.presence + 1)
  df2.low <- 2 * N.presence
  df1.up <- 2 * (N.presence + 1)
  df2.up <- 2 * (N.total - N.presence)
  N.bins <- length(unique(bins$prob.bin))
  Lower <- rep(0, N.bins)
  Upper <- rep(1, N.bins)
  TF <- N.presence != 0
  Lower[TF] <- N.presence[TF] / (N.presence[TF] + ((N.total[TF] - N.presence[TF] + 1) * qf(1 - alpha/2, df1.low[TF], df2.low[TF])))
  Lower[Empty] <- NA
  TF <- N.presence < N.total
  Upper[TF] <- ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])) / (N.total[TF] - N.presence[TF] + ((N.presence[TF] + 1) * qf(1 - alpha/2, df1.up[TF], df2.up[TF])))
  Upper[Empty] <- NA
  bin.centers <- bins$bins.table$median.prob

  # plotting loosely based on the calibration.plot function in package PresenceAbsence
  if (plot) {
    plot(c(-0.05, 1.05), c(-0.05, 1.05), type = "n", xlab = xlab, ylab = ylab, ...)
 
    if (plot.bin.size) {
      text(bin.centers, Upper + 0.07, labels = N.total)
    }
 
    abline(a = 0, b = 1, lty = 2)
    for (i in 1:N.bins) {
      lines(rep(bin.centers[i], 2), c(Lower[i], Upper[i]))
    }
    points(bin.centers, OBS.proportion, pch = 20)
 
    if (plot.values) {
      text(1, 0.2, adj = 1, substitute(paste(HL == a), list(a = round(chi.sq, 1))))
      if (p.value < 0.001) {
        text(1, 0.1, adj = 1, substitute(paste(italic(p) < a), list(a = 0.001)))
      } else {
        text(1, 0.1, adj = 1, substitute(paste(italic(p) == a), list(a = round(p.value, 3))))
      }
      text(1, 0.0, adj = 1, substitute(paste(RMSE == a), list(a = round(rmse, 1))))
    }  # end if plot values
  }
 
  BinPred <- tapply(pred, bins$prob.bin, mean)
  bins.table <- data.frame(BinCenter = bin.centers, NBin = N.total, BinObs = OBS.proportion, BinPred = BinPred, BinObsCIlower = Lower, BinObsCIupper = Upper)
 
  return(list(bins.table = bins.table, chi.sq = chi.sq, DF = n.bins - 2, p.value = p.value, RMSE = rmse))
}


HLfit still needs some code simplification, but the getBins function has recently been fixed so that it no longer fails when probability breakpoints are not unique. Additional model calibration measures will soon be added to the package.

Usage examples:

HLfit(model = mymodel, bin.method = "prob.bins", min.prob.interval = 0.1)

HLfit(obs = myobs, pred = mypred, bin.method = "n.bins", n.bins = 10)

HLfit(obs = myobs, pred = mypred, bin.method = "quantiles")
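For the first usage example, any binomial GLM will do as mymodel; when a model object is passed, HLfit takes obs and pred from it exactly as sketched below (the simulated data are just for illustration):

```r
set.seed(7)
x <- rnorm(150)
y <- rbinom(150, size = 1, prob = plogis(0.5 + x))
mymodel <- glm(y ~ x, family = binomial)

obs <- mymodel$y                # the binary response (0/1)
pred <- mymodel$fitted.values   # predicted presence probabilities
```

These are also the two vectors you would pass directly as the obs and pred arguments in the other usage examples.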

References:

Barbosa, A.M., Brown, J.A. & Real, R. (2014). modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Hosmer, D.W. & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics, A10: 1043–1069.

Jiménez-Valverde, A., Acevedo, P., Barbosa, A.M., Lobo, J.M. & Real, R. (2013). Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography, 22: 508–516.

Pearce, J. & Ferrier, S. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling, 133: 225–245.
