Calculate the amount of deviance explained by a GLM

Linear models come with an R-squared value that measures the proportion of variation that the model accounts for. The R-squared is provided with summary(model) in R. For generalized linear models (GLMs), the equivalent is the amount of deviance accounted for (D-squared; Guisan & Zimmermann 2000), but this value is not normally provided with the model summary. The Dsquared function, now included in the modEvA package (Barbosa et al. 2014), calculates it. There is also an option to calculate the adjusted D-squared, which takes into account the number of observations and the number of model parameters, thus allowing direct comparison among different models (Weisberg 1980, Guisan & Zimmermann 2000). UPDATE: see also further measures in the new RsqGLM function.

Dsquared <- function(model = NULL, 
                     obs = NULL, 
                     pred = NULL, 
                     family = NULL, # needed only when 'model' not provided
                     adjust = FALSE, 
                     npar = NULL) { # needed only when 'model' not provided
  # version 1.4 (31 Aug 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
 
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'.")
    if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).")
    if (is.null(family)) stop ("With 'obs' and 'pred' arguments (rather than a model object), you must also specify one of two model family options: 'binomial' or 'poisson' (in quotes).")
    else if (!is.character(family)) stop ("Argument 'family' must be provided as character (i.e. in quotes: 'binomial' or 'poisson').")
    else if (length(family) != 1 | !(family %in% c("binomial", "poisson"))) stop ("'family' must be either 'binomial' or 'poisson' (in quotes).")
 
    if (family == "binomial") {
      if (any(!(obs %in% c(0, 1)) | pred < 0 | pred > 1)) stop ("'binomial' family implies that 'obs' data should be binary (with values 0 or 1) and 'pred' data should be bounded between 0 and 1.")
      link <- log(pred / (1 - pred))  # logit
    }  # end if binomial
 
    else if (family == "poisson") {
      if (any(obs %%1 != 0)) stop ("'poisson' family implies that 'obs' data should consist of whole numbers.")
      link <- log(pred)
    }  # end if poisson
 
    model <- glm(obs ~ link, family = family)
  }  # end if model not provided
 
  D2 <- (model$null.deviance - model$deviance) / model$null.deviance
 
  if (adjust) {
    if (model.provided) {
      n <- length(model$y)
      #p <- length(model$coefficients)
      p <- attributes(logLik(model))$df
    } else {
      if (is.null(npar)) stop ("Adjusted D-squared from 'obs' and 'pred' values (rather than a model object) requires specifying the number of parameters in the underlying model ('npar').")
      n <- length(na.omit(obs))
      p <- npar
    }  # end if model.provided else
 
    D2 <- 1 - ((n - 1) / (n - p)) * (1 - D2)
  }  # end if adjust
 
  return (D2)
}

[presented with Pretty R]

Usage examples:

Dsquared(model = mymodel)

Dsquared(model = mymodel, adjust = TRUE)

Dsquared(obs = myobs, pred= mypred, family = “poisson”)

Dsquared(obs = myobs, pred = mypred, family = “poisson”, adjust = TRUE, npar = 5)

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Guisan, A. & Zimmermann, N.E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling 135: 147-186

Weisberg, S. (1980). Applied Linear Regression. Wiley, New York

Advertisements

11 thoughts on “Calculate the amount of deviance explained by a GLM

  1. Hi! I’m trying to wrap my head around pseudo-R^2 metrics like the D^2 above. My understanding ,as of now, is that this function could be used to compare multiple models using the same dataset. (e.g. greater D^2 is better). But- and this is the real question- can I say how good my model fits the data with this statistic? The OLS R^2 we know and love is often talked about this manor. ( E.g. R^2 = 0.6, so 60% of variability in our data is explained by our model.) Is it inappropriate to apply D^2 to glm output the same way as R^2 and make a claim about how good our model fits the data?

    • Sorry, not a statistician, so I can’t really offer expert opinion on this. My personal understanding is that D^2 does measure model fit and is indeed useful for comparing models (even for different datasets if you use adj=TRUE), although it cannot be interpreted like the R^2 exactly. The D^2 is not the proportion of variance (in the response) that the model accounts for, but the proportion of deviance (lack of fit) that the model has less than the deviance of the null model (i.e., than a model with no variables for the same data). I don’t think that D^2 is better or worse than any of the other “pseudo-R^2” metrics available (e.g. https://modtools.wordpress.com/2014/10/30/rsqglm/) — as I understand, none of them is perfect, so it’s probably good to always look at a few different metrics. Maybe check out this article addressing Measures of Fit for Logistic Regression: https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf

      • Thank you, that was a helpful answer. A follow up question: Does the RsqGLM function work for Poisson/Negative binomial GLMs or just logistic? I’ve been working with Negative Binomial models and had made the assumption it was ok to apply your package to these models.

        • Except for the Tjur R-squared, which requires a binomial response variable (and returns NA if this isn’t the case), all other implemented pseudo-R-squareds are based on deviance / null deviance or on log likelihood. In practice, the RsqGLM function works for any model of class “glm” (which includes Poisson and negative binomial models). However, like I said, I’m really not a stats expert, so I can’t be 100% sure that these coefficients actually apply to Poisson/NB and are directly interpretable in the same way as for binomial logistic regression. I would contact a good statistician about this.

  2. Pingback: Sensitive numbers | ConservationBytes.com

  3. I am trying to use this function to get the dsquared of a gaussian glm but I get this error Dsquared(model=res)
    Error: obs %in% c(0, 1) are not all TRUE

    res is

    Call: glm(formula = tl ~ poly(bt0011sme, 2) + poly(bt0011yme, 2) +
    poly(st0011sme, 2) + poly(st0011yme, 2) + poly(bs0011sme,
    2) + poly(predicted, 2), family = gaussian, data = s)

    I have R 3.1.0 but it shouldn’t make a difference should it?

    • Hello,

      Try copying the function from the blog and pasting it in your R console (and not loading the modEvA package after it). The version of Dsquared currently in modEvA only allows binomial GLMs; the one here in the blog should work for any GLM, although I’m still trying to confirm that this is correct before updating modEvA. Anyway, if you’re using Gaussian, you can just do a LM instead of GLM and check the R-squared in the model summary.

      Cheers,

Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s