Calculate the amount of deviance explained by a GLM

Linear models come with an R-squared value that measures the proportion of variation that the model accounts for. The R-squared is provided with summary(model) in R. For generalized linear models (GLMs), the equivalent is the amount of deviance accounted for (D-squared; Guisan & Zimmermann 2000), but this value is not normally provided with the model summary. The Dsquared function, now included in the modEvA package (Barbosa et al. 2014), calculates it. There is also an option to calculate the adjusted D-squared, which takes into account the number of observations and the number of model parameters, thus allowing direct comparison among different models (Weisberg 1980, Guisan & Zimmermann 2000). UPDATE: see also further measures in the new RsqGLM function.

Dsquared <- function(model = NULL, 
                     obs = NULL, 
                     pred = NULL, 
                     family = NULL, # needed only when 'model' not provided
                     adjust = FALSE, 
                     npar = NULL) { # needed only when 'model' not provided
  # version 1.4 (31 Aug 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
 
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'.")
    if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).")
    if (is.null(family)) stop ("With 'obs' and 'pred' arguments (rather than a model object), you must also specify one of two model family options: 'binomial' or 'poisson' (in quotes).")
    else if (!is.character(family)) stop ("Argument 'family' must be provided as character (i.e. in quotes: 'binomial' or 'poisson').")
    else if (length(family) != 1 | !(family %in% c("binomial", "poisson"))) stop ("'family' must be either 'binomial' or 'poisson' (in quotes).")
 
    if (family == "binomial") {
      if (any(!(obs %in% c(0, 1)) | pred < 0 | pred > 1)) stop ("'binomial' family implies that 'obs' data should be binary (with values 0 or 1) and 'pred' data should be bounded between 0 and 1.")
      link <- log(pred / (1 - pred))  # logit
    }  # end if binomial
 
    else if (family == "poisson") {
      if (any(obs %%1 != 0)) stop ("'poisson' family implies that 'obs' data should consist of whole numbers.")
      link <- log(pred)
    }  # end if poisson
 
    model <- glm(obs ~ link, family = family)
  }  # end if model not provided
 
  D2 <- (model$null.deviance - model$deviance) / model$null.deviance
 
  if (adjust) {
    if (model.provided) {
      n <- length(model$y)
      #p <- length(model$coefficients)
      p <- attributes(logLik(model))$df
    } else {
      if (is.null(npar)) stop ("Adjusted D-squared from 'obs' and 'pred' values (rather than a model object) requires specifying the number of parameters in the underlying model ('npar').")
      n <- length(na.omit(obs))
      p <- npar
    }  # end if model.provided else
 
    D2 <- 1 - ((n - 1) / (n - p)) * (1 - D2)
  }  # end if adjust
 
  return (D2)
}

[presented with Pretty R]

Usage examples:

Dsquared(model = mymodel)

Dsquared(model = mymodel, adjust = TRUE)

Dsquared(obs = myobs, pred= mypred, family = “poisson”)

Dsquared(obs = myobs, pred = mypred, family = “poisson”, adjust = TRUE, npar = 5)

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Guisan, A. & Zimmermann, N.E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling 135: 147-186

Weisberg, S. (1980). Applied Linear Regression. Wiley, New York

13 thoughts on “Calculate the amount of deviance explained by a GLM”

Any chance that I can get Deviance from lmer models?

Reply ↓

A.M. Barbosa on March 17, 2017 at 5:19 pm said:

I’m not very familiar with lmer models, so I can’t be sure. Try the ‘Dsquared’ function, either provinding ‘model’ (if it’s an object of class ‘glm’), or providing ‘obs’ and ‘pred’, and see what happens. But I’d check elsewhere (or with a statistician) if the results are correct or if the D-squared formula can be applied in the same way to lmer.

Reply ↓

Hi! I’m trying to wrap my head around pseudo-R^2 metrics like the D^2 above. My understanding ,as of now, is that this function could be used to compare multiple models using the same dataset. (e.g. greater D^2 is better). But- and this is the real question- can I say how good my model fits the data with this statistic? The OLS R^2 we know and love is often talked about this manor. ( E.g. R^2 = 0.6, so 60% of variability in our data is explained by our model.) Is it inappropriate to apply D^2 to glm output the same way as R^2 and make a claim about how good our model fits the data?

Reply ↓

A.M. Barbosa on October 20, 2016 at 5:16 pm said:

Sorry, not a statistician, so I can’t really offer expert opinion on this. My personal understanding is that D^2 does measure model fit and is indeed useful for comparing models (even for different datasets if you use adj=TRUE), although it cannot be interpreted like the R^2 exactly. The D^2 is not the proportion of variance (in the response) that the model accounts for, but the proportion of deviance (lack of fit) that the model has less than the deviance of the null model (i.e., than a model with no variables for the same data). I don’t think that D^2 is better or worse than any of the other “pseudo-R^2” metrics available (e.g. https://modtools.wordpress.com/2014/10/30/rsqglm/) — as I understand, none of them is perfect, so it’s probably good to always look at a few different metrics. Maybe check out this article addressing Measures of Fit for Logistic Regression: https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf

Reply ↓
- Ryan on October 20, 2016 at 8:30 pm said:
  
  Thank you, that was a helpful answer. A follow up question: Does the RsqGLM function work for Poisson/Negative binomial GLMs or just logistic? I’ve been working with Negative Binomial models and had made the assumption it was ok to apply your package to these models.
  
  Reply ↓
  - A.M. Barbosa on October 25, 2016 at 12:17 pm said:
    
    Except for the Tjur R-squared, which requires a binomial response variable (and returns NA if this isn’t the case), all other implemented pseudo-R-squareds are based on deviance / null deviance or on log likelihood. In practice, the RsqGLM function works for any model of class “glm” (which includes Poisson and negative binomial models). However, like I said, I’m really not a stats expert, so I can’t be 100% sure that these coefficients actually apply to Poisson/NB and are directly interpretable in the same way as for binomial logistic regression. I would contact a good statistician about this.
    
    Reply ↓

Pingback: Sensitive numbers | ConservationBytes.com

The modEVa package does not show up in the R library of package 😥

Reply ↓

A.M. Barbosa on March 9, 2015 at 4:23 pm said:

modEvA is not (yet) on CRAN, but rather on R-Forge, so it must be installed from there – see https://modtools.wordpress.com/packages/modeva, or the tutorial just made available at http://modeva.r-forge.r-project.org/modEvA-tutorial.html

Reply ↓
A.M. Barbosa on October 20, 2016 at 5:19 pm said:

UPDATE: the modEvA package was recently added to CRAN, so it should now be installable with the usual procedures

Reply ↓

I am trying to use this function to get the dsquared of a gaussian glm but I get this error Dsquared(model=res)
Error: obs %in% c(0, 1) are not all TRUE

res is

Call: glm(formula = tl ~ poly(bt0011sme, 2) + poly(bt0011yme, 2) +
poly(st0011sme, 2) + poly(st0011yme, 2) + poly(bs0011sme,
2) + poly(predicted, 2), family = gaussian, data = s)

I have R 3.1.0 but it shouldn’t make a difference should it?

Reply ↓

A.M. Barbosa on January 29, 2015 at 3:03 pm said:

Hello,

Try copying the function from the blog and pasting it in your R console (and not loading the modEvA package after it). The version of Dsquared currently in modEvA only allows binomial GLMs; the one here in the blog should work for any GLM, although I’m still trying to confirm that this is correct before updating modEvA. Anyway, if you’re using Gaussian, you can just do a LM instead of GLM and check the R-squared in the model summary.

Cheers,

Reply ↓

thank you!

Reply ↓

Flo on March 17, 2017 at 11:01 am said:

Any chance that I can get Deviance from lmer models?

Reply ↓
- A.M. Barbosa on March 17, 2017 at 5:19 pm said:
  
  I’m not very familiar with lmer models, so I can’t be sure. Try the ‘Dsquared’ function, either provinding ‘model’ (if it’s an object of class ‘glm’), or providing ‘obs’ and ‘pred’, and see what happens. But I’d check elsewhere (or with a statistician) if the results are correct or if the D-squared formula can be applied in the same way to lmer.
  
  Reply ↓
Ryan on October 20, 2016 at 5:10 am said:

Hi! I’m trying to wrap my head around pseudo-R^2 metrics like the D^2 above. My understanding ,as of now, is that this function could be used to compare multiple models using the same dataset. (e.g. greater D^2 is better). But- and this is the real question- can I say how good my model fits the data with this statistic? The OLS R^2 we know and love is often talked about this manor. ( E.g. R^2 = 0.6, so 60% of variability in our data is explained by our model.) Is it inappropriate to apply D^2 to glm output the same way as R^2 and make a claim about how good our model fits the data?

Reply ↓
- A.M. Barbosa on October 20, 2016 at 5:16 pm said:
  
  Sorry, not a statistician, so I can’t really offer expert opinion on this. My personal understanding is that D^2 does measure model fit and is indeed useful for comparing models (even for different datasets if you use adj=TRUE), although it cannot be interpreted like the R^2 exactly. The D^2 is not the proportion of variance (in the response) that the model accounts for, but the proportion of deviance (lack of fit) that the model has less than the deviance of the null model (i.e., than a model with no variables for the same data). I don’t think that D^2 is better or worse than any of the other “pseudo-R^2” metrics available (e.g. https://modtools.wordpress.com/2014/10/30/rsqglm/) — as I understand, none of them is perfect, so it’s probably good to always look at a few different metrics. Maybe check out this article addressing Measures of Fit for Logistic Regression: https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf
  
  Reply ↓
  - Ryan on October 20, 2016 at 8:30 pm said:
    
    Thank you, that was a helpful answer. A follow up question: Does the RsqGLM function work for Poisson/Negative binomial GLMs or just logistic? I’ve been working with Negative Binomial models and had made the assumption it was ok to apply your package to these models.
    
    Reply ↓
    - A.M. Barbosa on October 25, 2016 at 12:17 pm said:
      
      Except for the Tjur R-squared, which requires a binomial response variable (and returns NA if this isn’t the case), all other implemented pseudo-R-squareds are based on deviance / null deviance or on log likelihood. In practice, the RsqGLM function works for any model of class “glm” (which includes Poisson and negative binomial models). However, like I said, I’m really not a stats expert, so I can’t be 100% sure that these coefficients actually apply to Poisson/NB and are directly interpretable in the same way as for binomial logistic regression. I would contact a good statistician about this.
      
      Reply ↓
Pingback: Sensitive numbers | ConservationBytes.com
pmehat on March 9, 2015 at 3:58 pm said:

The modEVa package does not show up in the R library of package 😥

Reply ↓
- A.M. Barbosa on March 9, 2015 at 4:23 pm said:
  
  modEvA is not (yet) on CRAN, but rather on R-Forge, so it must be installed from there – see https://modtools.wordpress.com/packages/modeva, or the tutorial just made available at http://modeva.r-forge.r-project.org/modEvA-tutorial.html
  
  Reply ↓
- A.M. Barbosa on October 20, 2016 at 5:19 pm said:
  
  UPDATE: the modEvA package was recently added to CRAN, so it should now be installable with the usual procedures
  
  Reply ↓
Anonymous on January 27, 2015 at 10:15 pm said:

I am trying to use this function to get the dsquared of a gaussian glm but I get this error Dsquared(model=res)
Error: obs %in% c(0, 1) are not all TRUE

res is

Call: glm(formula = tl ~ poly(bt0011sme, 2) + poly(bt0011yme, 2) +
poly(st0011sme, 2) + poly(st0011yme, 2) + poly(bs0011sme,
2) + poly(predicted, 2), family = gaussian, data = s)

I have R 3.1.0 but it shouldn’t make a difference should it?

Reply ↓
- A.M. Barbosa on January 29, 2015 at 3:03 pm said:
  
  Hello,
  
  Try copying the function from the blog and pasting it in your R console (and not loading the modEvA package after it). The version of Dsquared currently in modEvA only allows binomial GLMs; the one here in the blog should work for any GLM, although I’m still trying to confirm that this is correct before updating modEvA. Anyway, if you’re using Gaussian, you can just do a LM instead of GLM and check the R-squared in the model summary.
  
  Cheers,
  
  Reply ↓
Anonymous on March 27, 2014 at 9:03 pm said:

thank you!

Reply ↓

modTools

simple tools for simpler modelling

Calculate the amount of deviance explained by a GLM

13 thoughts on “Calculate the amount of deviance explained by a GLM”

Comment Cancel reply

Share this:

Related

13 thoughts on “Calculate the amount of deviance explained by a GLM”

Comment Cancel reply