Linear models come with an R-squared value that measures the proportion of variation that the model accounts for. The R-squared is provided with *summary(model)* in R. For generalized linear models (**GLMs**), the equivalent is the **amount of deviance accounted for **(**D-squared**; Guisan & Zimmermann 2000), but this value is not normally provided with the model summary. The *Dsquared* function, now included in the modEvA package (Barbosa et al. 2014), calculates it. There is also an option to calculate the **adjusted D-squared**, which takes into account the number of observations and the number of model parameters, thus allowing **direct comparison among different models** (Weisberg 1980, Guisan & Zimmermann 2000). **UPDATE: see also further measures in the new RsqGLM function**.

Dsquared <- function(model = NULL, obs = NULL, pred = NULL, family = NULL, # needed only when 'model' not provided adjust = FALSE, npar = NULL) { # needed only when 'model' not provided # version 1.4 (31 Aug 2015) model.provided <- ifelse(is.null(model), FALSE, TRUE) if (model.provided) { if (!("glm" %in% class(model))) stop ("'model' must be of class 'glm'.") if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.") if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.") obs <- model$y pred <- model$fitted.values } else { # if model not provided if (is.null(obs) | is.null(pred)) stop ("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'.") if (length(obs) != length(pred)) stop ("'obs' and 'pred' must be of the same length (and in the same order).") if (is.null(family)) stop ("With 'obs' and 'pred' arguments (rather than a model object), you must also specify one of two model family options: 'binomial' or 'poisson' (in quotes).") else if (!is.character(family)) stop ("Argument 'family' must be provided as character (i.e. in quotes: 'binomial' or 'poisson').") else if (length(family) != 1 | !(family %in% c("binomial", "poisson"))) stop ("'family' must be either 'binomial' or 'poisson' (in quotes).") if (family == "binomial") { if (any(!(obs %in% c(0, 1)) | pred < 0 | pred > 1)) stop ("'binomial' family implies that 'obs' data should be binary (with values 0 or 1) and 'pred' data should be bounded between 0 and 1.") link <- log(pred / (1 - pred)) # logit } # end if binomial else if (family == "poisson") { if (any(obs %%1 != 0)) stop ("'poisson' family implies that 'obs' data should consist of whole numbers.") link <- log(pred) } # end if poisson model <- glm(obs ~ link, family = family) } # end if model not provided D2 <- (model$null.deviance - model$deviance) / model$null.deviance if (adjust) { if (model.provided) { n <- length(model$y) #p <- length(model$coefficients) p <- attributes(logLik(model))$df } else { if (is.null(npar)) stop ("Adjusted D-squared from 'obs' and 'pred' values (rather than a model object) requires specifying the number of parameters in the underlying model ('npar').") n <- length(na.omit(obs)) p <- npar } # end if model.provided else D2 <- 1 - ((n - 1) / (n - p)) * (1 - D2) } # end if adjust return (D2) }

Usage examples:

Dsquared(model = mymodel)

Dsquared(model = mymodel, adjust = TRUE)

Dsquared(obs = myobs, pred= mypred, family = “poisson”)

Dsquared(obs = myobs, pred = mypred, family = “poisson”, adjust = TRUE, npar = 5)

**References**

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Guisan, A. & Zimmermann, N.E. (2000). Predictive habitat distribution models in ecology. *Ecological Modelling* 135: 147-186

Weisberg, S. (1980). *Applied Linear Regression*. Wiley, New York

Hi! I’m trying to wrap my head around pseudo-R^2 metrics like the D^2 above. My understanding ,as of now, is that this function could be used to compare multiple models using the same dataset. (e.g. greater D^2 is better). But- and this is the real question- can I say how good my model fits the data with this statistic? The OLS R^2 we know and love is often talked about this manor. ( E.g. R^2 = 0.6, so 60% of variability in our data is explained by our model.) Is it inappropriate to apply D^2 to glm output the same way as R^2 and make a claim about how good our model fits the data?

Sorry, not a statistician, so I can’t really offer expert opinion on this. My personal understanding is that D^2 does measure model fit and is indeed useful for comparing models (even for different datasets if you use adj=TRUE), although it cannot be interpreted like the R^2 exactly. The D^2 is not the proportion of variance (in the response) that the model accounts for, but the proportion of deviance (lack of fit) that the model has less than the deviance of the null model (i.e., than a model with no variables for the same data). I don’t think that D^2 is better or worse than any of the other “pseudo-R^2” metrics available (e.g. https://modtools.wordpress.com/2014/10/30/rsqglm/) — as I understand, none of them is perfect, so it’s probably good to always look at a few different metrics. Maybe check out this article addressing Measures of Fit for Logistic Regression: https://support.sas.com/resources/papers/proceedings14/1485-2014.pdf

Thank you, that was a helpful answer. A follow up question: Does the RsqGLM function work for Poisson/Negative binomial GLMs or just logistic? I’ve been working with Negative Binomial models and had made the assumption it was ok to apply your package to these models.

Except for the Tjur R-squared, which requires a binomial response variable (and returns NA if this isn’t the case), all other implemented pseudo-R-squareds are based on deviance / null deviance or on log likelihood. In practice, the RsqGLM function works for any model of class “glm” (which includes Poisson and negative binomial models). However, like I said, I’m really not a stats expert, so I can’t be 100% sure that these coefficients actually apply to Poisson/NB and are directly interpretable in the same way as for binomial logistic regression. I would contact a good statistician about this.

Pingback: Sensitive numbers | ConservationBytes.com

The modEVa package does not show up in the R library of package 😥

modEvA is not (yet) on CRAN, but rather on R-Forge, so it must be installed from there – see https://modtools.wordpress.com/packages/modeva, or the tutorial just made available at http://modeva.r-forge.r-project.org/modEvA-tutorial.html

UPDATE: the modEvA package was recently added to CRAN, so it should now be installable with the usual procedures

I am trying to use this function to get the dsquared of a gaussian glm but I get this error Dsquared(model=res)

Error: obs %in% c(0, 1) are not all TRUE

res is

Call: glm(formula = tl ~ poly(bt0011sme, 2) + poly(bt0011yme, 2) +

poly(st0011sme, 2) + poly(st0011yme, 2) + poly(bs0011sme,

2) + poly(predicted, 2), family = gaussian, data = s)

I have R 3.1.0 but it shouldn’t make a difference should it?

Hello,

Try copying the function from the blog and pasting it in your R console (and not loading the modEvA package after it). The version of Dsquared currently in modEvA only allows binomial GLMs; the one here in the blog should work for any GLM, although I’m still trying to confirm that this is correct before updating modEvA. Anyway, if you’re using Gaussian, you can just do a LM instead of GLM and check the R-squared in the model summary.

Cheers,

thank you!