# R-squared measures for generalized linear models

There are several ways of calculating (pseudo) R-squared values for logistic regression models, with no consensus about which is best. The RsqGLM function, now included in the modEvA package, calculates the measures proposed by McFadden (1974), Cox & Snell (1989), Nagelkerke (1991) and Tjur (2009), as well as the squared Pearson correlation between observed and predicted values:

```r
RsqGLM <- function(obs = NULL, pred = NULL, model = NULL) {
  # version 1.2 (3 Jan 2015)

  model.provided <- !is.null(model)

  if (model.provided) {
    if (!inherits(model, "glm")) stop("'model' must be of class 'glm'.")
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values

  } else {  # if model not provided
    if (is.null(obs) || is.null(pred)) stop("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'.")
    if (length(obs) != length(pred)) stop("'obs' and 'pred' must be of the same length (and in the same order).")
    # all() / any() reduce the element-wise checks to a single TRUE/FALSE, as if() requires
    if (!all(obs %in% c(0, 1)) || any(pred < 0 | pred > 1)) stop("Sorry, 'obs' and 'pred' options currently only implemented for binomial GLMs (binary response variable with values 0 or 1) with logit link.")
    # refit a binomial GLM of obs on the logit of pred to obtain a model log-likelihood
    logit <- log(pred / (1 - pred))
    model <- glm(obs ~ logit, family = "binomial")
  }

  # intercept-only (null) model with the same family and link
  null.mod <- glm(obs ~ 1, family = family(model))
  loglike.M <- as.numeric(logLik(model))
  loglike.0 <- as.numeric(logLik(null.mod))
  N <- length(obs)

  # based on Nagelkerke 1991:
  CoxSnell <- 1 - exp(-(2 / N) * (loglike.M - loglike.0))
  Nagelkerke <- CoxSnell / (1 - exp((2 / N) * loglike.0))  # Cox & Snell rescaled to a maximum of 1

  # based on Allison 2014:
  McFadden <- 1 - (loglike.M / loglike.0)
  Tjur <- mean(pred[obs == 1]) - mean(pred[obs == 0])
  sqPearson <- cor(obs, pred) ^ 2

  return(list(CoxSnell = CoxSnell, Nagelkerke = Nagelkerke, McFadden = McFadden, Tjur = Tjur, sqPearson = sqPearson))
}
```

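For reference, these are the formulas the function body computes, written out from the code. Here $\ln L_M$ and $\ln L_0$ are the log-likelihoods of the fitted model and of the intercept-only (null) model (loglike.M and loglike.0 in the code), $N$ is the number of observations, $y$ the observed 0/1 values and $\hat p$ the predicted probabilities:

$$
\begin{aligned}
R^2_{\mathrm{McFadden}} &= 1 - \frac{\ln L_M}{\ln L_0} \\
R^2_{\mathrm{CoxSnell}} &= 1 - \exp\!\left[-\frac{2}{N}\left(\ln L_M - \ln L_0\right)\right] \\
R^2_{\mathrm{Nagelkerke}} &= \frac{R^2_{\mathrm{CoxSnell}}}{1 - \exp\!\left(\frac{2}{N}\ln L_0\right)} \\
D_{\mathrm{Tjur}} &= \overline{\hat p}_{\,y=1} - \overline{\hat p}_{\,y=0} \\
R^2_{\mathrm{sqPearson}} &= \mathrm{cor}(y, \hat p)^2
\end{aligned}
$$

The Nagelkerke denominator is the value Cox & Snell's measure takes when $\ln L_M = 0$ (a perfect fit), i.e. its maximum attainable value, so Nagelkerke's version rescales Cox & Snell's to range up to 1.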

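Input can be either a fitted glm object or two vectors: the observed binary (0 or 1) values and the corresponding predicted probabilities. In the latter case only binomial GLMs with a logit link are supported, because the function refits a GLM of the observed values on the logit of the predictions to obtain the model log-likelihood. The output is a named list of the calculated R-squared values. See also the Dsquared function.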
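A minimal sketch, on simulated data, of why passing obs and pred taken from a binomial-logit GLM should give the same results as passing the model itself. The sample size, coefficients and variable names below are arbitrary assumptions for illustration only. Refitting the observed values on the logit of the fitted probabilities should recover an intercept of about 0 and a slope of about 1, and therefore the same log-likelihood that the R-squared formulas use.

```r
# Illustrative simulation only: n, coefficients and names are arbitrary
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

# Fit a binomial-logit GLM and take its predicted probabilities
fit <- glm(y ~ x, family = binomial)
p <- fitted(fit)

# Refit on the logit of the predictions, as the obs/pred path of RsqGLM does
lp <- log(p / (1 - p))
refit <- glm(y ~ lp, family = binomial)

# Intercept should be close to 0 and slope close to 1
print(coef(refit))

# Log-likelihoods of the original model and of the refit should match
print(c(original = as.numeric(logLik(fit)), refit = as.numeric(logLik(refit))))
```

If pred comes from some other source, the refit estimates its own intercept and slope, so the log-likelihood-based measures (Cox & Snell, Nagelkerke, McFadden) describe that recalibrated logit model, while Tjur's coefficient and the squared Pearson correlation still use the raw predictions.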

REFERENCES

Cox, D.R. & Snell, E.J. (1989) The Analysis of Binary Data, 2nd ed. Chapman and Hall, London.

McFadden, D. (1974) Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (ed.) Frontiers in Econometrics. Academic Press, New York.

Nagelkerke, N.J.D. (1991) A note on a general definition of the coefficient of determination. Biometrika, 78: 691-692.

Tjur, T. (2009) Coefficients of determination in logistic regression models – a new proposal: the coefficient of discrimination. The American Statistician, 63: 366-372.