Get the equation of a model to apply elsewhere

The summary of a model in R gives you a table of the coefficient estimates and other parameters. Sometimes it may be useful to have a string of text with the model’s equation, so that you can present it in an article (e.g. Real et al. 2005) or apply it in a (raster map) calculation, either in R (although here you can usually use the ‘predict’ function) or in a GIS software (e.g. Barbosa et al. 2010). The getModEqn function, now included in the modEvA package (Barbosa et al. 2014), gets this equation for linear or generalized linear models.

By default it prints the “y” linear equation, but for generalized linear models you can also set type = “P” (for the equation of probability) or type = “F” (for favourability, which corrects the intercept to eliminate the effect of prevalence — see Real et al. 2006).

If the variables to which you want to apply the model have a prefix or suffix (e.g. prefix = “raster.stack$” for the R raster package, or prefix = “mydata$” for a data frame, or suffix = “@1” in Quantum GIS, or suffix = “@mapset” in GRASS), you can get these in the equation too, using the prefix and/or the suffix arguments.

getModEqn <- function(model, type = "y", digits = NULL, prefix = NULL, suffix = NULL) {
  # version 1.6 (14 Aug 2013)
  # model: a model object of class 'lm' or glm'
  # type: the type pf equation to present; can be "y" (for the linear logit equation), "P" (for predicted probability) or "F" (for favourability, which corrects the intercept to eliminate the effect of prevalence - Real et al. 2006)
  # digits: number of decimal places to which to round the model coefficients; if NULL, no rounding is done
  # prefix, suffix: text to add to the variables' names in the resulting equation (if your variables have a prefix or suffix added to the variable's name where you want to apply the equation)

  stopifnot(class(model) %in% c("lm", "glm"))
  if(length(type) != 1 | !(type %in% c("y", "P", "F"))) stop("'type' must be either 'y', 'P', or 'F'")
  if(!("glm" %in% class(model)) & type != "y") {
    message("types 'P' and 'F' are only applicable to models of class 'glm', so type was reset to 'y'")
    type <- "y"
  }
  coeffs <- summary(model)$coefficients[ , 1]
  if (type == "F" & !("(Intercept)") %in% names(coeffs)) {
    message("'F' is currently only applicable to models with an intercept, so type was set to 'P'")
    type <- "P"
  }
  if(type == "F") {
    n1 <- sum(model$y == 1)
    n0 <- sum(model$y == 0)
    coeffs["(Intercept)"] <- coeffs["(Intercept)"] - log(n1/n0)
  }
  names(coeffs) <- paste(prefix, names(coeffs), suffix, sep = "")
  if (!is.null(digits)) coeffs <- round(coeffs, digits)
  coeffs <- ifelse(coeffs < 0, coeffs, paste("+", coeffs, sep = ""))
  multips <- paste(coeffs, names(coeffs), sep = "*")
  multips <- sub(x = multips, pattern = paste(prefix, "*(Intercept)", suffix, sep = ""), replacement = "", fixed = TRUE)
  eqn <- apply(X = as.matrix(multips), MARGIN = 2, FUN = paste, collapse = "")
  if (substring(eqn, 1, 1) == "+")  eqn <- substring(eqn, 2)
  if (type == "y")  eqn <- paste("y=", eqn, sep = "")
  if (type == "P")  eqn <- paste("P=1-(1/(1+exp(", eqn, ")))", sep = "")
  if (type == "F")  eqn <- paste("F=1-(1/(1+exp(", eqn, ")))", sep = "")
  cat(eqn)
  return(invisible(eqn))
}  # end getModEqn function

[presented with Pretty R]

Usage example:

> mymodel <- glm(myspecies ~ var1 + var2 + var3, family = “binomial”)

> summary(mymodel)

> getModEqn(mymodel)

y=-8.90804 – 7.020668e-05*var1 + 0.0030487*var2 + 0.0363185*var3

> getModEqn(mymodel, type = “P”, digits = 3, suffix = “@mapset“)

P=1-(1/(1+exp(-8.908 + 0*var1@mapset + 0.003*var2@mapset + 0.036*var3@mapset)))

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Barbosa, A.M., Real, R. & Vargas, J.M. (2010). Use of coarse-resolution models of species’ distributions to guide local conservation inferences. Conservation Biology 24: 1378–87

Real, R., Barbosa, A.M., Martínez-Solano, Í. & García-París, M. (2005). Distinguishing the distributions of two cryptic frogs (Anura: Discoglossidae) using molecular data and environmental modeling. Canadian Journal of Zoology 83: 536-545

Real, R., Barbosa, A.M. & Vargas, J.M. (2006). Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics 13: 237-245

Advertisements

Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s