Favourability: a GLM for prevalence-independent modelling of binary responses

Logistic regression (Generalised Linear Model with binomial error distribution and a logit link) is widely used for modelling species’ potential distributions using presence/absence data and a set of categorical or continuous predictor variables. However, this GLM incorporates the prevalence (relative proportion of presences and absences) of the species in the training sample, which affects the probability values produced.

Barbosa (2006) and Real, Barbosa & Vargas (2006) proposed an environmental favourability function which is based on logistic regression but cancels out uneven proportions of presences and absences in the modelled data. Favourability thus assesses the extent to which the environmental conditions change the probability of occurrence of a species with respect to its overall prevalence in the study area. Model predictions can, therefore, be directly compared between species with different prevalences. The favourability function has been implemented in SAM software and is now also included in the fuzzySim R package (Barbosa, 2015).
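Written as an equation, the RBV favourability computed by the Fav function given further down can be sketched as follows. The symbols are shorthand chosen here rather than taken from the text above: F is favourability, P the predicted probability of presence, and n_1 and n_0 the numbers of presences and absences in the modelled data.

```latex
F = \frac{\dfrac{P}{1 - P}}{\dfrac{n_1}{n_0} + \dfrac{P}{1 - P}}
```

When P equals the sample prevalence n_1 / (n_1 + n_0), the odds P / (1 - P) equal n_1 / n_0 and F = 0.5, so values above 0.5 mark conditions where presence is more likely than expected from prevalence alone.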

Using simulated data, Albert & Thuiller (2008) proposed a modification to the favourability function, but it requires knowing the true prevalence of the species (not just the prevalence in the studied sample), which is rarely possible in real-world modelling. Besides, this suggestion was based on the misunderstanding that the favourability function was a way to obtain the probability of occurrence when prevalence differs from 50%, which is incorrect (see Acevedo & Real 2012).

To get environmental favourability in R with either the Real, Barbosa & Vargas (“RBV”) or the Albert & Thuiller (“AT”) method, first fit a probabilistic model (e.g. logistic regression) to your data and then use the Fav function given below. The function takes either a model object produced by the glm function, or the presences-absences (1-0) of your species together with the corresponding presence probability values, obtained e.g. with predict(mymodel, mydata, type = "response"). Instead of the presences-absences, you can provide the sample prevalence or the numbers of presences and absences. To use the “AT” method, you must also provide the true (absolute) prevalence of your species.
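As a minimal sketch of the model-fitting step, the code below simulates presence/absence data and fits a logistic regression. The predictor name temperature, the simulated relationship and the sample size are assumptions made purely for illustration; mydata, myspecies, myspecies.P and mymodel follow the names used in the usage examples further down.

```r
# Simulated presence/absence data (illustrative only, not real survey data)
set.seed(42)
mydata <- data.frame(temperature = rnorm(200, mean = 15, sd = 5))

# Assumed relationship: occurrence probability rises with temperature
p.true <- plogis(-4 + 0.2 * mydata$temperature)
mydata$myspecies <- rbinom(200, size = 1, prob = p.true)

# Logistic regression: binomial GLM with the default logit link
mymodel <- glm(myspecies ~ temperature, family = binomial, data = mydata)

# Predicted presence probabilities for the same rows
mydata$myspecies.P <- predict(mymodel, mydata, type = "response")

# Sample prevalence: proportion of presences in the modelled data
prevalence <- mean(mydata$myspecies)
```

The mean of the predicted probabilities matches the sample prevalence here, which illustrates how the prevalence of the training sample is built into the probabilities that logistic regression produces.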

Fav <- function(obs = NULL, pred = NULL, n1n0 = NULL, model = NULL, sample.preval = NULL, method = "RBV", true.preval = NULL) {
  # version 1.2 (25 Mar 2014)
  # obs: a vector of 1-0 values of a modelled binary variable
  # pred: a vector of predicted probability values given e.g. by logistic regression (note: non-probabilistic predicted values are not suitable for the Fav function!)
  # n1n0: alternatively to obs, the total number of ones and zeros, in this order -- e.g. n1n0 = c(214, 317); ignored if the 'obs' vector is provided
  # sample.preval: alternatively to obs or n1n0, the prevalence (proportion of positive cases) of the modelled binary variable in the modelled data
  # model: alternatively to all of the above, a model object of class "glm" (if provided, will override any values provided in the arguments described above)
  # method: either "RBV" for the original Real, Barbosa & Vargas (2006) procedure, or "AT" for the modification proposed by Albert & Thuiller (2008) (but see Acevedo & Real 2012, Naturwissenschaften 99:515-522)
  # true.preval: the true prevalence (as opposed to sample prevalence); necessary if you want to use the AT method
 
  if (!is.null(model)) {
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(n1n0)) message("Argument 'n1n0' ignored in favour of 'model'.")
    if (!is.null(sample.preval)) message("Argument 'sample.preval' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  }  # end if model
 
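  # 'pred' is always required unless it was extracted from 'model' above
  if (is.null(pred)) stop("You need to provide either 'model' or 'pred'.")
  # use && so the length comparison is only evaluated when 'obs' exists
  if (!is.null(obs) && length(obs) != length(pred)) {
    stop("'obs' and 'pred' must have the same length (and be in the same order).")
  }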
 
  if (!is.null(obs)) {
    n1 <- sum(obs == 1, na.rm = TRUE)
    n0 <- sum(obs == 0, na.rm = TRUE)
  }  # end if obs
 
  else if (!is.null(n1n0)) {  # reached only when 'obs' is not available
    n1 <- n1n0[1]
    n0 <- n1n0[2]
    #if(n1 + n0 != length(pred)) stop("n1+n0 must equal the length of 'pred'.")
  }  # end if n1n0
 
  else if (!is.null(sample.preval)) {  # reached only when neither 'obs' nor 'n1n0' is available
    # only the ratio n1/n0 matters, so scale the prevalence to 100 cases
    n1 <- sample.preval * 100
    n0 <- 100 - n1
  }  # end if sample.preval
 
  else stop("You need to provide either 'model', or 'pred' plus one of 'obs', 'n1n0' or 'sample.preval'.")
 
  if(method == "RBV") {  # Real, Barbosa & Vargas 2006
    fav <- (pred / (1 - pred)) / ((n1 / n0) + (pred / (1 - pred)))
  }
 
  else if(method == "AT") {  # Albert & Thuiller 2008; but see Acevedo & Real 2012
    if (is.null(true.preval)) stop("'true.preval' must be provided when method = 'AT'.")
    sample.preval <- n1 / (n1 + n0)
    fav <- (pred / (1 - pred)) / ((sample.preval / true.preval) + (pred / (1 - pred)))
  }
 
  else stop("method must be either 'RBV' or 'AT'")
 
  return(fav)
} # end Fav function

Usage examples:

Fav(model = mymodel)

Fav(obs = mydata$myspecies, pred = mydata$myspecies.P)

Fav(pred = mydata$myspecies.P, sample.preval = 0.42)

Fav(pred = mydata$myspecies.P, n1n0 = c(824, 1430))
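To see how the "RBV" method removes the effect of prevalence, the sketch below applies the same formula used inside Fav to a few probability values under two different sample prevalences. The helper fav.rbv and the numbers are hypothetical and exist only for this illustration; the helper rewrites n1/n0 as prevalence / (1 - prevalence), which is the same ratio.

```r
# Hypothetical helper: RBV favourability from probability and sample prevalence
fav.rbv <- function(pred, prevalence) {
  odds <- pred / (1 - pred)                   # odds of presence
  prev.odds <- prevalence / (1 - prevalence)  # equals n1 / n0
  odds / (prev.odds + odds)
}

pred <- c(0.1, 0.3, 0.6)  # made-up predicted probabilities

# The same probabilities read against a common and a rare species
fav.common <- fav.rbv(pred, prevalence = 0.6)
fav.rare <- fav.rbv(pred, prevalence = 0.1)

print(data.frame(pred, fav.common, fav.rare))
```

A probability of 0.1 gives a favourability of 0.5 for the rare species (prevalence 0.1), and a probability of 0.6 gives 0.5 for the common one (prevalence 0.6): in both cases the conditions neither raise nor lower the chance of presence relative to prevalence.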

References:

Acevedo P. & Real R. (2012) Favourability: concept, distinctive characteristics and potential usefulness. Naturwissenschaften 99: 515-522.

Albert C.H. & Thuiller W. (2008) Favourability functions versus probability of presence: advantages and misuses. Ecography 31: 417-422.

Barbosa A.M.E. (2006) Modelación de relaciones biogeográficas entre predadores, presas y parásitos: implicaciones para la conservación de mamíferos en la Península Ibérica. PhD Thesis, University of Málaga (Spain).

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution 6: 853-858.

Real R., Barbosa A.M. & Vargas J.M. (2006) Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics 13: 237-245.
