Model overlay operations based on fuzzy logic

Logical and set operations are useful for comparative distribution modelling, to assess consensus or mismatch between the predictions of different models, and to quantify differences between models obtained for different time periods. Fuzzy set theory (Zadeh 1965, Barbosa & Real 2012) allows performing such operations without converting the predictions from continuous to binary, with the inherent application of arbitrary thresholds and over-simplification of model predictions. The result is a continuous numerical value quantifying the intersection, union, sum, or other operation among model predictions, whether binary or continuous.

The fuzzyOverlay function, used e.g. by Gutiérrez-Rodríguez et al. (in press) and by Reino et al. (in press) and included in the fuzzySim package (>=1.6), requires a data frame (or a matrix) containing the model prediction columns to compare, an indication of which columns are to be compared (overlay.cols; by default they are all included), and an op indicating the operation to perform between them. Can be ‘consensus‘ for the arithmetic mean of predictions (or the fuzzy equivalent of the proportion of models that agree that the species occurs at each site), ‘fuzzy_and‘ or ‘intersection‘ for fuzzy intersection; ‘fuzzy_or‘ or ‘union‘ for fuzzy union; ‘prob_and‘ or ‘prob_or‘ for probabilistic and/or, respectively (see Details); ‘maintenance‘ for the values where all predictions for the same row (rounded to the number of digits specified in the next argument) are the same. If data has only two columns to compare, you can also calculate ‘xor‘ for exclusive or, ‘AnotB‘ for the the occurrence of the species in column 1 in detriment of that in column 2,’expansion‘ for the prediction increase in rows where column 2 has higher values than column 1, ‘contraction‘ for the prediction decrease in rows where column 2 has lower values than column 1, or ‘change‘ for a mix of the latter two, with positive values where there has been an increase and negative values where there was decrease in favourability from columns 1 to 2.

fuzzyOverlay <- function(data, 
                         overlay.cols = 1:ncol(data),
                         op = "intersection",
                         na.rm = FALSE,
                         round.digits = 2
                         ) {
 
  # version 1.2 (13 Nov 2015)
 
  data <- data[ , overlay.cols]
  stopifnot(all(data[!is.na(data), ] >= 0 && data[!is.na(data), ] <= 1))
 
  if (op == "consensus") rowSums(data, na.rm = na.rm) / ncol(data)
  else if (op %in% c("fuzzy_and", "intersection")) apply(data, MARGIN = 1, FUN = min, na.rm = na.rm)
  else if (op == "prob_and") apply(data, MARGIN = 1, FUN = prod, na.rm = na.rm)
  else if (op %in% c("fuzzy_or", "union")) apply(data, MARGIN = 1, FUN = max, na.rm = na.rm)
  else if (op == "prob_or") apply(data, MARGIN = 1, FUN = sum, na.rm = na.rm) - apply(data, MARGIN = 1, FUN = prod, na.rm = na.rm)
  else if (op == "maintenance") ifelse(round(data[ , 2], digits = round.digits) == round(data[ , 1], digits = round.digits), round(data[ , 1], digits = round.digits), 0)
 
  else if (op %in% c("xor", "AnotB", "expansion", "contraction", "change")) {
    if (ncol(data) != 2) stop ("This 'op' works only for 'data' with 2 columns.")
    if (op == "xor") pmax(pmin(data[,1], 1 - data[,2], na.rm = na.rm), pmin(1 - data[,1], data[,2], na.rm = na.rm), na.rm = na.rm)  # http://www.dmitry-kazakov.de/ada/fuzzy.htm#Fuzzy
    else if (op == "AnotB") pmin(data[,1], 1 - data[,2], na.rm = na.rm)
    else if (op == "expansion") ifelse(data[ , 2] > data[ , 1], 
                                       data[ , 2] - data[ , 1], 
                                       0)
    else if (op == "contraction") ifelse(data[ , 2] < data[ , 1], 
                                         data[ , 2] - data[ , 1], 
                                         0)
    else if (op == "change") data[ , 2] - data[ , 1]
  }
  else stop ("Invalid 'op' name.")
}

[presented with Pretty R]

You can install and load fuzzySim (>= 1.6) and then check help(fuzzyOverlay) for further  info and reproducible usage examples.

REFERENCES

Barbosa A.M. & Real R. (2012) Applying fuzzy logic to comparative distribution modelling: a case study with two sympatric amphibians. The Scientific World Journal, 2012, Article ID 428206

Gutiérrez-Rodríguez J., Barbosa A.M. & Martínez-Solano I. (in press) Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach. Journal of Biogeography

Reino L., Ferreira M., Martínez-Solano I., Segurado P., Xu C. & Barbosa A.M. (in press) Favourable areas for co-occurrence of parapatric species: niche conservatism and niche divergence in Iberian tree frogs and midwife toads. Journal of Biogeography

Zadeh, L.A. (1965) Fuzzy sets. Information and Control, 8: 338-353

Advertisements

Miller calibration statistics

The MillerCalib function, now included in the modEvA package (version 0.7.2 just released), calculates Miller’s (1991) calibration statistics for a logistic regression model, namely the intercept and slope of the regression of the binary response variable on the logit of predicted probabilities. Optionally and by default, it also plots the corresponding regression line (black) over the reference diagonal (grey dashed line).

Calibration (or reliability) measures how a model’s predicted probabilities relate to observed species prevalence or proportion of presences in the modelled data. If predictions are perfectly calibrated, the slope will equal 1 and the intercept will equal 0, so the model’s calibation line will perfectly overlap with the reference diagonal. Note that Miller’s statistics assess the model globally: a model is well calibrated if the average of all predicted probabilities equals the proportion of presences in the modelled data. Therefore, good calibration is always attained on the same data used for building the model (Miller 1991). The function is more useful to see if a model stays well calibrated when extrapolated to a new dataset.

MillerCalib <- function(model = NULL, obs = NULL, pred = NULL, plot = TRUE, plot.values = TRUE, digits = 4, xlab = "", ylab = "", main = "Miller calibration", ...) {
  # version 1.3 (24 Jun 2015)
 
  model.provided <- ifelse(is.null(model), FALSE, TRUE)
 
  if (model.provided) {
    if (!is.null(pred)) message("Argument 'pred' ignored in favour of 'model'.")
    if (!is.null(obs)) message("Argument 'obs' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  } else { # if model not provided
    if (is.null(obs) | is.null(pred)) stop("You must provide either 'obs' and 'pred', or a 'model' object of class 'glm'")
  }
 
  stopifnot(
    length(obs) == length(pred),
    obs %in% c(0,1),
    pred >= 0,
    pred <= 1
  )
 
  logit <- log(pred / (1 - pred))
  mod <- glm(obs ~ logit, family = binomial)
  intercept <- mod$coef[[1]]
  slope <- mod$coef[[2]]
  std.err <- summary(mod)$coefficients["logit", "Std. Error"]
  slope.p <- abs((slope - 1) / sqrt(std.err^2 + 0))  # Paternoster 98; http://stats.stackexchange.com/questions/55501/test-a-significant-difference-between-two-slope-values
 
  if (plot) {
    ymin <- min(0, intercept)
    ymax <- max(1, intercept + 0.1)
    plot(c(0, 1), c(ymin, ymax), type = "n", xlab = xlab, ylab = ylab, main = main)
    abline(0, 1, col = "lightgrey", lty = 2)
    abline(intercept, slope)
    if (plot.values) {
      plotext <- paste("intercept =" , round(intercept, digits), "\nslope =", round(slope, digits), "\nslope p-value =", round(slope.p, digits))
      text(x = 0, y = ymax - 0.25, adj = 0, labels = plotext)
    }
  }
  return(list(intercept = intercept, slope = slope, slope.pvalue = slope.p))
}  # end MillerCalib function

[presented with Pretty R]

Input data can be either a glm model object with binomial family and logit link, or two vectors of observed binary (0 or 1) values and the corresponding predicted probabilities. The output is a named list with the calibration intercept and slope.

REFERENCES

Miller M.E., Hui S.L. & Tierney W.M. (1991) Validation techniques for logistic regression models. Statistics in Medicine, 10: 1213-1226

Wintle B.A., Elith J. & Potts J.M. (2005) Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW. Austral Ecology, 30: 719-738

Distribution modelling based on presence points

The pres.point.mod function uses the dismo package to run a series of models (currently Bioclim, Domain and Maxent) for a set of presence point coordinates and a raster stack of predictor variables. Optionally, it will also apply these models to an extrapolation raster stack, and write prediction and extrapolation maps to disk. The pres.point.mod function is not included in a package, but it’s in the “Supplementary Info” of Zanolla et al. (in press), so please cite this paper if you use the function.

Note that Bioclim and Domain use only the information on the presence points, while Maxent uses also a set of background (pseudo-absence) points that are randomly generated if not provided by the user. Please read Hijmans & Elith’s “Species distribution modeling with R” for more info on these modelling methods and their R implementation.

pres.point.mod <- function(obs.xy, var.stack, extrap.stack = NULL, mod.methods = c("bioclim", "domain", "maxent"), bg.points = NULL, result.rasters.folder = NULL, result.rasters.format = "GTiff", overwrite = FALSE, ...) {
  # version 1.5 (8 Oct 2013)
  # obs.xy: a matrix or data frame containing 2 columns, the first with the x and the second with the y coordinates of the species presence points to be modelled
  # var.stack: a raster stack of the predictor variables to use in modelling, in the same coordinate reference system as obs.xy
  # extrap.stack: optionally, a raster stack of variables (with the same names and coordinate reference system as var.stack) to which to exrapolate the models
  # mod.methods: modelling methods to use; currently "bioclim", "domain" and "maxent" are implemented
  # bg.points: an optional matrix/dataframe of x-y coordinates of the background points or pseudo-absences to be used by maxent (the other 3 methods use only the presence points). If NULL, maxent will randomly generate 10000 points within the raster.stack
  # result.rasters.folder: optionally, the path to a folder (which will be created if not present) to save the rasters of model predictions to disk
  # result.rasters.format: format for the rasters to save in result.rasters.folder (type '?writeFormats' for options)
  # overwrite: logical, whether to overwrite raster files with the same name within the result.rasters.folder
  # ...: additional arguments for the 'maxent' function (see ?maxent details); relevant arguments may be 'a' for user-defined background points, 'path' for the folder in which to save maxent output files, and 'removeDuplicates' o specify whether duplicate presence points within the same pixel should be ignored

  start.time <- proc.time()
  if (!is.element("dismo", installed.packages()[,1])) stop("You need to have the 'dismo' package installed to run 'pres.point.mod'.")
  require(dismo)
  if(!is.null(result.rasters.folder)) dir.create(result.rasters.folder, showWarnings = FALSE)
  jar.path <- paste(system.file(package = "dismo"), "/java/", sep = "")
  if ("maxent" %in% mod.methods & !file.exists(paste(jar.path, "maxent.jar", sep = "")))  stop("'maxent.jar' file not found in '", jar.path, "; please put (a copy of) it there, or run 'pres.point.mod' without 'maxent' in 'mod.methods'", sep = "")
  results <- vector("list")

  if("bioclim" %in% mod.methods) {
    message("Building Bioclim model...  >>  ", Sys.time())
    bioc.model.expr <- expression(bioclim(x = var.stack, p = obs.xy))
    bioc.model <- tryCatch(eval(bioc.model.expr), error = function(e) NULL)
    if(!is.null(bioc.model)) {
      results[["bioc.model"]] <- bioc.model
      message("Making Bioclim prediction raster...  >>  ", Sys.time())
      bioc.rast <- predict(bioc.model, var.stack)
      results[["bioc.rast"]] <- bioc.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = bioc.rast, filename = paste(result.rasters.folder, "Bioclim", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Bioclim extrapolated raster...  >>  ", Sys.time())
        bioc.extrap.rast <- predict(bioc.model, extrap.stack)
        results[["bioc.extrap.rast"]] <- bioc.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = bioc.extrap.rast, filename = paste(result.rasters.folder, "Bioclim_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null bioc model
    else message("...not! Bioclim model could not be obtained")
  }  # end if bioclim

  if("domain" %in% mod.methods) {
    message("Building Domain model...  >>  ", Sys.time())
    domain.model.expr <- expression(domain(x = var.stack, p = obs.xy))
    domain.model <- tryCatch(eval(domain.model.expr), error = function(e) NULL)
    if(!is.null(domain.model)) {
      results[["domain.model"]] <- domain.model
      message("Making Domain prediction raster...  >>  ", Sys.time())
      domain.rast <- predict(domain.model, var.stack)
      results[["domain.rast"]] <- domain.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = domain.rast, filename = paste(result.rasters.folder, "Domain", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Domain extrapolated raster...  >>  ", Sys.time())
        domain.extrap.rast <- predict(domain.model, extrap.stack)
        results[["domain.extrap.rast"]] <- domain.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = domain.extrap.rast, filename = paste(result.rasters.folder, "Domain_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null domain model
    else message("...not! Domain model could not be obtained")
  }  # end if domain

  if("maxent" %in% mod.methods) {
    message("Building Maxent model...  >>  ", Sys.time())
    maxent.model.expr <- expression(maxent(x = var.stack, p = obs.xy, a = bg.points, ...))
    maxent.model <- tryCatch(eval(maxent.model.expr), error = function(e) NULL)
    if(!is.null(maxent.model)) {
      results[["maxent.model"]] <- maxent.model
      message("Making Maxent prediction raster...  >>  ", Sys.time())
      maxent.rast <- predict(maxent.model, var.stack)
      results[["maxent.rast"]] <- maxent.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = maxent.rast, filename = paste(result.rasters.folder, "Maxent", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Maxent extrapolated raster...  >>  ", Sys.time())
        maxent.extrap.rast <- predict(maxent.model, extrap.stack)
        results[["maxent.extrap.rast"]] <- maxent.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = maxent.extrap.rast, filename = paste(result.rasters.folder, "Maxent_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null maxent model
    else message("...not! Maxent model could not be obtained")
  }  # end if maxent

  end.time <- proc.time()
  duration <- (end.time - start.time)[3]
  if (duration < 60) {
    units <- " second(s)"
  } else if (duration < 3600) {
    duration <- duration / 60
    units <- " minute(s)"
  } else {
    duration <- duration / 3600
    units <- " hour(s)"
  }

  message("Finished!  >>  ", Sys.time(), " -- it took ", round(duration),  units)
  return(results)
}  # end pres.point.mod function

[presented with Pretty R]

Please read the first lines of the function for an explanation of what each parameter means. Usage example:

mymodels <- pres.point.mod(obs.xy = mypoints, var.stack = predictor.stack, extrap.stack = var2050.stack, mod.methods = c(“bioclim”, “domain”, “maxent”), bg.points = bg.points, result.rasters.folder = “myspecies_models”, result.rasters.format = “GTiff”, overwrite = FALSE, path = “myspecies_maxent_out”)

 

pairs(mymodels$bioc.model)
response(mymodels$bioc.model)

response(mymodels$maxent.model)

 

plot(mymodels$bioc.rast)
plot(mymodels$domain.rast)
plot(mymodels$maxent.rast)

REFERENCES:

Zanolla M., Altamirano M., Carmona R., De La Rosa J., Souza-Egipsy V., Sherwood A., Tsiamis K., Barbosa A.M., Muñoz A.R. & Andreakis N. (in press) Assessing global range expansion in a cryptic species complex: insights from the red seaweed genus Asparagopsis (Florideophyceae). Journal of Phycology, accepted.

Get the equation of a model to apply elsewhere

The summary of a model in R gives you a table of the coefficient estimates and other parameters. Sometimes it may be useful to have a string of text with the model’s equation, so that you can present it in an article (e.g. Real et al. 2005) or apply it in a (raster map) calculation, either in R (although here you can usually use the ‘predict’ function) or in a GIS software (e.g. Barbosa et al. 2010). The getModEqn function, now included in the modEvA package (Barbosa et al. 2014), gets this equation for linear or generalized linear models.

By default it prints the “y” linear equation, but for generalized linear models you can also set type = “P” (for the equation of probability) or type = “F” (for favourability, which corrects the intercept to eliminate the effect of prevalence — see Real et al. 2006).

If the variables to which you want to apply the model have a prefix or suffix (e.g. prefix = “raster.stack$” for the R raster package, or prefix = “mydata$” for a data frame, or suffix = “@1” in Quantum GIS, or suffix = “@mapset” in GRASS), you can get these in the equation too, using the prefix and/or the suffix arguments.

getModEqn <- function(model, type = "y", digits = NULL, prefix = NULL, suffix = NULL) {
  # version 1.6 (14 Aug 2013)
  # model: a model object of class 'lm' or glm'
  # type: the type pf equation to present; can be "y" (for the linear logit equation), "P" (for predicted probability) or "F" (for favourability, which corrects the intercept to eliminate the effect of prevalence - Real et al. 2006)
  # digits: number of decimal places to which to round the model coefficients; if NULL, no rounding is done
  # prefix, suffix: text to add to the variables' names in the resulting equation (if your variables have a prefix or suffix added to the variable's name where you want to apply the equation)

  stopifnot(class(model) %in% c("lm", "glm"))
  if(length(type) != 1 | !(type %in% c("y", "P", "F"))) stop("'type' must be either 'y', 'P', or 'F'")
  if(!("glm" %in% class(model)) & type != "y") {
    message("types 'P' and 'F' are only applicable to models of class 'glm', so type was reset to 'y'")
    type <- "y"
  }
  coeffs <- summary(model)$coefficients[ , 1]
  if (type == "F" & !("(Intercept)") %in% names(coeffs)) {
    message("'F' is currently only applicable to models with an intercept, so type was set to 'P'")
    type <- "P"
  }
  if(type == "F") {
    n1 <- sum(model$y == 1)
    n0 <- sum(model$y == 0)
    coeffs["(Intercept)"] <- coeffs["(Intercept)"] - log(n1/n0)
  }
  names(coeffs) <- paste(prefix, names(coeffs), suffix, sep = "")
  if (!is.null(digits)) coeffs <- round(coeffs, digits)
  coeffs <- ifelse(coeffs < 0, coeffs, paste("+", coeffs, sep = ""))
  multips <- paste(coeffs, names(coeffs), sep = "*")
  multips <- sub(x = multips, pattern = paste(prefix, "*(Intercept)", suffix, sep = ""), replacement = "", fixed = TRUE)
  eqn <- apply(X = as.matrix(multips), MARGIN = 2, FUN = paste, collapse = "")
  if (substring(eqn, 1, 1) == "+")  eqn <- substring(eqn, 2)
  if (type == "y")  eqn <- paste("y=", eqn, sep = "")
  if (type == "P")  eqn <- paste("P=1-(1/(1+exp(", eqn, ")))", sep = "")
  if (type == "F")  eqn <- paste("F=1-(1/(1+exp(", eqn, ")))", sep = "")
  cat(eqn)
  return(invisible(eqn))
}  # end getModEqn function

[presented with Pretty R]

Usage example:

> mymodel <- glm(myspecies ~ var1 + var2 + var3, family = “binomial”)

> summary(mymodel)

> getModEqn(mymodel)

y=-8.90804 – 7.020668e-05*var1 + 0.0030487*var2 + 0.0363185*var3

> getModEqn(mymodel, type = “P”, digits = 3, suffix = “@mapset“)

P=1-(1/(1+exp(-8.908 + 0*var1@mapset + 0.003*var2@mapset + 0.036*var3@mapset)))

References

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Barbosa, A.M., Real, R. & Vargas, J.M. (2010). Use of coarse-resolution models of species’ distributions to guide local conservation inferences. Conservation Biology 24: 1378–87

Real, R., Barbosa, A.M., Martínez-Solano, Í. & García-París, M. (2005). Distinguishing the distributions of two cryptic frogs (Anura: Discoglossidae) using molecular data and environmental modeling. Canadian Journal of Zoology 83: 536-545

Real, R., Barbosa, A.M. & Vargas, J.M. (2006). Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics 13: 237-245

Get the predictions of multiple models when applied to a new data set

The getPreds function, now inluded in the fuzzySim package (Barbosa, 2015), can be useful if you have a list of model objects (e.g. resulting from multGLM) and want to apply them to a new data set (e.g. with variables from another region or time period), with options to include the logit link (y) and/or Favourability, in just one step. If you want Favourability to be calculated, you need also the Fav function.

function (data, models, id.col = NULL, Y = FALSE, P = TRUE, Favourability = TRUE, 
    incl.input = FALSE) 
{
    start.time <- Sys.time()
    stopifnot(is.data.frame(data), is.list(models), is.null(id.col) | 
        id.col %in% (1:ncol(data)), is.logical(Y), is.logical(P), 
        is.logical(Favourability), is.logical(incl.input))
    if (!Y & !P & !Favourability) 
        stop("There are no predictions to get\nif all Y, P and Favourability are set to FALSE.")
    input.data <- data
    keeP <- P
    if (Favourability) 
        P <- TRUE
    n.nulls <- length(models[sapply(models, is.null)])
    if (n.nulls > 0) 
        warning(n.nulls, " model(s) were NULL and therefore\n          did not generate predictions")
    models <- models[!sapply(models, is.null)]
    n.models <- length(models)
    mod.count <- 0
    for (m in 1:n.models) {
        mod.count <- mod.count + 1
        mod.name <- names(models)[m]
        message("Predicting with model ", mod.count, " of ", 
            n.models, " (", mod.name, ")...")
        if (Y) {
            data[, ncol(data) + 1] <- predict(models[[mod.count]], 
                data)
            names(data)[ncol(data)] <- paste(mod.name, "Y", sep = "_")
        }
        if (P) {
            data[, ncol(data) + 1] <- predict(models[[mod.count]], 
                data, type = "response")
            names(data)[ncol(data)] <- paste(mod.name, "P", sep = "_")
        }
        if (Favourability) {
            data[, ncol(data) + 1] <- Fav(pred = data[, ncol(data)], 
                n1n0 = c(sum(models[[mod.count]]$y == 1), sum(models[[mod.count]]$y == 
                  0)))
            names(data)[ncol(data)] <- paste(mod.name, "F", sep = "_")
            if (!keeP) 
                data <- data[, -(ncol(data) - 1)]
        }
    }
    if (incl.input) {
        id.col <- NULL
    }
    else {
        data <- data[, -(1:ncol(input.data)), drop = FALSE]
        if (!is.null(id.col)) {
            data <- data.frame(input.data[, id.col, drop = FALSE], 
                data)
            names(data)[1] <- names(input.data)[id.col]
        }
    }
    message("Finished!")
    timer(start.time)
    return(data)
}

[presented with Pretty R]

REFERENCES:

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853-858

Overlap Analysis

Overlap Analysis is one of the simplest forms of modelling species’ distributions. It assesses the ranges of values of the given predictor variables at the sites where a species has been recorded present, and predicts where that species should be able to occur based on those presence data (e.g. Brito et al. 1999, Arntzen & Teixeira 2006).

OA, now included in the modEvA package (Barbosa et al. 2014), can also be useful when extrapolating models outside their original scope (geographical area, time period or spatial resolution), as it can identify which localities are within the model’s domain — i.e., within the analysed ranges of values of the variables, outside which the model may not be reliable (e.g. Barbosa et al. 2009). In this case, the sp.cols should contain not species’ presence/absence, but rather indicate (with value 1) all observations that have been included in the model. Cf. modEvA function MESS, which is more informative but also more computationally intensive.

OA <-
function(data, sp.cols, var.cols) {
  # version 2.1 (13 May 2014)
 
  if (length(sp.cols) > 1) stop ("Sorry, OA is currently implemented for only one response variable at a time, so 'sp.cols' must indicate only one column")
 
  input.data <- data
 
  na.rm = TRUE
  if(na.rm) {
    mod.data <- data[ , c(sp.cols, var.cols)]
    data <- data[complete.cases(mod.data), ]
  }
 
  predictors <- data[ , var.cols]
  nvar <- ncol(predictors)
 
  response <- data[ , sp.cols]
  predictors.presence <- predictors[response > 0, ]
  var.presmin <- var.presmax <- vector("numeric", nvar)
  predictors.overlap <- matrix(data = NA, nrow = nrow(predictors), ncol = nvar)
  for (v in 1:nvar) {
    var.presmin[v] <- min(predictors.presence[ , v])
    var.presmax[v] <- max(predictors.presence[ , v])
    predictors.overlap[ , v] <- ifelse((predictors[ , v] >= var.presmin[v] & predictors[ , v] <= var.presmax[v]), 1, 0)
  }  # end for v
  overlap.noNA <- as.integer(rowSums(predictors.overlap) == nvar)
  return(overlap.noNA)
}

[presented with Pretty R]

NOTE that the input data format for this function has changed in mid May 2014, with former parameters response and predictors now replaced with data, sp.cols and var.cols (for coherence with multGLM and other functions in modEvA). Input data for the OA function are now a matrix or data frame containing the response variable (presences vs. absences of a species if we want to model its occurrence, or modelled vs. non-modelled sites if we want to know which non-modelled sites are within the modelled range), and the predictor variables to consider. The output is a binary vector whith 1 where the values of all predictors lie within the ranges observed for the presence records, and 0 otherwise. For example, if you have a table called mydata with your species’ presence data in the first column and your predictor variables in columns 2 to 12, paste in R the OA function followed by this command:

OA(data = mydata, sp.cols = 1, var.cols = 2:12)

References

Arntzen JW, Teixeira J. 2006. History and new developments in the mapping and modelling of the distribution of the golden-striped salamander, Chioglossa lusitanica. Zeitschrift für Feldherpetologie, Supplement: 1-14.

Barbosa A.M., Brown J.A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Barbosa, A.M., Real, R. & Vargas, J.M. (2009) Transferability of environmental favourability models in geographic space: the case of the Iberian desman (Galemys pyrenaicus) in Portugal and Spain. Ecological Modelling 220: 747–754.

Brito JC, Crespo EG, Paulo OS 1999. Modelling wildlife distributions: Logistic Multiple Regression vs Overlap Analysis. Ecography 22: 251-260.