modTools: simple tools for simpler modelling


This website provides a series of free software tools, mostly written in R, that can be used for various tasks related to the analysis of species’ distribution and ecology (although many of them can be applied to other purposes). Most of them are now being included in R packages — please cite the appropriate package (mentioned in the blog post featuring each function) if you use these tools, which are released under a Creative Commons Attribution license. Please note that these tools are experimental (like R, they come with absolutely no warranty) and occasionally edited with corrections or improvements, so preferably check back for the latest version before you use a function.

Leave a comment here if you use these functions in publications — this way I can cite you in the blog, track how people are using the tools, and justify the time and effort dedicated to them. Thanks are due to Arpat Ozgul and François Guilhaumon, who helped me through the steep part of the R learning curve (to me it was actually more like an overhang). Comments and suggestions are very welcome, although a quick response cannot be guaranteed!


Find pairs of highly correlated variables

Correlations among variables are problematic in multivariate models, and good practice commonly recommends eliminating a priori one from each pair of highly correlated variables. The corPairs function below takes a matrix of correlations among variables, plots a cluster dendrogram based on them and returns the pairs of variables with absolute correlations above a given threshold (by default, 0.8):

corPairs <- function(cor.mat, cor.thresh = 0.8, ...) {
  # A.M. Barbosa, version 1.0 (22 Feb 2018)
  dist_matrix <- as.dist(1 - abs(cor.mat))
  plot(hclust(dist_matrix, ...), ylab = "1 - absolute correlation", xlab = "", sub = "")
  abline(h = 1 - cor.thresh, col = "red")
  cor.mat[upper.tri(cor.mat, diag = TRUE)] <- NA
  high.cor.mat <- numeric(0)
  if (max(abs(cor.mat), na.rm = TRUE) >= cor.thresh) {
    high.cor.rowcol <- as.matrix(which(abs(cor.mat) >= cor.thresh, arr.ind = TRUE))
    high.cor.inds <- sort(unique(as.vector(high.cor.rowcol)))
    high.cor.mat <- data.frame(var1 = rownames(cor.mat)[high.cor.rowcol[ , "row"]], var2 = colnames(cor.mat)[high.cor.rowcol[ , "col"]])
    high.cor.mat$corr <- NULL
    for (r in 1:nrow(high.cor.mat)) 
      high.cor.mat$corr[r] <- cor.mat[high.cor.rowcol[ ,"row"][r], high.cor.rowcol[ ,"col"][r]]
    high.cor.mat <- high.cor.mat[order(abs(high.cor.mat$corr), decreasing = TRUE), ]
    return (high.cor.mat)
  } # end if max
  else return("No correlations exceed the threshold.")
} # end corPairs function

This function is closely related to (and will soon be included in) the corSelect function in the fuzzySim package (Barbosa, 2015), but operates on correlation matrices rather than the raw variables, and adds a cluster dendrogram.



Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853–858

Potential biodiversity and relative favourability

The two functions below allow calculating potential biodiversity and relative favourability, respectively, according to the procedures described by Real et al. (2017) — which should be cited if you use any of them — and also used by Estrada et al. (2018):

potentialBiodiv <- function(models = NULL, # a list of binomial glm model objects
data = NULL, # alternatively, data frame with observations and predictions
sp.cols = NULL, # index numbers of the columns with presence-absence data
prob.cols = NULL, # index numbers of the columns with probabilities
fav.cols = NULL, # alternatively, index numbers of columns with favourabilities
id.col = NULL, # optionally, index number of a column with row identifiers
zero.action = "increase" # what to do with zero favourability values when calculating the geometric mean; alternatives are "exclude" or "maintain"
) {
if (!(zero.action %in% c("increase", "exclude", "maintain"))) stop ("Invalid 'zero.action'.")
if (!is.null(models)) {
if (!all("glm" %in% sapply(models, class))) stop("All 'models' must be of class 'glm'.")
if (!all("binomial" %in% sapply(models, family))) stop("All 'models' must be of family 'binomial'.") <-, `[[`, "y")) <-, `[[`, "fitted.values"))
} else {
if (is.null(data) || is.null(sp.cols) || (is.null(prob.cols) && is.null(fav.cols))) stop ("You need to provide either 'models', or 'data' + 'sp.cols' + either 'prob.cols' or 'fav.cols'.") <- data[ , sp.cols, drop = FALSE]
if (!is.null(prob.cols)) <- data[ , prob.cols, drop = FALSE]
} # end if !null models else
Nobs <- nrow(
Nsp <- ncol(
if (is.null(fav.cols)) {
favs <- matrix(NA, ncol = Nsp, nrow = Nobs)
for (s in 1:Nsp) {
#favs[ , s] <- fuzzySim::Fav(obs =[ , s], pred =[ , s])
n1 <- sum([ , s] == 1, na.rm = TRUE)
n0 <- sum([ , s] == 0, na.rm = TRUE)
prob <-[ , s]
favs[ , s] <- (prob/(1 - prob))/((n1/n0) + (prob/(1 - prob)))
}; rm(s)
} else { # if !null fav.cols
favs <- data[ , fav.cols, drop = FALSE]
} # end if null fav else
if (zero.action == "increase") favs[favs == 0] <- 2.2e-16
#.Machine$double.xmin # smallest computable positive number
else if (zero.action == "exclude") favs[favs == 0] <- NA
F_lns <- log(favs, base = exp(1))
F_ln_sums <- rowSums(F_lns) # eliminated 'na.rm = na.rm'
result <- data.frame(SpRich = rowSums(,
Fmin = apply(favs, 1, min),
Fmax = apply(favs, 1, max),
Fsum = rowSums(favs),
Fmean = rowMeans(favs),
Fgmean = exp((1 / Nsp) * F_ln_sums))
if (!is.null(data) && !is.null(id.col))
result <- data.frame(data[ , id.col, drop = FALSE], result)


relativeFav <- function(target, # matrix or data frame of favourablities in target period
ref) { # idem for reference period, with columns in the same order
stopifnot(ncol(target) == ncol(ref))
N <- ncol(target)
fav_ratios <- target / ref
logs <- log(fav_ratios, base = exp(1))
log_sums <- rowSums(logs)
exp((1 / N) * log_sums)



Estrada A., Barbosa A.M., & Real R. (2018) Changes in potential mammal diversity in National Parks and their implications for conservation. Current Zoology,

Real R., Barbosa A.M., & Bull J.W. (2017) Species Distributions, Quantum Theory, and the Enhancement of Biodiversity Measures. Systematic Biology, 66: 453-462

Trend surface analysis for multiple species

Trend surface analysis is a way to model the spatial structure in species distributions by regressing occurrence data on the spatial coordinates x and y, for a linear trend, and/or on polynomial terms of these coordinates (including the sums of powers and cross-products of the coordinates) for curvilinear trends (Legendre & Legendre, 1998; Borcard et al., 2011). Second- and third-degree polynomials are often used. The multTSA function below, which is included in the fuzzySim package, allows specifying the degree of the spatial polynomial to use. By default, it uses a 3rd-degree polynomial, performs backward stepwise AIC selection of the polynomial terms to include, and returns the presence probability value resulting from the spatial trend, although these options can be changed — for example, if you want the result to be a new spatial variable in the same scale of the input coordinates, you can set P = FALSE. This function can be used for one or multiple species at a time, in a similar way as the multGLM function in the same package.

multTSA <- function (data, sp.cols, coord.cols, id.col = NULL, degree = 3, 
  step = TRUE, direction = "backward", P = TRUE, Favourability = FALSE, suffix = "_TS", save.models = FALSE) 
  start.time <- Sys.time()   stopifnot(na.omit(as.matrix(data[, sp.cols])) %in% c(0,      1), length(sp.cols) > 0, length(sp.cols) <= ncol(data) - 
    length(coord.cols) - length(id.col), sp.cols %in% 1:ncol(data) | 
    sp.cols %in% colnames(data), is.null(coord.cols) | length(coord.cols) == 
    2, is.null(coord.cols) | coord.cols %in% 1:ncol(data) | 
    coord.cols %in% colnames(data), is.numeric(as.matrix(data[, 
    coord.cols])), is.null(id.col) | id.col %in% 1:ncol(data) | 
    id.col %in% colnames(data), degree%%1 == 0, is.logical(step), is.logical(P),
    is.logical(Favourability), is.logical(save.models))
  coords.poly <-[, coord.cols]), 
    degree = degree, raw = TRUE))
  n.poly.terms <- ncol(coords.poly)
  colnames(coords.poly) <- paste0(rep("poly", n.poly.terms), 
    c(1:n.poly.terms)) <- as.matrix(data[, sp.cols])
  colnames( <- colnames(data[, sp.cols])
  n.subjects <- length(sp.cols)
  if (save.models) 
    TSA.models <- vector("list", n.subjects)
  subj.count <- 0
  data.input <- data
  data <- cbind(data, coords.poly)
  for (s in 1:n.subjects) {
    subj.count <- subj.count + 1 <- colnames([s]
    message("Computing TSA ", subj.count, " of ", n.subjects, 
      " (",, ") ...")
    model.formula <- as.formula(paste(, "~", paste(colnames(coords.poly), 
      collapse = "+")))
    model.expr <- expression(with(data, glm(model.formula, 
      family = binomial, data = data)))
    if (step) 
      model <- step(eval(model.expr), direction = direction, trace = 0)
    else model <- eval(model.expr)
    if (P || Favourability)  pred <- predict(model, coords.poly, type = "response")
    else  pred <- predict(model, coords.poly)
    if (Favourability) {
      pred <- Fav(obs =[, s], pred = pred)
    data[, ncol(data) + 1] <- pred
    colnames(data)[ncol(data)] <- paste0(, suffix)
    if (save.models) {
      TSA.models[[subj.count]] <- model
      names(TSA.models)[[subj.count]] <-
  predictions <- data.frame(data[, id.col], data[, ((ncol(data.input) + 
    1 + n.poly.terms):ncol(data))])
  if (!is.null(id.col)) {
    if (is.character(id.col)) 
      colnames(predictions)[1] <- id.col
    else colnames(predictions)[1] <- colnames(data)[id.col]
  if (save.models) 
    return(list(predictions = data.frame(predictions), TSA.models = TSA.models))
  else return(predictions)



Borcard D., Gillet F. & Legendre P. (2011) Numerical Ecology with R. Springer, New York.

Legendre P. & Legendre L. (1998) Numerical Ecology. Elsevier, Amsterdam.

Recent updates to previous functions

Just a quick note to let you know of changes introduced to some functions in recent versions of their packages:

  • multGLM, FDR and corSelect (package fuzzySim) now also include BIC (Bayesian Information Criterion, also known as Schwarz criterion, SBC or SBIC) for variable selection, making it more parsimonious especially for large sample sizes
  • multGLM (package fuzzySim) now allows choosing which FDR correction to apply (if FDR is set to TRUE). It uses “fdr(=”BH”) by default, but check p.adjust for other available methods, namely “BY” which allows for non-independent data
  • varPart (package modEvA) now does not ask for model.type, includes in the output diagram the proportion of unexplained variance, and allows defining a title (main) for the diagram
  • several modEvA functions now automatically omit NAs, and emit a warning instead of an error if there are pred values outside the [0,1] interval (which some models include in the output)

The help files of these functions have also been updated with further details. It is advisable to unsinstall fuzzySim and modEvA and install their latest versions.

Range change based on continuous (fuzzy) values

This function, included in the fuzzySim package (version 1.6 just released), uses the fuzzyOverlay function in the previous post and then quantifies overall range change (including gain, loss, maintenance/stability and total change) based on the continuous predictions of two species distribution models (see Gutiérrez-Rodríguez et al., in press).

fuzzyRangeChange <- function(pred1, pred2, number = TRUE, prop = TRUE, na.rm = TRUE, round.digits = 2, measures = c("Gain", "Loss", "Stable_presence", "Stable_absence", "Balance"), plot = TRUE, col = colorRampPalette(c("white", "black"))(length(measures)), ...) {
  # version 1.4 (23 Mar 2016)
  stopifnot(ncol(pred1) == ncol(pred2),
            all(pred1[is.finite(pred1)] >= 0 && pred1[is.finite(pred1)] <= 1),
            all(pred2[is.finite(pred2)] >= 0 && pred2[is.finite(pred2)] <= 1)
  if (!number & !prop) stop ("Nothing to calculate if both 'number' and 'prop' are FALSE.")
  values <- vector("numeric", length(measures))
  names(values) <- measures
  if ("Gain" %in% measures)  values["Gain"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "expansion", na.rm = na.rm), na.rm = na.rm)
  if ("Loss" %in% measures)  values["Loss"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "contraction", na.rm = na.rm), na.rm = na.rm)
  if ("Stable_presence" %in% measures)  values["Stable_presence"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "maintenance", na.rm = na.rm), na.rm = na.rm)
  if ("Stable_absence" %in% measures)  values["Stable_absence"] <- sum(fuzzyOverlay(1 - data.frame(pred1, pred2), op = "maintenance", na.rm = na.rm), na.rm = na.rm)
  if ("Balance" %in% measures)  values["Balance"] <- sum(fuzzyOverlay(data.frame(pred1, pred2), op = "change", na.rm = na.rm), na.rm = na.rm)
  result <- data.frame(Measure = measures, Number = values)
  if (prop) {
    if (na.rm) n <- length(na.omit(pred1))
    else n <- length(pred1)
    range.size <- sum(pred1, na.rm = na.rm)
    stable.abs <- result[result$Measure == "Stable_absence", "Number"]
    result$Proportion <- result[ , "Number"] / range.size
    result[result$Measure == "Stable_absence", "Proportion"] <- stable.abs / (n - range.size)
  if (!number) {
    result <- result[ , - (ncol(result) - 1)]
  if (plot) {
    barplot(result[ , ncol(result)], legend.text = rownames(result), col = col, ...)
    abline(h = 0)
  result[ , "Measure"] <- NULL
  if (is.finite(round.digits))  result <- round(result, round.digits)

[presented with Pretty R]

The function will produce numeric and (since FuzzySim 1.7.2) also graphical results quantifying the fuzzy overall changes:


You can install and load fuzzySim (>= 1.6) and then check help(fuzzyRangeChange) for further  info and reproducible usage examples.



Gutiérrez-Rodríguez J., Barbosa A.M. & Martínez-Solano I. (in press) Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach. Journal of Biogeography.

Model overlay operations based on fuzzy logic

Logical and set operations are useful for comparative distribution modelling, to assess consensus or mismatch between the predictions of different models, and to quantify differences between models obtained for different time periods. Fuzzy set theory (Zadeh 1965, Barbosa & Real 2012) allows performing such operations without converting the predictions from continuous to binary, with the inherent application of arbitrary thresholds and over-simplification of model predictions. The result is a continuous numerical value quantifying the intersection, union, sum, or other operation among model predictions, whether binary or continuous.

The fuzzyOverlay function, used e.g. by Gutiérrez-Rodríguez et al. (in press) and by Reino et al. (in press) and included in the fuzzySim package (>=1.6), requires a data frame (or a matrix) containing the model prediction columns to compare, an indication of which columns are to be compared (overlay.cols; by default they are all included), and an op indicating the operation to perform between them. Can be ‘consensus‘ for the arithmetic mean of predictions (or the fuzzy equivalent of the proportion of models that agree that the species occurs at each site), ‘fuzzy_and‘ or ‘intersection‘ for fuzzy intersection; ‘fuzzy_or‘ or ‘union‘ for fuzzy union; ‘prob_and‘ or ‘prob_or‘ for probabilistic and/or, respectively (see Details); ‘maintenance‘ for the values where all predictions for the same row (rounded to the number of digits specified in the next argument) are the same. If data has only two columns to compare, you can also calculate ‘xor‘ for exclusive or, ‘AnotB‘ for the the occurrence of the species in column 1 in detriment of that in column 2,’expansion‘ for the prediction increase in rows where column 2 has higher values than column 1, ‘contraction‘ for the prediction decrease in rows where column 2 has lower values than column 1, or ‘change‘ for a mix of the latter two, with positive values where there has been an increase and negative values where there was decrease in favourability from columns 1 to 2.

fuzzyOverlay <- function(data, 
                         overlay.cols = 1:ncol(data),
                         op = "intersection",
                         na.rm = FALSE,
                         round.digits = 2
                         ) {
  # version 1.2 (13 Nov 2015)
  data <- data[ , overlay.cols]
  stopifnot(all(data[!, ] >= 0 && data[!, ] <= 1))
  if (op == "consensus") rowSums(data, na.rm = na.rm) / ncol(data)
  else if (op %in% c("fuzzy_and", "intersection")) apply(data, MARGIN = 1, FUN = min, na.rm = na.rm)
  else if (op == "prob_and") apply(data, MARGIN = 1, FUN = prod, na.rm = na.rm)
  else if (op %in% c("fuzzy_or", "union")) apply(data, MARGIN = 1, FUN = max, na.rm = na.rm)
  else if (op == "prob_or") apply(data, MARGIN = 1, FUN = sum, na.rm = na.rm) - apply(data, MARGIN = 1, FUN = prod, na.rm = na.rm)
  else if (op == "maintenance") ifelse(round(data[ , 2], digits = round.digits) == round(data[ , 1], digits = round.digits), round(data[ , 1], digits = round.digits), 0)
  else if (op %in% c("xor", "AnotB", "expansion", "contraction", "change")) {
    if (ncol(data) != 2) stop ("This 'op' works only for 'data' with 2 columns.")
    if (op == "xor") pmax(pmin(data[,1], 1 - data[,2], na.rm = na.rm), pmin(1 - data[,1], data[,2], na.rm = na.rm), na.rm = na.rm)  #
    else if (op == "AnotB") pmin(data[,1], 1 - data[,2], na.rm = na.rm)
    else if (op == "expansion") ifelse(data[ , 2] > data[ , 1], 
                                       data[ , 2] - data[ , 1], 
    else if (op == "contraction") ifelse(data[ , 2] < data[ , 1], 
                                         data[ , 2] - data[ , 1], 
    else if (op == "change") data[ , 2] - data[ , 1]
  else stop ("Invalid 'op' name.")

[presented with Pretty R]

You can install and load fuzzySim (>= 1.6) and then check help(fuzzyOverlay) for further  info and reproducible usage examples.


Barbosa A.M. & Real R. (2012) Applying fuzzy logic to comparative distribution modelling: a case study with two sympatric amphibians. The Scientific World Journal, 2012, Article ID 428206

Gutiérrez-Rodríguez J., Barbosa A.M. & Martínez-Solano I. (in press) Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach. Journal of Biogeography

Reino L., Ferreira M., Martínez-Solano I., Segurado P., Xu C. & Barbosa A.M. (in press) Favourable areas for co-occurrence of parapatric species: niche conservatism and niche divergence in Iberian tree frogs and midwife toads. Journal of Biogeography

Zadeh, L.A. (1965) Fuzzy sets. Information and Control, 8: 338-353

Assess model overlap with Schoener’s D, Hellinger distance and Warren’s I

Metrics for quantifying the similarity among ecological niche models are important for testing patterns of niche evolution. The modOverlap function, used e.g. by Reino et al. (in press) and now included in the fuzzySim package (>=1.5), calculates three such metrics: Shoener’s D statistic for niche overlap, Hellinger distance between probability distributions, and the I similarity statistic of Warren et al. (2008).

These formulas are also implemented e.g. within the niche.overlap function of R package phyloclim, but there they require input data in complex and software-specific formats. The function below calculates these indices from simply two vectors or columns containing the predictions of the two models to compare.

modOverlap <- function (pred1, pred2, na.rm = TRUE) 
    p1 <- pred1/sum(pred1, na.rm = na.rm)
    p2 <- pred2/sum(pred2, na.rm = na.rm)
    SchoenerD <- 1 - 0.5 * sum(abs(p1 - p2), na.rm = na.rm)
    HellingerDist <- sqrt(sum((sqrt(p1) - sqrt(p2))^2, na.rm = na.rm))
    WarrenI <- 1 - ((HellingerDist^2)/2)
    list(SchoenerD = SchoenerD, WarrenI = WarrenI, HellingerDist = HellingerDist)

[presented with Pretty R]

See also the fuzSim function in the fuzzySim package, which calculates fuzzy versions of classic binary similarity indices such as Jaccard, Baroni-Urbani & Buser, Sorensen and Simpson (Barbosa 2015).



Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853-858

Reino L., Ferreira M., Martínez-Solano I., Segurado P., Xu C. & Barbosa A.M. (in press) Favourable areas for co-occurrence of parapatric species: niche conservatism and niche divergence in Iberian tree frogs and midwife toads. Journal of Biogeography

Warren D.L., Glor R.E. & Turelli, M. (2008) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-83 [plus ERRATUM]