modTools: simple tools for simpler modelling

Featured

This website provides a series of free software tools, mostly written in R, that can be used for various tasks related to the analysis of species’ distribution and ecology (although many of them can be applied to other purposes). Most of them are now being included in R packages — please cite the appropriate package (mentioned in the blog post featuring each function) if you use these tools, which are released under a Creative Commons Attribution license. Please note that these tools are experimental (like R, they come with absolutely no warranty) and occasionally edited with corrections or improvements, so preferably check back for the latest version before you use a function.

Leave a comment here if you use these functions in publications — this way I can cite you in the blog, track how people are using the tools, and justify the time and effort dedicated to them. Thanks are due to Arpat Ozgul and François Guilhaumon, who helped me through the steep part of the R learning curve (to me it was actually more like an overhang). Comments and suggestions are very welcome, although a quick response cannot be guaranteed!

Advertisements

Improvements to fuzzySim functions

A few improvements were recently made to several functions in the fuzzySim package. Mainly, function modelTrim is now more independent (it used to require “attach” sometimes); multTSA allows either “AIC” or “significance” as a backward stepwise selection criterion, and provides more descriptive polynomial term names (visible when save.models = TRUE); and multGLM now has the option to include a TSA (trend surface analysis, see ?multTSA) as a spatial variable calculated individually for each species. Some other minor changes are also implemented in version 1.9 of fuzzySim, which was just released on R-Forge. The next version, 2.0, will be submitted to CRAN, so all feedback is welcome to further improve the package now!

 

Change the background of your plotting window

Changing the background of your screen to dark instead of light colours can save energy and also spare your eyes. RStudio offers several dark-background themes, but the plotting window is still white by default. The function below allows you to define a different colour for the background and for the remaining elements of the plots:

plot_bg <- function(back = "black", front = "white") {
par(bg = back)
par(col = front, col.axis = front, col.lab = front, col.main = front, col.sub = front, fg = front)
}

To try it out, just copy the function above to R, press “enter”, and then run the following commands while watching the plotting window:

plot_bg(back = "darkblue", front = "turquoise")
plot(1:10)
plot_bg(back = "black", front = "white")
plot(1:10)

The default of this function is white plots on a black background so, if that’s what you want, you just need to run the function and this simpler command:

plot_bg()

All plots produced afterwards (at least those made with base R, or with no conflicting graphical parameters) should appear on a black window — for example:

plot(1:10)

plot1-10

Crop or mask rasters from a zip file

Imagine you have a zipped folder with global raster maps representing different variables (e.g. the ones you can download from WorldClim, CliMond, ENVIREM, CHELSA or ecoClimate), and you need to extract just a particular region or country that you’ll be working with. The cropZippedRasters function, provided below, automates this process without the need to previously unzip the entire folder (which can require substantial disk space). The function extracts one raster at a time from the given .zip file, imports the raster to R, crops or masks it with a user-provided raster or polygon vector map, and then (by default) deletes the unzipped raster file before unzipping the next one, therefore requiring minimal disk space.

 


cropZippedRasters <- function(zip.file, cut.map, msk = FALSE, rast.file.ext = ".tif", aux.file.ext = NULL, delete.unzipped = TRUE)
  # zip.file: path to the zip containing the raster maps to crop or mask
  # cut.map: raster or SpatialPolygons map (in your R workspace) defining the region to crop or mask
  # msk: logical, whether to actually mask with the cut.map borders (slower) or to just crop to the extent of cut.map (the default, which can be considerably faster)
  # rast.file.ext: file extension for the raster maps on disk (e.g. ".tif", ".bil", ".rst")
  # aux.file.ext: file extension for the auxiliary files when they exist (e.g. ".hdr" for .bil raster files, or ".rdc" for Idrisi .rst files)
{

  require(raster)
  rast.files <- unzip(zip.file, list = TRUE) $ Name
  var.names <- unique(tools::file_path_sans_ext(rast.files))
  n.var <- length(var.names)
  cut.rasts <- vector("list", length(var.names))
  names(cut.rasts) <- var.names

  for (i in 1:n.var) {
    message("Importing map ", i, " of ", n.var)
    message("  - unzipping file...")
    unzip(zip.file, files = paste0(var.names[i], rast.file.ext))
    if (!is.null(aux.file.ext)) unzip(zip.file, files = paste0(var.names[i], aux.file.ext))
    var.rast <- raster(paste0(var.names[i], rast.file.ext))

    message("  - cropping/masking map...")
    var.cut <- crop(var.rast, cut.map, snap = "out")
    if (msk)  var.cut <- mask(var.cut, cut.map)
    cut.rasts[[i]] <- var.cut

    if (delete.unzipped) {
      message("  - deleting unzipped file...")
      unlink(list.files()[grep(pattern = paste0(var.names[i], rast.file.ext), x = list.files())])
      if (!is.null(aux.file.ext)) unlink(list.files()[grep(pattern = paste0(var.names[i], aux.file.ext), x = list.files())])
    }  # end if delete

  }  # end for i
  message("Finished!")
  return(stack(cut.rasts))
}

Mind that you need to have the raster package installed for this, and a vector or raster map of the region with which you want to crop or mask the raster variables. This function is closely related to the zonalFromZip function posted a while ago; I may eventually get round to combining them into a single function.

Usage example:

wclim_we <- cropZippedRasters(zip.file = "wc2.0_30s_bio.zip", cut.map = westeurope, rast.file.ext = ".tif", aux.file.ext = NULL)
plot(wclim_we)

New features in fuzzySim functions

Just two quick notes on recent improvements to functions in the fuzzySim package:

– Thanks to a bug report by José Carlos Guerrero, the multTSA function now also works when there are not multiple but only a single species to analyse;

– Thanks to a feature request and a bit of assistance by Alba Estrada, the getPreds function now also works if the data argument is a RasterStack rather than a data frame.

Check out the new versions of these functions in the latest version of fuzzySim!

Find pairs of highly correlated variables

Correlations among variables are problematic in multivariate models, and good practice commonly recommends eliminating a priori one from each pair of highly correlated variables. The corPairs function below takes a matrix of correlations among variables, plots a cluster dendrogram based on them and returns the pairs of variables with absolute correlations above a given threshold (by default, 0.8):

corPairs <- function(cor.mat, cor.thresh = 0.8, ...) {
  # A.M. Barbosa, version 1.0 (22 Feb 2018)
  dist_matrix <- as.dist(1 - abs(cor.mat))
  plot(hclust(dist_matrix, ...), ylab = "1 - absolute correlation", xlab = "", sub = "")
  abline(h = 1 - cor.thresh, col = "red")
  cor.mat[upper.tri(cor.mat, diag = TRUE)] <- NA
  high.cor.mat <- numeric(0)
  if (max(abs(cor.mat), na.rm = TRUE) >= cor.thresh) {
    high.cor.rowcol <- as.matrix(which(abs(cor.mat) >= cor.thresh, arr.ind = TRUE))
    high.cor.inds <- sort(unique(as.vector(high.cor.rowcol)))
    high.cor.mat <- data.frame(var1 = rownames(cor.mat)[high.cor.rowcol[ , "row"]], var2 = colnames(cor.mat)[high.cor.rowcol[ , "col"]])
    high.cor.mat$corr <- NULL
    for (r in 1:nrow(high.cor.mat)) 
      high.cor.mat$corr[r] <- cor.mat[high.cor.rowcol[ ,"row"][r], high.cor.rowcol[ ,"col"][r]]
    high.cor.mat <- high.cor.mat[order(abs(high.cor.mat$corr), decreasing = TRUE), ]
    return (high.cor.mat)
  } # end if max
  else return("No correlations exceed the threshold.")
} # end corPairs function

This function is closely related to (and will soon be included in) the corSelect function in the fuzzySim package (Barbosa, 2015), but operates on correlation matrices rather than the raw variables, and adds a cluster dendrogram.

 

REFERENCES

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853–858

Potential biodiversity and relative favourability

The two functions below allow calculating potential biodiversity and relative favourability, respectively, according to the procedures described by Real et al. (2017) — which should be cited if you use any of them — and also used by Estrada et al. (2018):

potentialBiodiv <- function(models = NULL, # a list of binomial glm model objects
data = NULL, # alternatively, data frame with observations and predictions
sp.cols = NULL, # index numbers of the columns with presence-absence data
prob.cols = NULL, # index numbers of the columns with probabilities
fav.cols = NULL, # alternatively, index numbers of columns with favourabilities
id.col = NULL, # optionally, index number of a column with row identifiers
zero.action = "increase" # what to do with zero favourability values when calculating the geometric mean; alternatives are "exclude" or "maintain"
) {
if (!(zero.action %in% c("increase", "exclude", "maintain"))) stop ("Invalid 'zero.action'.")
if (!is.null(models)) {
if (!all("glm" %in% sapply(models, class))) stop("All 'models' must be of class 'glm'.")
if (!all("binomial" %in% sapply(models, family))) stop("All 'models' must be of family 'binomial'.")
obs.data <- as.data.frame(lapply(models, `[[`, "y"))
prob.data <- as.data.frame(lapply(models, `[[`, "fitted.values"))
} else {
if (is.null(data) || is.null(sp.cols) || (is.null(prob.cols) && is.null(fav.cols))) stop ("You need to provide either 'models', or 'data' + 'sp.cols' + either 'prob.cols' or 'fav.cols'.")
obs.data <- data[ , sp.cols, drop = FALSE]
if (!is.null(prob.cols)) prob.data <- data[ , prob.cols, drop = FALSE]
} # end if !null models else
Nobs <- nrow(obs.data)
Nsp <- ncol(obs.data)
if (is.null(fav.cols)) {
favs <- matrix(NA, ncol = Nsp, nrow = Nobs)
for (s in 1:Nsp) {
#favs[ , s] <- fuzzySim::Fav(obs = obs.data[ , s], pred = prob.data[ , s])
n1 <- sum(obs.data[ , s] == 1, na.rm = TRUE)
n0 <- sum(obs.data[ , s] == 0, na.rm = TRUE)
prob <- prob.data[ , s]
favs[ , s] <- (prob/(1 - prob))/((n1/n0) + (prob/(1 - prob)))
}; rm(s)
} else { # if !null fav.cols
favs <- data[ , fav.cols, drop = FALSE]
} # end if null fav else
if (zero.action == "increase") favs[favs == 0] <- 2.2e-16
#.Machine$double.xmin # smallest computable positive number
else if (zero.action == "exclude") favs[favs == 0] <- NA
F_lns <- log(favs, base = exp(1))
F_ln_sums <- rowSums(F_lns) # eliminated 'na.rm = na.rm'
result <- data.frame(SpRich = rowSums(obs.data),
Fmin = apply(favs, 1, min),
Fmax = apply(favs, 1, max),
Fsum = rowSums(favs),
Fmean = rowMeans(favs),
Fgmean = exp((1 / Nsp) * F_ln_sums))
if (!is.null(data) && !is.null(id.col))
result <- data.frame(data[ , id.col, drop = FALSE], result)
result
}

 

relativeFav <- function(target, # matrix or data frame of favourablities in target period
ref) { # idem for reference period, with columns in the same order
stopifnot(ncol(target) == ncol(ref))
N <- ncol(target)
fav_ratios <- target / ref
logs <- log(fav_ratios, base = exp(1))
log_sums <- rowSums(logs)
exp((1 / N) * log_sums)
}

 

REFERENCES

Estrada A., Barbosa A.M., & Real R. (2018) Changes in potential mammal diversity in National Parks and their implications for conservation. Current Zoology, https://doi.org/10.1093/cz/zoy001

Real R., Barbosa A.M., & Bull J.W. (2017) Species Distributions, Quantum Theory, and the Enhancement of Biodiversity Measures. Systematic Biology, 66: 453-462