fuzzySim updated to 3.0 on CRAN!


The newest version of R package fuzzySim (3.0) is now on CRAN! It includes new functions such as ‘sharedFav‘, favClass‘, ‘bioThreat‘ and ‘gridRecords; improvements to some functions, help files and examples; updated e-mail and citation information [ see citation(“fuzzySim”) ]; clarifications and typo corrections along the reference manual; and some bug fixes (after changes to base R and/or to function dependencies), e.g. to ‘getPreds‘ when applied to raster objects. You should now uninstall the old version of the package and install the new one:


Among other new functionalities, fuzzySim now makes it easier to use variable selection and presence-absence modelling techniques on occurrence points + raster variables, as these are becoming the more common data formats in species distribution modelling (SDM) and ecological niche modelling (ENM). Here’s a worked example:

# download and plot predictor variables:
# (make sure their spatial resolution is not finer than the occurrence data)
worldclim <- getData("worldclim", var = "bio", res = 10)

# download and plot species occurrence data:
gbif <- occ_data(scientificName = "Galemys pyrenaicus", hasCoordinate = TRUE, limit = 5000)
# GBIF data often have errors, data cleaning (e.g. with package 'scrubr') should be performed!
# here we'll just remove records of absence or zero-abundance:
absence_rows <- which(gbif$data$occurrenceStatus == "absent" | gbif$data$organismQuantity == 0)
if (length(absence_rows) > 0) gbif$data <- gbif$data[-absence_rows, ]
presences <- gbif$data[ , c("decimalLongitude", "decimalLatitude")]
# plot the occurrence records on the map:

# crop variables to the extent of the occurrence data (mind that this is just one possible way of cropping/masking to the adequate region of analysis):
worldclim_crop <- crop(worldclim, extent(range(presences$decimalLongitude), range(presences$decimalLatitude)))

# model occurrence data as presence-absence in the cropped grid of pixels:

# first, get the centroid coordinates and variable values at pixels with and without presence records
# (remember, raster resolution should not be finer than occurrence data!)
gridded_presences <- gridRecords(rst = worldclim_crop, pres.coords = presences)
points(gridded_presences[gridded_presences$presence == 0, c("x", "y")], col = "red")
points(gridded_presences[gridded_presences$presence == 1, c("x", "y")], col = "blue", pch = 20)
# then, build a GLM with variable selection on these presence-absence data:
model_GLM <- multGLM(data = gridded_presences, sp.cols = "presence", var.cols = 5:23, id.col = "cells", FDR = TRUE, corSelect = TRUE, step = TRUE, trim = TRUE)

# finally, get and plot the model predictions (probability and favourability):
pred_GLM_raster <- getPreds(data = stack(worldclim_crop), models = model_GLM$models)

If you’re worried about using a presence-absence modelling algorithm on ‘presence-only’ records, you can compare these predictions with those of a widely used presence-background algorithm (Maxent) on the same data, to check that they are not far off:


# build a maxent model with all variables:
model_maxent_allvars <- maxnet(p = gridded_presences[ , "presence"], data = gridded_presences[ , 5:23], f = maxnet.formula(p = gridded_presences[ , "presence"], data = gridded_presences[ , 5:23], classes = "lq"))  # linear + quadratic features

# build a maxent model with only the variables selected by multGLM (folowing FDR, correlation, AIC, significance):
selected_vars <- model_GLM$variables$presence
model_maxent_selvars <- maxnet(p = gridded_presences[ , "presence"], data = gridded_presences[ , selected_vars], f = maxnet.formula(p = gridded_presences[ , "presence"], data = gridded_presences[ , selected_vars], classes = "lq"))

# compute and map maxent predictions:
pred_maxent_allvars_raster <- raster::predict(worldclim_crop, model_maxent_allvars, type = "cloglog", clamp = TRUE)
pred_maxent_selvars_raster <- raster::predict(worldclim_crop, model_maxent_selvars, type = "cloglog", clamp = TRUE)
par(mfrow = c(1, 2))
plot(pred_maxent_allvars_raster, main = "Maxent all vars")
plot(pred_maxent_selvars_raster, main = "Maxent selected vars")

This and other modelling methods will be taught next week at the CIBIO advanced course on ecological niche modelling. Check out the Training page for other upcoming courses!

modTools: simple tools for simpler modelling


This website provides a series of free software tools, mostly written in R, that can be used for various tasks related to the analysis of species’ distribution and ecology (although many of them can be applied to other purposes). Most of them are now being included in R packages — please cite the appropriate package (mentioned in the blog post featuring each function) if you use these tools, which are released under a Creative Commons Attribution license. Please note that these tools are experimental (like R, they come with absolutely no warranty) and occasionally edited with corrections or improvements, so preferably check back for the latest version before you use a function.

Leave a comment here if you use these functions in publications — this way I can cite you in the blog, track how people are using the tools, and justify the time and effort dedicated to them. Thanks are due to Arpat Ozgul and François Guilhaumon, who helped me through the steep part of the R learning curve (to me it was actually more like an overhang). Comments and suggestions are very welcome, although a quick response cannot be guaranteed!

Functions for time tracking and management

Especially since I had to start working as a freelancer, it became essential to keep an accurate record of how much time I spend on each task and project. I checked out some software tools available online (namely TimeCamp, which I’ve used for a while), but I decided to see if I could write my own R functions for tracking time use, especially when working offline (some of us still like or need to work offline sometimes!), and for producing some related plots and analyses afterwards. Here’s what I came up with.

I first need to have a file named “PROJECTS.csv“, with a column named “project” and a column named “task“, and the names of the projects and tasks I’ll be working on, like this:


I can manually add projects and tasks to this file at any time. Then I use the ‘start_task‘ and ‘stop_task‘ functions below to add entries to another file, which is named “DONT-TOUCH_timeman_data.csv“, has five columns (proj, task, start, stop, time) and should not be moved or modified manually:

projects <- read.csv("PROJECTS.csv")

timeman_data <- read.csv("DONT-TOUCH_timeman_data.csv", as.is = TRUE)

start_task <- function(proj, task, time = NULL) {
  if (nrow(timeman_data) > 0 && is.na(timeman_data[nrow(timeman_data), ncol(timeman_data)]))
    stop ("previous task must be stopped before starting another")
  if (!(proj %in% unique(projects$project)))
    stop ("proj must exist in 'PROJECTS.csv'")
  if (!(task %in% unique(projects$task)))
    stop ("task must exist in 'PROJECTS.csv'")
  #if (!(proj %in% unique(timeman_data$projecto)))
    #timeman_data[nrow(timeman_data) + 1, "projecto"] <- proj
  if (is.null(time))
    time <- Sys.time()
  timeman_data[nrow(timeman_data) + 1, 1:3] <- c(proj, task, time)
  timeman_data <<- timeman_data

stop_task <- function(time = NULL) {
  if (nrow(timeman_data) > 0 && !is.na(timeman_data[nrow(timeman_data), ncol(timeman_data)]))
    stop ("task must be started before being stopped")
  if (is.null(time))
    time <- Sys.time()
  #this_row <- which.max(timeman_data$projecto == proj)
  this_row <- nrow(timeman_data)
  timeman_data[this_row, "stop"] <- as.character(time)
  timeman_data[this_row, "time"] <- difftime(timeman_data[this_row, "stop"], timeman_data[this_row, "start"])
  timeman_data <<- timeman_data
  write.csv(timeman_data, "DONT-TOUCH_timeman_data.csv", row.names = FALSE)
  cat("Showing last rows in tasks table:\n\n")

current_task <- function() {
  timeman_data[nrow(timeman_data), ]

I need to use ‘start_task‘ every time I start working on a task, and then ‘stop_task‘ every time I stop. This adds an entry with the starting, stopping, and total time spent in each instance:


start_task(proj = "timeman", task = "debugging")


I can then apply all sorts of R analyses and plots to the time-tracking data, which is recorded in the ‘timeman_data‘ data frame and CSV file. However, these functions are certainly far from ideal, as they use “unsafe” operators (like ‘<<-‘) and they rely on a file remaining untouched in its folder. It’s plain to see how things could easily go wrong at any moment…. Suggestions welcome on how this could be improved and made safer! While this was certainly an interesting exercise, I think I’ll just stick to TimeCamp for the time being 🙂

Plot outliers and their values

The ‘plot_outliers‘ function below draws a boxplot and a scatterplot of a numeric variable x and plots the values of the outliers (currently not offset, even if they overlap). For relatively small datasets, it can be a quick way to identify which outliers look reasonable and which are likely a result of transcription or measurement error, and thus should be either corrected or discarded.

plot_outliers <- function(x, val_col = "blue", ...) { 
  par_in <- par(no.readonly = TRUE) 
  par(mfrow = c(1, 2)) 
  bp <- boxplot(x, ...) 
  out <- bp$out 
  message(length(out), " outliers detected") 
  if (length(out) > 0) text(x = 0.5, y = bp$out, labels = round(out, 2), adj = 0, col = val_col)
  plot(x, pch = 20)
  if (length(out) > 0) text(x = 0.5, y = bp$out, labels = round(out, 2), adj = 0, col = val_col)

Usage examples:


Additional arguments for the ‘boxplot‘ function can be provided, e.g.

plot_outliers(airquality$Ozone, notch = TRUE)
plot_outliers(airquality$Wind, col = "darkgreen", main = "wind")

This function is used in an article which we hope to submit soon.

Grid point occurrence records onto a raster

The ‘gridRecords‘ function, which has just been added to the ‘fuzzySim‘ package (from version 2.6 on), takes a raster stack and a set of spatial coordinates of a species’ presence (and optionally absence) records, and returns a data frame with the presences and absences, as well as the corresponding values of the rasters in the grid of pixels (cells). If absence coordinates are not supplied, all pixels without any presence point will be returned as absences. A precursor of this function was used in Báez et al. (2020) for getting unique presences and absences from point occurrence data at the spatial resolution of marine raster variables. The function can be especially useful for using point + raster data to compute presence-absence models with packages or functions that normally require data frames as input, such as glm, multGLM, gam or randomForest.

gridRecords <- function(rst,
                        abs.coords = NULL,
                        na.rm = TRUE) {
  # version 2.0 (3 Feb 2020)
  if (!requireNamespace("raster")) stop("This function requires installing the 'raster' package first.")
  if (is.null(abs.coords)) {
    abs.coords <- raster::coordinates(rst)

  p_extract <- raster::extract(rst, pres.coords, cellnumbers = TRUE, df = TRUE)[ , -1]
  a_extract <- raster::extract(rst, abs.coords, cellnumbers = TRUE, df = TRUE)[ , -1]
  p_extract <- unique(p_extract)
  a_extract <- unique(a_extract)
  a_extract <- a_extract[!(a_extract$cells %in% p_extract$cells), ]
  p_centroids <- raster::xyFromCell(rst, p_extract$cells)
  a_centroids <- raster::xyFromCell(rst, a_extract$cells)
  p_extract <- data.frame(presence = 1, p_centroids, p_extract)
  if (nrow(a_extract) > 0) {
    a_extract <- data.frame(presence = 0, a_centroids, a_extract)
  result <- rbind(p_extract, a_extract)
  if (na.rm) {
    result_NA <- which(apply(result[ , 5:ncol(result)], MARGIN = 1, FUN = function(x) all(is.na(x))))
    if (length(result_NA) > 0) {
      result <- result[-result_NA, ]

Usage example:

library(fuzzySim)  # >= 2.6

# import a system raster with 3 layers and crop it to a smaller extent:
rst <- stack(system.file("external/rlogo.grd", package = "raster"))
ext <- extent(c(0, 15, 25, 40))
rst <- crop(rst, ext)

# generate some random presence and absence points:
presences <- sp::spsample(as(ext, "SpatialPolygons"), 50, type = "random")
absences <- sp::spsample(as(ext, "SpatialPolygons"), 50, type = "random")
points(presences, pch = 20, cex = 0.2, col = "black")
points(absences, pch = 20, cex = 0.2, col = "white")

# use 'gridRecords' on these random points:
gridded_pts <- gridRecords(rst, coordinates(presences), coordinates(absences))
head(gridded_pts)  # 'red', 'green' and 'blue' are the names of the layers in 'rst'

# plot them to check the result:
pres_coords <- gridded_pts[gridded_pts$presence == 1, c("x", "y")]
abs_coords <- gridded_pts[gridded_pts$presence == 0, c("x", "y")]
points(gridded_pts[ , c("x", "y")], pch = 4, cex = 0.6, col = gridded_pts$presence)
# you can also do it with only presence (no absence) records:
gridded_pres <- gridRecords(rst, coordinates(presences))
points(presences, pch = 20, cex = 0.2, col = "black")
pres_coords <- gridded_pres[gridded_pres$presence == 1, c("x", "y")]
abs_coords <- gridded_pres[gridded_pres$presence == 0, c("x", "y")]
points(gridded_pres[ , c("x", "y")], pch = 4, cex = 0.6, col = gridded_pres$presence)

Baez J.C., Barbosa A.M., Pascual P., Ramos M.L. & Abascal F. (2020) Ensemble modelling of the potential distribution of the whale shark in the Atlantic Ocean. Ecology and Evolution, 10: 175-184

modEvA 2.0 now on CRAN!

The new version of modEvA (2.0) is now on CRAN! It can produce some new plots, such as the precision-recall curve and the histograms/densities of predicted values for presence and absence observations. It calculates some additional evaluation measures, such as the area under the precision-recall curve (function ‘AUC‘), mean precision (also function ‘AUC‘) and the F1 score (function ‘threshMeasures‘). These measures are now also among the outputs of functions ‘multModEv‘, ‘optiThresh‘ and ‘optiPair‘. I’ve also fixed some bugs that were present in the previous version on CRAN, such as an annoying error when using function ‘varPart‘ with only two factors. You should now uninstall the old version of the package and install the new one:


Area under the precision-recall curve

The AUC function, in the modEvA package, initially computed only the area under the receiver operating characteristic (ROC) curve. Now, since modEvA version 1.7 (currently available on R-Forge), it also offers the option to compute the precision-recall curve, which may be better for comparing models based on imbalanced data (e.g. for rare species) — see e.g. Sofaer et al. (2019).

Usage example:

mod <- rotif.mods$models[["Ktropi"]]
par(mfrow = c(1, 2))
AUC(mod, main = "ROC curve")
AUC(mod, curve = "PR", main = "Precision-recall curve")


Sofaer, H.R., Hoeting, J.A. & Jarnevich, C.S. (2019). The area under the precision-recall curve as a performance metric for rare binary events. Methods in Ecology and Evolution, 10: 565-577

Plot the density of predicted values for presences and absences

The ‘predDensity’ function, included in the modEvA package since version 1.5 (currently available on R-Forge), produces a histogram and/or a kernel density plot of predicted values for observed presences and absences in a binomial GLM:

predDensity <- function (model = NULL, obs = NULL, pred = NULL, separate = TRUE, 
  type = c("both"), legend.pos = "topright") 
  if (!is.null(model)) {
    if (!("glm" %in% class(model)) || family(model)$family != 
      stop("'model' must be of class 'glm' and family 'binomial'.")
    if (!is.null(obs)) 
      message("Argument 'obs' ignored in favour of 'model'.")
    if (!is.null(pred)) 
      message("Argument 'pred' ignored in favour of 'model'.")
    obs <- model$y
    pred <- model$fitted.values
  if (is.null(obs)) {
    if (is.null(pred)) 
      stop("You must provide either 'model' or 'pred'.")
    separate <- FALSE
    obs <- sample(c(0, 1), length(pred), replace = TRUE)
  else {
    if (length(obs) != length(pred)) 
      stop("'obs' and 'pred' must have the same length.")
  pred0 <- pred[obs == 0]
  pred1 <- pred[obs == 1]
  type <- match.arg(type, c("histogram", "density", "both"))
  rslt <- vector("list")
  if (type %in% c("density", "both")) {
    if (!separate) {
      dens <- density(pred)
      xrange <- range(dens$x, finite = TRUE)
      yrange <- range(dens$y, finite = TRUE)
      rslt[["density"]] <- dens
    else {
      dens0 <- density(pred0)
      dens1 <- density(pred1)
      xrange <- range(dens0$x, dens1$x, finite = TRUE)
      yrange <- range(dens0$y, dens1$y, finite = TRUE)
      rslt[["density_obs1"]] <- dens1
      rslt[["density_obs0"]] <- dens0
    plot(x = xrange, y = yrange, xlab = "Predicted value", 
      ylab = "Density", type = "n")
  if (type %in% c("histogram", "both")) {
    hist0 <- hist(pred0, plot = FALSE)
    hist1 <- hist(pred1, plot = FALSE)
    if (type == "histogram") {
      yrange <- range(hist0$density, hist1$density, finite = TRUE)
      plot(x = c(0, 1), y = yrange, type = "n", xlab = "Predicted value", 
        ylab = "Density")
    if (!separate) {
      histogram <- hist(c(pred0, pred1), freq = FALSE, 
        col = "grey20", add = TRUE)
      rslt[["histogram"]] <- histogram
    else {
      hist(pred1, freq = FALSE, col = "grey20", add = TRUE)
      hist(pred0, freq = FALSE, col = "darkgrey", density = 40, 
        angle = 45, add = TRUE)
      rslt[["histogram_obs1"]] <- hist1
      rslt[["histogram_obs0"]] <- hist0
      if (legend.pos != "n" && type == "histogram") 
        legend(legend.pos, legend = c("absences", "presences"), 
          fill = c("darkgrey", "grey20"), border = NA, 
          density = c(40, NA), bty = "n")
  if (type %in% c("density", "both")) {
    if (!separate) {
      lines(dens, col = "black", lwd = 2)
    else {
      lines(dens1, col = "black", lwd = 2)
      lines(dens0, col = "darkgrey", lty = 5, lwd = 2)
      if (legend.pos != "n" && type == "density") 
        legend(legend.pos, legend = c("absences", "presences"), 
          col = c("darkgrey", "black"), lty = c(5, 1), 
          bty = "n")
      if (legend.pos != "n" && type == "both") 
        legend(legend.pos, legend = c("absences", "presences"), 
          fill = c("darkgrey", "grey20"), border = NA, 
          lty = c(5, 1), col = c("darkgrey", "grey15"), 
          density = c(40, NA), bty = "n")

Usage example:

install.packages("modEvA", repos="http://R-Forge.R-project.org")
predDensity(model = rotif.mods$models[[3]])

Variable selection with multGLM

The ‘multGLM‘ function in the ‘fuzzySim‘ R package automatically calculates generalized linear models for one or more species, with a range of options for variable selection. However, you may want to build your final models with other methods (e.g. multimodel-inference with packages ‘glmulti’ or ‘MuMIn’, or other modelling algorithms such as those implemented in packages ‘dismo’, ‘biomod2’ or ‘sdm’), but still previously apply some of the variable selection methods implemented in ‘multGLM’. Since ‘fuzzySim’ package version 2.2.3 (currently available on R-Forge), function ‘multGLM’ includes an additional item in the output: a list of character vectors naming the variables selected for each model, following the specified criteria. So, you can use ‘multGLM’ with your desired selection criteria and then use the selected variables elsewhere, as below:

library(fuzzySim) # >2.2.3

# make models for 2 of the species in rotif.env using some selection criteria:

mods_multGLM <- multGLM(rotif.env, sp.cols = 18:19, var.cols = 5:17, id.col = 1,
FDR = TRUE, corSelect = TRUE, cor.thresh = 0.8, step = FALSE, trim = FALSE)

# now use the selected variables elsewhere:

mod_glmulti_Abrigh <- glmulti(y = "Abrigh", xr = mods_multGLM$variables[["Abrigh"]], data = rotif.env, level = 1, method = "h", crit = "aic", fitfunction = "glm", family = binomial)

Also, the ‘multGLM‘ function (since fuzzySim version 2.2.3) now also accepts column names (not only index numbers) as the ‘sp.cols‘, ‘var.cols‘ and ‘id.col‘ arguments.

Plot shared favourability for two competing species

The sharedFav function below implements the graphical analyses of Acevedo et al. (2010, 2012) on biogeographical interactions. It takes two vectors of favourability values at different localities for, respectively, a stronger and a weaker competing species (or two equally strong competitors), and plots their favourableness or shared favourability to assess potential competitive interactions.

sharedFav <- function(strong_F, weak_F, conf = 0.95, main = "Shared favourability") {
stopifnot(length(strong_F) == length(weak_F))
opar <- par(no.readonly = T)
par(mar = c(4, 4, 2, 4.5))
F_intersection <- fuzzyOverlay(cbind(strong_F, weak_F), op = "intersection")
F_union <- fuzzyOverlay(cbind(strong_F, weak_F), op = "union")
Fovl <- sum(F_intersection, na.rm = TRUE) / sum(F_union, na.rm = TRUE)
brks <- seq(0, 1, by = 0.1)
bins <- 1:10
bin <- cut(F_intersection, breaks = brks, labels = bins)
strong_mean <- tapply(strong_F, INDEX = bin, FUN = mean)
weak_mean <- tapply(weak_F, INDEX = bin, FUN = mean)
strong_ci <- tapply(strong_F, INDEX = bin, FUN = function(x) t.test(x, conf.level = conf, na.action = na.pass)$conf.int)
weak_ci <- tapply(weak_F, INDEX = bin, FUN = function(x) t.test(x, conf.level = conf)$conf.int)
strong_ci[names(Filter(is.null, strong_ci))] <- NA
weak_ci[names(Filter(is.null, weak_ci))] <- NA
strong_ci_up <- unlist(lapply(strong_ci, `[`, 2))
strong_ci_dn <- unlist(lapply(strong_ci, `[`, 1))
weak_ci_up <- unlist(lapply(weak_ci, `[`, 2))
weak_ci_dn <- unlist(lapply(weak_ci, `[`, 1))
bin_size <- table(bin)
names(bin_size) <- names(bins)
props <- bin_size / length(bin)
bin <- as.integer(bin)
bar_plot <- barplot(rep(NA, length(bins)), ylim = c(0, 1), xlab = "Favourability intersection", ylab = "Mean favourability", names.arg = brks[-1], main = main)
col_bar <- "grey50"
col_ci <- "grey"
poly_left <- mean(bar_plot[2:3])
poly_right <- mean(bar_plot[8:9])
polygon(x = c(poly_left, poly_left, poly_right, poly_right), y = c(0, 1, 1, 0), col = "lightgrey", border = NA, density = 25, angle = -45)
par(new = TRUE)
barplot(props, col = col_bar, border = FALSE, xaxt = "n", yaxt = "n", add = TRUE)
axis(side = 4, col = col_bar, col.axis = col_bar, col.ticks = col_bar, col.lab = col_bar)
mtext(side = 4, line = 3, "Proportion of localities", col = col_bar)
abline(h = 0.8, col = "grey", lty = 3)
strong_x <- bar_plot - 0.1
weak_x <- bar_plot + 0.1
arrows(strong_x, strong_ci_dn, strong_x, strong_ci_up, code = 3, length = 0.03, angle = 90, col = col_ci)
arrows(weak_x, weak_ci_dn, weak_x, weak_ci_up, code = 3, length = 0.03, angle = 90, col = col_ci)
lines(x = strong_x, y = strong_mean, lwd = 2, lty = 1)
lines(x = weak_x, y = weak_mean, lwd = 2, lty = 2)
points(x = strong_x, y = strong_mean, pch = 20)
points(x = weak_x, y = weak_mean, pch = 15, cex = 0.8)

The function returns the fuzzy overlap index and a shared favourability plot similar to those of Acevedo et al. (2010, 2012). For more details, examples of use and additional references, see the help files of these functions in package fuzzySim >= 2.2, currently available on R-Forge. Special thanks to Pelayo Acevedo and Darío Chamorro for clarifications on how to build the plot!



Acevedo P., Ward A.I., Real R. & Smith G.C. (2010) Assessing biogeographical relationships of ecologically related species using favourability functions: a case study on British deer. Diversity and Distributions, 16: 515-528

Acevedo P., Jiménez-Valverde A., Melo-Ferreira J., Real R. & Alves, P.C. (2012) Parapatric species and the implications for climate change studies: a case study on hares in Europe. Global Change Biology, 18: 1509-1519

Richerson P.J. & Lum K. (1980) Patterns of plant species and diversity in California: relation to weather and topography. American Naturalist 116: 504-536