The newest version of R package fuzzySim (3.0) is now on CRAN! It includes new functions such as ‘sharedFav‘, ‘favClass‘, ‘bioThreat‘ and ‘gridRecords‘; improvements to some functions, help files and examples; updated e-mail and citation information [ see citation(“fuzzySim”) ]; clarifications and typo corrections along the reference manual; and some bug fixes (after changes to base R and/or to function dependencies), e.g. to ‘getPreds‘ when applied to raster objects.
Among other new functionalities, fuzzySim now makes it easier to use variable selection and presence-absence modelling techniques on occurrence points + raster variables, as these are becoming the more common data formats in species distribution modelling (SDM) and ecological niche modelling (ENM). Here’s a worked example:
# download and plot predictor variables: # (make sure their spatial resolution is not finer than the occurrence data) library(raster) worldclim <- getData("worldclim", var = "bio", res = 10) plot(worldclim) plot(worldclim[]) # download and plot species occurrence data: library(rgbif) gbif <- occ_data(scientificName = "Galemys pyrenaicus", hasCoordinate = TRUE, limit = 5000) # GBIF data often have errors, data cleaning (e.g. with package 'scrubr') should be performed! # here we'll just remove records of absence or zero-abundance: absence_rows <- which(gbif$data$occurrenceStatus == "absent" | gbif$data$organismQuantity == 0) if (length(absence_rows) > 0) gbif$data <- gbif$data[-absence_rows, ] presences <- gbif$data[ , c("decimalLongitude", "decimalLatitude")] # plot the occurrence records on the map: points(presences) # crop variables to the extent of the occurrence data (mind that this is just a quick way, certainly not the best one, of delimiting the adequate region for analysis): worldclim_crop <- crop(worldclim, extent(range(presences$decimalLongitude), range(presences$decimalLatitude))) plot(worldclim_crop[]) points(presences)
# model occurrence data as presence-absence in the cropped grid of pixels: library(fuzzySim) library(terra) # first, get the centroid coordinates and variable values at pixels with and without presence records # (remember, raster resolution should not be finer than occurrence data!) gridded_presences <- gridRecords(rst = rast(worldclim_crop), pres.coords = presences) head(gridded_presences) plot(worldclim_crop[]) points(gridded_presences[gridded_presences$presence == 0, c("x", "y")], col = "red") points(gridded_presences[gridded_presences$presence == 1, c("x", "y")], col = "blue", pch = 20)
# then, build a GLM with variable selection on these presence-absence data: names(gridded_presences) model_GLM <- multGLM(data = gridded_presences, sp.cols = "presence", var.cols = 5:23, id.col = "cell", FDR = TRUE, corSelect = TRUE, step = TRUE, trim = TRUE) summary(model_GLM$models$presence) head(model_GLM$predictions) # finally, get and plot the model predictions (probability and favourability): pred_GLM_raster <- getPreds(data = stack(worldclim_crop), models = model_GLM$models) plot(pred_GLM_raster)
If you’re worried about using a presence-absence modelling algorithm on ‘presence-only’ records, you can compare these predictions with those of a widely used presence-background algorithm (Maxent) on the same data, to check that they are not far off:
library(maxnet) ?maxnet # build a maxent model with all variables: model_maxent_allvars <- maxnet(p = gridded_presences[ , "presence"], data = gridded_presences[ , 5:23], addsamplestobackground = TRUE, f = maxnet.formula(p = gridded_presences[ , "presence"], data = gridded_presences[ , 5:23], classes = "lq")) # "lq" = linear + quadratic features # build a maxent model with only the variables selected by multGLM (folowing FDR, correlation, AIC, significance): selected_vars <- model_GLM$variables$presence model_maxent_selvars <- maxnet(p = gridded_presences[ , "presence"], data = gridded_presences[ , selected_vars], addsamplestobackground = TRUE, f = maxnet.formula(p = gridded_presences[ , "presence"], data = gridded_presences[ , selected_vars], classes = "lq")) # compute and map maxent predictions: pred_maxent_allvars_raster <- raster::predict(worldclim_crop, model_maxent_allvars, type = "cloglog", clamp = TRUE) pred_maxent_selvars_raster <- raster::predict(worldclim_crop, model_maxent_selvars, type = "cloglog", clamp = TRUE) par(mfrow = c(1, 2)) plot(pred_maxent_allvars_raster, main = "Maxent all vars") plot(pred_maxent_selvars_raster, main = "Maxent selected vars")
Compare both Maxent prediction maps with the “presence_F” map above (favourability, which removes the effect of species prevalence from presence probability, so it’s the same as presence probability if prevalence were 50% — the default prevalence assumed by Maxent). This and other modelling methods will be taught next week at the CIBIO advanced course on ecological niche modelling. Check out the Courses page for other upcoming courses!
Pingback: fuzzySim updated to 3.0 on CRAN! | R-bloggers
Pingback: fuzzySim updated to 3.0 on CRAN! – Data Science Austria