Plot column(s) of a polygon vector map

NOTE: I recently had a tutorial on the cartography R package, which makes mapping columns of a data frame much less troublesome. You may want to look at that instead.

If you have a polygon vector map in R and want to quickly map the values in one or more columns of its attribute table, you can use the plotMapColumns function. There’s an option to rasterize the map before plotting, which may be considerably faster and is TRUE by default, but you’ll need to use an appropriate raster.upscale value. This is the number by which the range of coordinates should be divided to get the number of pixels for the maps to be plotted; it’s advised to first check your range(coordinates(map)) and see for yourself which raster.upscale divisor will make reasonably computable raster maps – e.g., for geographical lat-lon, an upscale factor of 1 will usually work (you’ll have at most 360 x 180 pixels; actually you may want to lower raster.upscale to 0.5 or 0.3 if you need more detailed maps); but for a UTM projection (whose coordinates can be much larger values) you may need an upscale.factor of 10000 to get a reasonably computable number of pixels.

plotMapColumns <- function(map, # SpatialPolygons object
                           columns, # index(es) of column(s) of map@data containing the values to plot (there will be one output map per column)
                           rasterize = TRUE, # number by which the difference between maximum and minimum coordinates should be divided to get the number of pixels (if rasterize = TRUE); it's advised to first calculate min and max coordinates and see for yourself which divisor will make reasonably computable raster maps (e.g., for geographical lat-lon an upscale factor of 1 may work, but for a UTM projection you may need an upscale of factor 10,000!)
                           raster.upscale = 1, 
                           ...) # additional arguments for (sp)plot function
  {

  stopifnot(raster.upscale > 0 | is.null(raster.upscale),
            require(raster) | rasterize == FALSE
            )
 
  if (!all(columns %in% 1:length(names(map)))) stop ("index out of bounds; 'columns' must exist in map@data.")
 
  if (rasterize) {
    xmin <- min(coordinates(map)[,1])
    xmax <- max(coordinates(map)[,1])
    ymin <- min(coordinates(map)[,2])
    ymax <- max(coordinates(map)[,2])
    wdth <- round(xmax - xmin)
    hght <- round(ymax - ymin)
 
    #if (raster.upscale == "auto") {
      #max.length <- max(wdth, hght)
      #if (max.length > 500) raster.upscale <-
    #}
 
    #if (!is.null(max.rast.dim)) {
    #  rast.dim <- wdth * hght
    #}
 
    wdth <- wdth / raster.upscale
    hght <- hght / raster.upscale
    message("plotting map(s) with ", wdth, "x", hght, " pixels; consider rising 'raster.upscale' if this is taking too long, or lowering it if the resulting maps are too coarse.")
    require(raster)
    rast <- raster(nrows = hght, ncols = wdth, xmn = xmin, xmx = xmax, ymn = ymin, ymx = ymax)
  }  # end if rasterize I
 
  #if (centroids) {
  #  attr.table <- map@data
  #  map <- SpatialPointsDataFrame(coordinates(map))
  #  map@data <- attr.table
  #  rast <- raster(map)
  #} else {
    require(sp)
  #}
 
  n.cols <- length(columns)
  col.count <- 0
  for (i in columns) {
    col.count <- col.count + 1
    message("Plotting column ", col.count, " of ", n.cols, "...")
    if (rasterize) {
      map.rast <- rasterize(x = map, y = rast, field = names(map)[i], fun = 'last')
      plot(map.rast, main = names(map)[i], ...)
    }  # end if rasterize II
    else {
      print(spplot(map, zcol = names(map)[i], main = names(map)[i], ...))
    }  # end else
  }  # end for i
  message("Finished!")
}

[presented with Pretty R]

Usage example:

# download, unzip and import a map of countries:
download.file("http://biogeo.ucdavis.edu/data/world/countries_shp.zip", destfile = "countries_shp.zip")
unzip("countries_shp.zip")
countries <- rgdal::readOGR(dsn = ".", layer= "countries")
 
# see the data in the attributes table:
head(countries)
names(countries)
 
# use plotMapColumns with and without rasterizing:
plotMapColumns(countries, columns = 17:18, rasterize = TRUE, raster.upscale = 1)
plotMapColumns(countries, columns = 18, rasterize = FALSE)  # slower

You can add arguments for the (sp)plot function, to get e.g. different colour schemes. The plotMapColumns function is not (yet) included in a package.

Advertisements

Calculate zonal statistics from rasters in multiple zip files

This is a wrapper for the zonalFromZip function published in the previous post, for when you have multiple zip files with multiple raster files each (as in the WorldClim paleo-climate database), and you want to extract zonal statistics for them all automatically. To use it, you’ll need to have the zonalFromZip function loaded, as well as the raster R package.

zonalsFromZips <- function(zip.files, zones.rast, rast.file.ext = ".tif", aux.file.ext = NULL, verbosity = 1, ...) {
  results <- vector("list", length(zip.files))
  names(results) <- basename(tools::file_path_sans_ext(zip.files))
  for (f in 1:length(zip.files)) {
    message("\nUNZIPPING FILE ", f, " OF ", length(zip.files), " (", basename(zip.files[f]), ")...")
    results[[f]] <- zonalFromZip(zip.file = zip.files[f], zones.rast = zones.rast, rast.file.ext = rast.file.ext, aux.file.ext = aux.file.ext, verbosity = verbosity, ...)
  }; rm(f)
  message("\nFINISHED ALL!")
  return(results)
}  # end zonalsFromZips function

 

The result is a list of dataframes, each containing the zonal stats for one of the .zip files of rasters. Usage example:

LGM.zonals <- zonalsFromZips(zip.files = list.files("/home/joe/LGM", full.names = TRUE), zones.rast = utm10.raster)

[presented with Pretty R]

 

Calculate zonal statistics from rasters in a zip file

Imagine you have a zipped folder with a bunch of raster maps containing variables (e.g. the ones you can download from WorldClim or from CliMond), and you need to calculate zonal statistics from each of these rasters. The zonaFromZip function, provided below, automates this process without the need to unzip the folders. It extracts one raster at a time from the .zip, imports it to R, calculates zonal statistics for your zones raster map (the ‘mean’ function is used by default, but you can provide any other argument accepted by the zonal function of the R raster package), and then deletes the unzipped file before unzipping the next one, therefore requiring minimal disk space.

zonalFromZip <- function (zip.file, zones.rast, rast.file.ext = ".tif", aux.file.ext = NULL, delete.unzipped = TRUE, verbosity = 2, ...)
  # zip.file: path to the zip containing the raster maps to calculate zonal stats from
  # zones.rast: raster map (in your R workspace) containing the spatial units to calculate zonal stats to
  # rast.file.ext: file extension for the raster maps on disk
  # aux.file.ext: file extension for the auxiliary files (e.g. ".hdr" for .bil raster files, or ".rdc" for Idrisi .rst files)
  # ...: arguments to pass to the 'zonal' function from the 'raster' package
{
  require(raster)
  rast.files <- unzip(zip.file, list = TRUE) $ Name
  var.names <- unique(tools::file_path_sans_ext(rast.files))
  n.var <- length(var.names)
  zonal.stats <- vector("list", length(var.names))
  names(zonal.stats) <- var.names
  for (i in 1:n.var) {
    if (verbosity >= 1) message("Getting variable ", i, " of ", n.var)
    if (verbosity >= 2) message("  - unzipping file...")
    unzip(zip.file, files = paste0(var.names[i], rast.file.ext))
    if (!is.null(aux.file.ext)) unzip(zip.file, files = paste0(var.names[i], aux.file.ext))
    var.rast <- raster(paste0(var.names[i], rast.file.ext))
    if (!compareRaster(var.rast, zones.rast, stopiffalse = FALSE)) {
      if (verbosity >= 2) message("  - cropping to zones raster...")
      var.rast <- crop(var.rast, zones.rast)
    }
    if (verbosity >= 2) message("  - calculating zonal stats...")
    zonal.stats[[i]] <- raster::zonal(var.rast, zones.rast, ...)
    if (verbosity >= 2) message("  - deleting unzipped file...")
    if (delete.unzipped) {
      unlink(list.files()[grep(pattern = paste0(var.names[i], rast.file.ext), x = list.files())])
      if (!is.null(aux.file.ext)) unlink(list.files()[grep(pattern = paste0(var.names[i], aux.file.ext), x = list.files())])
    }
  }
  if (verbosity >= 1) message("Converting results to data frame...")
  zonal.stats <- as.data.frame(zonal.stats)
  zonal.stats <- subset(zonal.stats, select = c(1, seq(2, ncol(zonal.stats), 2)))
  colnames(zonal.stats)[1] <- "zone"
  colnames(zonal.stats)[-1] <- var.names
  if (verbosity >= 1) message("Finished!")
  return(zonal.stats)
}

 

Mind that you need the raster R package installed for this, and a raster map of the spatial units (zones) to which you want to extract the raster variables.Usage examples:

LGM.CCSM4.utm10 <- zonalFromZip(zip.file = "LGM/CCSM4.zip", zones.rast = utm10.rst, rast.file.ext = ".tif", aux.file.ext = NULL)
head(LGM.CCSM4.utm10)
 
WClim.utm10 <- zonalFromZip(zip.file = "bio_2-5m_bil.zip", zones.rast = utm10.rst, rast.file.ext = ".bil", aux.file.ext = ".hdr")
head(WClim.utm10)

Example for several .zip files within a folder at once (DEPRECATED – see next post’s zonalsFromZips function instead):

for (f in list.files("LGM")) {  # "LGM" is the folder containing the zip files to extract zonal stats from
  name <- paste("LGM", tools::file_path_sans_ext(f), "utm10", sep = ".")
  zonstat <- zonalFromZip(zip.file = paste("LGM", f, sep = "/"), zones.rast = u10.rst, raster.ext = ".tif", fun = "mean")
  assign(name, zonstat)
}

[presented with Pretty R]

 

Download several zip files automatically

I recently found out that a bunch of new past climate scenarios were made available on WorldClim, at least for past climate. While there may be more efficient ways to do this, here’s the R function I wrote to download several of them automatically based on the URLs (link locations) of the .zip files:

downloadZips <- function(zip.urls, zip.names = NULL, dir.name = NULL, unzip = FALSE) {
# zip.names: names to give the downloaded zip files (if different from the original ones)
# dir.name: name of the directory folder in which to store the downloaded files
# unzip: logical, whether or not to unzip the files after downloading
  stopifnot(is.null(zip.names) | length(zip.names) == length(zip.urls))
  if (!is.null(dir.name)) {
    dir.create(dir.name)
    setwd(dir.name)
    on.exit(setwd("../"))
  }
  if (is.null(zip.names))  zip.names <- tools::file_path_sans_ext(basename(zip.urls))
  for (z in 1:length(zip.urls)) {
    message("\nDownloading zip ", z, " of ", length(zip.urls), " (", zip.names[z], ")...\n")
    zip.file <- paste(zip.names[z], "zip", sep = ".")
    download.file(zip.urls[z], zip.file)
    if (unzip) {
      dir.create(zip.names[z])
      message("Unzipping file to folder '", zip.names[z], "'...")
      unzip(zip.file, exdir = zip.names[z])
    }  # end if unzip
  }  # end for z
  message("Finished!")
}  # end downloadZips function

Usage examples (mind that these large files take quite a while to download):

# LGM:
downloadZips(zip.urls = c("http://biogeo.ucdavis.edu/data/climate/cmip5/lgm/cclgmbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/lgm/mrlgmbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/lgm/melgmbi_2-5m.zip"), zip.names = c("CCSM4", "MIROC_ESM", "MPI_ESM_P"), dir.name = "LGM")
 
# Mid Holocene:
downloadZips(zip.urls = c("http://biogeo.ucdavis.edu/data/climate/cmip5/mid/bcmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/ccmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/cnmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/hgmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/hemidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/ipmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/mrmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/memidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/mgmidbi_2-5m.zip"), zip.names = c("BCC_CSM1_1", "CCSM4", "CNRM_CM5", "HadGEM2_CC", "HadGEM2_ES", "IPSL_CM5A_LR", "MIROC_ESM", "MPI_ESM_P", "MRI_CGCM3"), dir.name = "MidHol")
 
# Last Inter-Glacial:
downloadZips(zip.urls = "http://biogeo.ucdavis.edu/data/climate/worldclim/1_4/grid/pst/lig/lig_30s_bio.zip", zip.names = "LIG_30s", dir.name = "LIG")

[presented with Pretty R]

Probability of occurrence from presence-background data in a data frame

In an interesting recent paper, Royle et al. (2012) presented a critical analysis of Maxent and proposed a method for obtaining probability of occurrence (rather than a vague suitability index) based on maximum likelihood from presence-background (rather than presence-absence) species distribution data. They provided an R package, maxlike, which can apply this method to a set of presence point coordinates and a raster stack of predictor variables. However, this implies not only that you need point coordinates of your species observations, but also that sampling units are square pixels, whereas we often need to model distribution data recorded on larger and more irregular units — such as provinces, river basins, or UTM cells, for example.

The maxlike.df function is a modification of maxlike that can be applied to a data frame instead of spatial objects. I added argument covariates (to use instead of rasters), which requires a matrix or data.frame of the variables (one in each column, with the same names used in argument formula), and argument presences (to use instead of points) for a vector/column with the species data (of the same size and in the same order as the covariates, with 1 for presence and 0 or NA for background or no records — mind that MaxLike, like MaxEnt, needs both presence and background data, so they’re not strictly “presence-only”). Everything else in maxlike.df works the same way as in maxlike (Chandler & Royle 2013), so please read the maxlike documentation for further information on how the method is implemented and interpreted. Note that they recommend that the variables be standardized (e.g. vars = scale(vars)) before modelling, and that you should select which variables to include in each model.

Note also that Royle et al.‘s (2012) approach has been supported by Fitzpatrick et al. (2013), but criticized by Hastie & Fithian (2013).

maxlike.df <- function (formula, rasters, points, presences, covariates, link = c("logit", "cloglog"), starts, hessian = TRUE, fixed, removeDuplicates = FALSE, savedata = FALSE, na.action = "na.omit", ...)
  {
  if (missing(rasters))  rasters <- covariates  # new line
  if (missing(points))  points <- presences  # new line
  if (identical(formula, ~1))
    stop("At least one continuous covariate must be specified in the formula")
  link <- match.arg(link)
  varnames <- all.vars(formula)
  call <- match.call()
  npts <- sum(points == 1)  # was nrow(points)
  cd.class <- class(rasters)[1]
  #if (cd.class != "rasterstack") stop("rasters must be a raster stack")
  #pt.class <- class(points)[1]
  #if (!pt.class %in% c("matrix", "data.frame")) stop("points must be a matrix or a data.frame")
  #if (ncol(points) != 2) stop("points must have 2 columns containing the x- and y- coordinates")
  pt.names <- colnames(points)
  #if (identical(cd.class, "rasterstack")) {
    cd.names <- names(rasters)
    npix <- nrow(rasters) # was  prod(dim(rasters)[1:2])
    id <- 1:nrow(rasters)  # new line
    cellID <- id[points == 1] # was cellFromXY(rasters, points)
    duplicates <- duplicated(cellID)
    if (removeDuplicates) {
      cellID <- unique(cellID)
      npts <- length(cellID)
      points.retained <- points[!duplicates, ]
    }
    x <- as.data.frame(rasters[points == 1, ])  # was as.data.frame(matrix(extract(rasters, cellID), npts))
    z <- rasters  # was as.data.frame(matrix(getValues(rasters), npix))
    names(x) <- names(z) <- cd.names
  #}
  if (!all(varnames %in% cd.names)) stop("at least 1 covariate in the formula is not in names(rasters).")
  X.mf <- model.frame(formula, x, na.action = na.action)
  X.mf.a <- attributes(X.mf)
  pts.removed <- integer(0)
  points.retained <- points
  if ("na.action" %in% names(X.mf.a)) {
    pts.removed <- X.mf.a$na.action
    npts.removed <- length(pts.removed)
    if (npts.removed > 0) {
      warning(paste(npts.removed, "points removed due to missing values"))
      points.retained <- points.retained[-pts.removed]  # I removed the comma before ']', as here 'points' is a vector rather than a two-column matrix/dataframe
    }
  }
  X <- model.matrix(formula, X.mf)
  Z.mf <- model.frame(formula, z, na.action = na.action)
  Z.mf.a <- attributes(Z.mf)
  pix.removed <- integer(0)
  if ("na.action" %in% names(Z.mf.a)) {
    pix.removed <- Z.mf.a$na.action
    npix.removed <- length(pix.removed)
  }
  Z <- model.matrix(formula, Z.mf)
  npars <- ncol(X)
  parnames <- colnames(X)
  if (!"(Intercept)" %in% parnames) 
    stop("The intercept must be estimated or fixed")
  if (missing(starts)) {
    starts <- rep(0, npars)
    names(starts) <- parnames
  }
  else names(starts) <- parnames
  if (identical(link, "logit")) {
    nll <- function(pars) {
      psix <- plogis(drop(X %*% pars))
      psiz <- sum(plogis(drop(Z %*% pars)))
      -1 * sum(log(psix/psiz))
    }
  }
  else if (identical(link, "cloglog")) {
    nll <- function(pars) {
      psix <- 1 - exp(-exp(drop(X %*% pars)))
      psiz <- sum(1 - exp(-exp(drop(Z %*% pars))))
      -1 * sum(log(psix/psiz))
    }
  }
  else stop("link function should be either 'logit' or 'cloglog'")
  is.fixed <- rep(FALSE, npars)
  if (!missing(fixed)) {
    if (length(fixed) != length(starts)) 
      stop("fixed should be a vector with the same length as the number of parameters to be estimated")
    if (sum(is.double(fixed)) < 1) 
      stop("fixed must contain at least one real value")
    is.fixed <- !is.na(fixed)
    if (sum(!is.fixed) < 1) 
      stop("you cannot fix all parameters in the model")
    npars <- sum(!is.fixed)
    nll.fix <- function(p) {
      p[is.fixed] <- fixed[is.fixed]
      do.call("nll", list(pars = p))
    }
    fm <- optim(starts, nll.fix, hessian = hessian, ...)
    fm$par[is.fixed] <- fixed[is.fixed]
  }
  else {
    fm <- optim(starts, nll, hessian = hessian, ...)
  }
  not.fixed <- !is.fixed
  par <- fm$par
  if (hessian) {
    vcTry <- try(solve(fm$hessian[not.fixed, not.fixed]))
    if (identical(class(vcTry), "matrix")) {
      vc <- matrix(0, length(par), length(par))
      vc[not.fixed, not.fixed] <- vcTry
      se <- sqrt(diag(vc))
    }
    else {
      vc <- matrix(NA, npars, npars)
      se <- rep(NA, npars)
    }
  }
  else {
    vc <- matrix(NA, npars, npars)
    se <- rep(NA, npars)
  }
  dimnames(vc) <- list(parnames, parnames)
  aic <- 2 * fm$value + 2 * npars
  fitted <- plogis(Z %*% par)
  fitted.values <- merge(data.frame(id = id), data.frame(id = rownames(fitted), fitted = fitted), all = TRUE)  # new line
  out <- list(Est = cbind(Est = par, SE = se), vcov = vc, AIC = aic, call = call, pts.removed = pts.removed, pix.removed = pix.removed, points.retained = points.retained, optim = fm, not.fixed = not.fixed, link = link, fitted.values = fitted.values$fitted)  # 'fitted.values' added
  if (class(rasters) %in% c("matrix", "data.frame"))  savedata = TRUE  # new line, otherwise 'predict' would fail
  if (savedata)
    out$rasters <- rasters
  class(out) <- c("maxlikeFit", "list")
  return(out)
}

# some changes to predict.maxlikeFit were also required to accomodate maxlike.df:
predict.maxlikeFit <- function (object, ...) {
  e <- coef(object)
  rasters <- object$rasters
  if (is.null(rasters)) {
    rasters <- try(get(as.character(object$call$rasters)))
    if (identical(class(rasters)[1], "try-error")) 
      stop("could not find the raster data")
    warning("raster data were not saved with object, using the data found in the workspace instead.")
  }
  link <- object$link
  cd.names <- names(rasters)
  cd.class <- class(rasters)[1]  # new line
  if (cd.class == "RasterStack") {  # new line
    npix <- prod(dim(rasters)[1:2])
    z <- as.data.frame(matrix(getValues(rasters), npix))
  }  # new line
  else if (cd.class %in% c("matrix", "data.frame")) {  # new line
    npix <- nrow(rasters)  # new line
    z <- rasters  # new line
  }  else stop("'rasters' must be either a RasterStack, a matrix or a data.frame")  # new line
  names(z) <- cd.names
  formula <- object$call$formula
  varnames <- all.vars(formula)
  if (!all(varnames %in% cd.names)) 
    stop("at least 1 covariate in the formula is not in rasters.")
  Z.mf <- model.frame(formula, z, na.action = "na.pass")
  Z.terms <- attr(Z.mf, "terms")
  Z <- model.matrix(Z.terms, Z.mf)
  eta <- drop(Z %*% coef(object))
  if (identical(link, "logit")) 
    psi.hat <- plogis(eta)
  else if (identical(link, "cloglog")) 
    psi.hat <- 1 - exp(-exp(eta))
  else stop("link function should be either 'logit' or 'cloglog'")
  if (cd.class == "RasterStack") {  # new line
    psi.mat <- matrix(psi.hat, dim(rasters)[1], dim(rasters)[2], byrow = TRUE)
    psi.raster <- raster(psi.mat)
    extent(psi.raster) <- extent(rasters)
    psi.raster
  }  # new line
  else if (cd.class == "data.frame") return(psi.hat)  # new line
}

[presented with Pretty R]

Usage example for a data frame with the species presence data in column 2 and the variables (named var1var2var3 and var4) in columns 3 to 6:

myspecies.maxlike <- maxlike.df(formula = ~ var1 + var2 + var3 + var4, covariates = mydata[ , 3:6], presences = mydata[ , 2], method = “BFGS”)

If you then want to explore the model with functions such as summary or predict, you need to install and load the maxlike package (which defines the appropriate class for this type of model). If not, just with the maxlike.df function you can already get some useful information from the model:

# get the model predictions applied to the analysed data:

myspecies.maxlike$fitted.values

# get the coefficient estimates and their standard errors:

myspecies.maxlike$Est

# get the model’s AIC:

myspecies.maxlike$AIC

You can then evaluate maxlike.df predictions with functions from the modEvA package, as in the following examples:

library(modEvA)

AUC(obs = mydata$myspecies, pred = myspecies.maxlike$fitted.values)

HLfit(obs = mydata$myspecies, pred =myspecies_maxlike$fitted.values)

plotGLM(obs =mydata$myspecies, pred =myspecies_maxlike$fitted.values)

References

Chandler, R.B. & Royle, J.A. (2013) Package ‘maxlike’: Model species distributions by estimating the probability of occurrence using presence-only data. Version  0.1-4, CRAN

Fitzpatrick, M.C., Gotelli, N.J. & Ellison, A.M. (2013) MaxEnt versus MaxLike: empirical comparisons with ant species distributions. Ecosphere 4, art55.

Hastie, T. & Fithian, W. (2013) Inference from presence-only data; the ongoing controversy. Ecography 36, 864-867.

Royle, J.A., Chandler, R.B., Yackulic, C. & Nichols, J.D. (2012) Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Methods in Ecology and Evolution 3: 545–554

Distribution modelling based on presence points

The pres.point.mod function uses the dismo package to run a series of models (currently Bioclim, Domain and Maxent) for a set of presence point coordinates and a raster stack of predictor variables. Optionally, it will also apply these models to an extrapolation raster stack, and write prediction and extrapolation maps to disk. The pres.point.mod function is not included in a package, but it’s in the “Supplementary Info” of Zanolla et al. (in press), so please cite this paper if you use the function.

Note that Bioclim and Domain use only the information on the presence points, while Maxent uses also a set of background (pseudo-absence) points that are randomly generated if not provided by the user. Please read Hijmans & Elith’s “Species distribution modeling with R” for more info on these modelling methods and their R implementation.

pres.point.mod <- function(obs.xy, var.stack, extrap.stack = NULL, mod.methods = c("bioclim", "domain", "maxent"), bg.points = NULL, result.rasters.folder = NULL, result.rasters.format = "GTiff", overwrite = FALSE, ...) {
  # version 1.5 (8 Oct 2013)
  # obs.xy: a matrix or data frame containing 2 columns, the first with the x and the second with the y coordinates of the species presence points to be modelled
  # var.stack: a raster stack of the predictor variables to use in modelling, in the same coordinate reference system as obs.xy
  # extrap.stack: optionally, a raster stack of variables (with the same names and coordinate reference system as var.stack) to which to exrapolate the models
  # mod.methods: modelling methods to use; currently "bioclim", "domain" and "maxent" are implemented
  # bg.points: an optional matrix/dataframe of x-y coordinates of the background points or pseudo-absences to be used by maxent (the other 3 methods use only the presence points). If NULL, maxent will randomly generate 10000 points within the raster.stack
  # result.rasters.folder: optionally, the path to a folder (which will be created if not present) to save the rasters of model predictions to disk
  # result.rasters.format: format for the rasters to save in result.rasters.folder (type '?writeFormats' for options)
  # overwrite: logical, whether to overwrite raster files with the same name within the result.rasters.folder
  # ...: additional arguments for the 'maxent' function (see ?maxent details); relevant arguments may be 'a' for user-defined background points, 'path' for the folder in which to save maxent output files, and 'removeDuplicates' o specify whether duplicate presence points within the same pixel should be ignored

  start.time <- proc.time()
  if (!is.element("dismo", installed.packages()[,1])) stop("You need to have the 'dismo' package installed to run 'pres.point.mod'.")
  require(dismo)
  if(!is.null(result.rasters.folder)) dir.create(result.rasters.folder, showWarnings = FALSE)
  jar.path <- paste(system.file(package = "dismo"), "/java/", sep = "")
  if ("maxent" %in% mod.methods & !file.exists(paste(jar.path, "maxent.jar", sep = "")))  stop("'maxent.jar' file not found in '", jar.path, "; please put (a copy of) it there, or run 'pres.point.mod' without 'maxent' in 'mod.methods'", sep = "")
  results <- vector("list")

  if("bioclim" %in% mod.methods) {
    message("Building Bioclim model...  >>  ", Sys.time())
    bioc.model.expr <- expression(bioclim(x = var.stack, p = obs.xy))
    bioc.model <- tryCatch(eval(bioc.model.expr), error = function(e) NULL)
    if(!is.null(bioc.model)) {
      results[["bioc.model"]] <- bioc.model
      message("Making Bioclim prediction raster...  >>  ", Sys.time())
      bioc.rast <- predict(bioc.model, var.stack)
      results[["bioc.rast"]] <- bioc.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = bioc.rast, filename = paste(result.rasters.folder, "Bioclim", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Bioclim extrapolated raster...  >>  ", Sys.time())
        bioc.extrap.rast <- predict(bioc.model, extrap.stack)
        results[["bioc.extrap.rast"]] <- bioc.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = bioc.extrap.rast, filename = paste(result.rasters.folder, "Bioclim_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null bioc model
    else message("...not! Bioclim model could not be obtained")
  }  # end if bioclim

  if("domain" %in% mod.methods) {
    message("Building Domain model...  >>  ", Sys.time())
    domain.model.expr <- expression(domain(x = var.stack, p = obs.xy))
    domain.model <- tryCatch(eval(domain.model.expr), error = function(e) NULL)
    if(!is.null(domain.model)) {
      results[["domain.model"]] <- domain.model
      message("Making Domain prediction raster...  >>  ", Sys.time())
      domain.rast <- predict(domain.model, var.stack)
      results[["domain.rast"]] <- domain.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = domain.rast, filename = paste(result.rasters.folder, "Domain", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Domain extrapolated raster...  >>  ", Sys.time())
        domain.extrap.rast <- predict(domain.model, extrap.stack)
        results[["domain.extrap.rast"]] <- domain.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = domain.extrap.rast, filename = paste(result.rasters.folder, "Domain_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null domain model
    else message("...not! Domain model could not be obtained")
  }  # end if domain

  if("maxent" %in% mod.methods) {
    message("Building Maxent model...  >>  ", Sys.time())
    maxent.model.expr <- expression(maxent(x = var.stack, p = obs.xy, a = bg.points, ...))
    maxent.model <- tryCatch(eval(maxent.model.expr), error = function(e) NULL)
    if(!is.null(maxent.model)) {
      results[["maxent.model"]] <- maxent.model
      message("Making Maxent prediction raster...  >>  ", Sys.time())
      maxent.rast <- predict(maxent.model, var.stack)
      results[["maxent.rast"]] <- maxent.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = maxent.rast, filename = paste(result.rasters.folder, "Maxent", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Maxent extrapolated raster...  >>  ", Sys.time())
        maxent.extrap.rast <- predict(maxent.model, extrap.stack)
        results[["maxent.extrap.rast"]] <- maxent.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = maxent.extrap.rast, filename = paste(result.rasters.folder, "Maxent_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null maxent model
    else message("...not! Maxent model could not be obtained")
  }  # end if maxent

  end.time <- proc.time()
  duration <- (end.time - start.time)[3]
  if (duration < 60) {
    units <- " second(s)"
  } else if (duration < 3600) {
    duration <- duration / 60
    units <- " minute(s)"
  } else {
    duration <- duration / 3600
    units <- " hour(s)"
  }

  message("Finished!  >>  ", Sys.time(), " -- it took ", round(duration),  units)
  return(results)
}  # end pres.point.mod function

[presented with Pretty R]

Please read the first lines of the function for an explanation of what each parameter means. Usage example:

mymodels <- pres.point.mod(obs.xy = mypoints, var.stack = predictor.stack, extrap.stack = var2050.stack, mod.methods = c(“bioclim”, “domain”, “maxent”), bg.points = bg.points, result.rasters.folder = “myspecies_models”, result.rasters.format = “GTiff”, overwrite = FALSE, path = “myspecies_maxent_out”)

 

pairs(mymodels$bioc.model)
response(mymodels$bioc.model)

response(mymodels$maxent.model)

 

plot(mymodels$bioc.rast)
plot(mymodels$domain.rast)
plot(mymodels$maxent.rast)

REFERENCES:

Zanolla M., Altamirano M., Carmona R., De La Rosa J., Souza-Egipsy V., Sherwood A., Tsiamis K., Barbosa A.M., Muñoz A.R. & Andreakis N. (in press) Assessing global range expansion in a cryptic species complex: insights from the red seaweed genus Asparagopsis (Florideophyceae). Journal of Phycology, accepted.

Convert geographical coordinates from degree-minute-second to decimal degrees

essIf you have latitude and/or longitude data in sexagesimal degrees, in the form degreesº minutes’ seconds” hemisfere (e.g. 41° 34′ 10.956″ N  or  8° 37′ 47.1036″ W, with or without spaces in between), the dms2dec function can convert them to decimal degrees, which are usually required for mapping. This function is not included in a package, but it’s in the “Supplementary Info” of Zanolla et al. (in review), so please cite this paper if you use the function.

dms2dec <- function(dms, separators = c("º", "°", "\'", "\"")) {
  # version 1.0 (25 Sep 3013)
  # dms: a vector (or column) of latitude or longitude in degrees-minutes-seconds-hemisfere, e.g. 41° 34' 10.956" N (with or without spaces)
  # separators: the characters that are separating degrees, minutes and seconds in dms

  dms <- as.character(dms)
  dms <- gsub(pattern = " ", replacement = "", x = dms)
  for (s in separators) dms <- gsub(pattern = s, replacement = "_splitHere_", x = dms)

  splits <- strsplit(dms, split = "_splitHere_")
  n <- length(dms)
  deg <- min <- sec <- hem <- vector("character", n)

  for (i in 1:n) {
    deg[i] <- splits[[i]][1]
    min[i] <- splits[[i]][2]
    sec[i] <- splits[[i]][3]
    hem[i] <- splits[[i]][4]
  }

  dec <- as.numeric(deg) + (as.numeric(min) / 60) + (as.numeric(sec) / 3600)
  sign <- ifelse (hem %in% c("N", "E"), 1, -1)
  dec <- sign * dec
  return(dec)
}  # end dms2dec function 

[presented with Pretty R]

Usage example, from a table called mydata:

LatLonTable

mydata$latitude.decimal <- dms2dec(mydata$Latitude)

mydata$longitude.decimal <- dms2dec(mydata$Longitude)

 

REFERENCES:

Zanolla M., Altamirano M., Carmona R., De La Rosa J., Souza-Egipsy V., Sherwood A., Tsiamis K., Barbosa A.M., Muñoz A.R. & Andreakis N. (in press) Assessing global range expansion in a cryptic species complex: insights from the red seaweed genus Asparagopsis (Florideophyceae). Journal of Phycology, accepted.