Functions for time tracking and management

Since I started working as a freelancer, it has become essential to keep an accurate record of how much time I spend on each task and project. I checked out some of the software tools available online (namely TimeCamp, which I’ve used for a while), but I decided to see if I could write my own R functions for tracking time use, especially when working offline (some of us still like or need to work offline sometimes!), and for producing some related plots and analyses afterwards. Here’s what I came up with.

I first need to have a file named “PROJECTS.csv”, with a column named “project” and a column named “task” (plus any other columns I find useful, such as “type” below), listing the names of the projects and tasks I’ll be working on, like this:

type           project    task
consulting     elephants  analysis
consulting     elephants  writing
consulting     elephants  revising
training       R-GIS      preparing
training       R-GIS      teaching
experimenting  timeman    programming
experimenting  timeman    debugging

I can manually add projects and tasks to this file at any time. Then I use the ‘start_task‘ and ‘stop_task‘ functions below to add entries to another file, which is named “DONT-TOUCH_timeman_data.csv“, has five columns (proj, task, start, stop, time) and should not be moved or modified manually:

projects <- read.csv("PROJECTS.csv")
tail(projects)

timeman_data <- read.csv("DONT-TOUCH_timeman_data.csv", as.is = TRUE)
tail(timeman_data)

start_task <- function(proj, task, time = NULL) {
  if (nrow(timeman_data) > 0 && is.na(timeman_data[nrow(timeman_data), ncol(timeman_data)]))
    stop ("previous task must be stopped before starting another")
  if (!(proj %in% unique(projects$project)))
    stop ("proj must exist in 'PROJECTS.csv'")
  if (!(task %in% unique(projects$task)))
    stop ("task must exist in 'PROJECTS.csv'")
  #if (!(proj %in% unique(timeman_data$projecto)))
    #timeman_data[nrow(timeman_data) + 1, "projecto"] <- proj
  if (is.null(time))
    time <- Sys.time()
  timeman_data[nrow(timeman_data) + 1, 1:3] <- c(proj, task, as.character(time))  # store the time as character, so it can be parsed by 'difftime' later
  timeman_data <<- timeman_data
}

stop_task <- function(time = NULL) {
  if (nrow(timeman_data) > 0 && !is.na(timeman_data[nrow(timeman_data), ncol(timeman_data)]))
    stop ("task must be started before being stopped")
  if (is.null(time))
    time <- Sys.time()
  #this_row <- which.max(timeman_data$projecto == proj)
  this_row <- nrow(timeman_data)
  timeman_data[this_row, "stop"] <- as.character(time)
  timeman_data[this_row, "time"] <- difftime(timeman_data[this_row, "stop"], timeman_data[this_row, "start"])
  timeman_data <<- timeman_data
  write.csv(timeman_data, "DONT-TOUCH_timeman_data.csv", row.names = FALSE)
  cat("Showing last rows in tasks table:\n\n")
  tail(timeman_data)
}

current_task <- function() {
  timeman_data[nrow(timeman_data), ]
}

I need to use ‘start_task‘ every time I start working on a task, and then ‘stop_task‘ every time I stop. This adds an entry with the starting, stopping, and total time spent in each instance:

unique(projects$project)
unique(projects$task)
unique(timeman_data$proj)

# ADD START:
start_task(proj = "timeman", task = "debugging")
tail(timeman_data)

# ADD STOP:
stop_task()
tail(timeman_data)

I can then apply all sorts of R analyses and plots to the time-tracking data, which is recorded in the ‘timeman_data‘ data frame and CSV file. However, these functions are certainly far from ideal: they use “unsafe” operators (like ‘<<-‘) and they rely on a file remaining untouched in its folder, so it’s easy to see how things could go wrong at any moment… Suggestions are welcome on how this could be improved and made safer! While this was certainly an interesting exercise, I think I’ll just stick to TimeCamp for the time being 🙂
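One possible way to make this safer (just a sketch of the idea, not thoroughly tested) would be to drop the ‘<<-‘ assignments altogether and have the functions read from and write to the CSV file directly, so that the file itself is the single source of truth:

start_task_safe <- function(proj, task, time = Sys.time(), file = "DONT-TOUCH_timeman_data.csv") {
  # sketch of a 'start_task' variant that appends straight to the CSV file, avoiding '<<-'
  dat <- read.csv(file, as.is = TRUE)
  if (nrow(dat) > 0 && is.na(dat[nrow(dat), "time"]))
    stop("previous task must be stopped before starting another")
  dat[nrow(dat) + 1, c("proj", "task", "start")] <- c(proj, task, as.character(time))
  write.csv(dat, file, row.names = FALSE)
  invisible(dat)
}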

Change the background of your plotting window

Changing the background of your screen to dark instead of light colours can save energy and also spare your eyes. RStudio offers several dark-background themes, but the plotting window is still white by default. The function below allows you to define a different colour for the background and for the remaining elements of the plots:

plot_bg <- function(back = "black", front = "white") {
  par(bg = back)
  par(col = front, col.axis = front, col.lab = front, col.main = front, col.sub = front, fg = front)
}

To try it out, just copy the function above to R, press “enter”, and then run the following commands while watching the plotting window:

plot_bg(back = "darkblue", front = "turquoise")
plot(1:10)
plot_bg(back = "black", front = "white")
plot(1:10)

The default of this function is white plots on a black background, so if that’s what you want, you just need to run the function and this simpler command:

plot_bg()

All plots produced afterwards (at least those made with base R, or with no conflicting graphical parameters) should appear on a black window — for example:

plot(1:10)

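If you want to go back to the default white background afterwards, one simple option (not part of the function above) is to save the current graphical parameters before changing them, or just to close the plotting device:

op <- par(no.readonly = TRUE)  # save the current settings
plot_bg(back = "black", front = "white")
plot(1:10)
par(op)  # restore the previous settings

dev.off()  # or simply close the device; the next plot opens a new one with default settings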

Plot column(s) of a polygon vector map

NOTE: I have since posted a tutorial on the cartography R package, which makes mapping columns of a data frame much less troublesome. You may want to look at that instead.

If you have a polygon vector map in R and want to quickly map the values in one or more columns of its attribute table, you can use the plotMapColumns function below. There’s an option to rasterize the map before plotting, which may be considerably faster and is TRUE by default, but you’ll then need an appropriate raster.upscale value: the number by which the range of the coordinates is divided to get the number of pixels in the plotted maps. It’s advisable to first check your range(coordinates(map)) and see for yourself which raster.upscale divisor will produce raster maps of a reasonably computable size. For geographical lat-lon coordinates, a raster.upscale of 1 will usually work (you’ll get at most 360 x 180 pixels; you may even want to lower it to 0.5 or 0.3 if you need more detailed maps), but for a UTM projection (whose coordinates can be much larger numbers) you may need a raster.upscale of 10000 to get a reasonably computable number of pixels.
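For example, here is a quick way to inspect the coordinate range before choosing raster.upscale (assuming your polygon map is in an object named map and the sp package is installed):

library(sp)
coord.range <- apply(coordinates(map), 2, range)  # min and max of the x and y coordinates
coord.range
diff(coord.range)  # width and height of the map in map units
diff(coord.range) / 1  # approximate raster dimensions (pixels) with raster.upscale = 1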

plotMapColumns <- function(map, # SpatialPolygons* object
                           columns, # index(es) of column(s) of map@data containing the values to plot (there will be one output map per column)
                           rasterize = TRUE, # logical, whether to rasterize the map before plotting (usually considerably faster)
                           raster.upscale = 1, # number by which the difference between maximum and minimum coordinates should be divided to get the number of pixels (if rasterize = TRUE); it's advised to first calculate min and max coordinates and see for yourself which divisor will make reasonably computable raster maps (e.g., for geographical lat-lon an upscale factor of 1 may work, but for a UTM projection you may need an upscale factor of 10,000!)
                           ...) # additional arguments for the (sp)plot function
  {

  stopifnot(raster.upscale > 0 | is.null(raster.upscale),
            require(raster) | rasterize == FALSE
            )
 
  if (!all(columns %in% 1:length(names(map)))) stop ("index out of bounds; 'columns' must exist in map@data.")
 
  if (rasterize) {
    xmin <- min(coordinates(map)[,1])
    xmax <- max(coordinates(map)[,1])
    ymin <- min(coordinates(map)[,2])
    ymax <- max(coordinates(map)[,2])
    wdth <- round(xmax - xmin)
    hght <- round(ymax - ymin)
 
    #if (raster.upscale == "auto") {
      #max.length <- max(wdth, hght)
      #if (max.length > 500) raster.upscale <-
    #}
 
    #if (!is.null(max.rast.dim)) {
    #  rast.dim <- wdth * hght
    #}
 
    wdth <- wdth / raster.upscale
    hght <- hght / raster.upscale
    message("plotting map(s) with ", wdth, "x", hght, " pixels; consider rising 'raster.upscale' if this is taking too long, or lowering it if the resulting maps are too coarse.")
    require(raster)
    rast <- raster(nrows = hght, ncols = wdth, xmn = xmin, xmx = xmax, ymn = ymin, ymx = ymax)
  }  # end if rasterize I
 
  #if (centroids) {
  #  attr.table <- map@data
  #  map <- SpatialPointsDataFrame(coordinates(map))
  #  map@data <- attr.table
  #  rast <- raster(map)
  #} else {
    require(sp)
  #}
 
  n.cols <- length(columns)
  col.count <- 0
  for (i in columns) {
    col.count <- col.count + 1
    message("Plotting column ", col.count, " of ", n.cols, "...")
    if (rasterize) {
      map.rast <- rasterize(x = map, y = rast, field = names(map)[i], fun = 'last')
      plot(map.rast, main = names(map)[i], ...)
    }  # end if rasterize II
    else {
      print(spplot(map, zcol = names(map)[i], main = names(map)[i], ...))
    }  # end else
  }  # end for i
  message("Finished!")
}


Usage example:

# download, unzip and import a map of countries:
download.file("http://biogeo.ucdavis.edu/data/world/countries_shp.zip", destfile = "countries_shp.zip")
unzip("countries_shp.zip")
countries <- rgdal::readOGR(dsn = ".", layer= "countries")
 
# see the data in the attributes table:
head(countries)
names(countries)
 
# use plotMapColumns with and without rasterizing:
plotMapColumns(countries, columns = 17:18, rasterize = TRUE, raster.upscale = 1)
plotMapColumns(countries, columns = 18, rasterize = FALSE)  # slower

You can pass additional arguments to the underlying plot or spplot function, e.g. to get different colour schemes. The plotMapColumns function is not (yet) included in a package.
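For example (hypothetical calls, using the ‘countries’ map imported above), you could pass a different colour palette through the ‘...’ argument:

plotMapColumns(countries, columns = 18, rasterize = TRUE, raster.upscale = 1, col = terrain.colors(20))  # 'col' is passed on to 'plot'
plotMapColumns(countries, columns = 18, rasterize = FALSE, col.regions = heat.colors(20))  # 'col.regions' is passed on to 'spplot'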

Calculate zonal statistics from rasters in multiple zip files

This is a wrapper for the zonalFromZip function published in the previous post (reproduced in the next section below), for when you have multiple zip files with multiple raster files each (as in the WorldClim paleo-climate database) and you want to extract zonal statistics for them all automatically. To use it, you’ll need to have the zonalFromZip function loaded, as well as the raster R package.

zonalsFromZips <- function(zip.files, zones.map, rast.file.ext = ".tif", aux.file.ext = NULL, verbosity = 1, ...) {
  # v2.0 (2018/07/11)
  results <- vector("list", length(zip.files))
  names(results) <- basename(tools::file_path_sans_ext(zip.files))
  for (f in 1:length(zip.files)) {
    message("\nUNZIPPING FILE ", f, " OF ", length(zip.files), " (", basename(zip.files[f]), ")...")
    results[[f]] <- zonalFromZip(zip.file = zip.files[f], zones.map = zones.map, rast.file.ext = rast.file.ext, aux.file.ext = aux.file.ext, verbosity = verbosity, ...)
  }; rm(f)
  message("\nFINISHED ALL!")
  return(results)
}  # end zonalsFromZips function

The result is a list of data frames, each containing the zonal stats for one of the .zip files of rasters. Usage example:

LGM.zonals <- zonalsFromZips(zip.files = list.files("/home/joe/LGM", full.names = TRUE), zones.map = provinces)
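You can then inspect or export the elements of the resulting list with standard R tools, e.g.:

names(LGM.zonals)
head(LGM.zonals[[1]])  # zonal stats for the first zip file

# optionally, save each element to its own CSV file:
for (n in names(LGM.zonals)) write.csv(LGM.zonals[[n]], paste0(n, "_zonal.csv"), row.names = FALSE)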


Calculate zonal statistics from rasters in a zip file

Imagine you have a zip file with a bunch of raster maps containing variables (e.g. the ones you can download from WorldClim or from CliMond), and you need to calculate zonal statistics from each of these rasters. The zonalFromZip function, provided below, automates this process without the need to unzip the whole file. It extracts one raster at a time from the .zip, imports it to R, calculates zonal statistics for your zones map (the ‘mean’ function is used by default, but you can provide any other function accepted by the zonal function of the R raster package), and then deletes the unzipped file before unzipping the next one, therefore requiring minimal disk space.

zonalFromZip <- function(zip.file, zones.map, fun = mean, rast.file.ext = ".tif", aux.file.ext = NULL, delete.unzipped = TRUE, verbosity = 2, ...) {
  # version 2.1 (updated 11/12/2018)
  # zip.file: path to the zip containing the raster maps to calculate zonal stats from
  # zones.map: map (in your R workspace) containing the spatial units to calculate zonal stats for; must be of class 'RasterLayer', 'SpatialPolygons' or 'SpatialPolygonsDataFrame' and have the same CRS (and resolution, if raster) as the maps in 'zip.file'
  # fun: function to calculate (mean by default)
  # rast.file.ext: file extension of the raster maps in 'zip.file'
  # aux.file.ext: file extension of the auxiliary files (e.g. ".hdr" for .bil raster files, or ".rdc" for Idrisi .rst files)
  # ...: additional arguments to pass to 'raster::zonal' (if 'zones.map' is a raster) or 'raster::extract' (if 'zones.map' is a SpatialPolygons* map), such as na.rm = TRUE

  require(raster)
  rast.files <- unzip(zip.file, list = TRUE)$Name
  var.names <- unique(tools::file_path_sans_ext(rast.files))
  n.var <- length(var.names)
  zonal.stats <- vector("list", length(var.names))
  names(zonal.stats) <- var.names

  for (i in 1:n.var) {
    if (verbosity >= 1) message("Getting variable ", i, " of ", n.var)
    if (verbosity >= 2) message(" - unzipping file...")
    unzip(zip.file, files = paste0(var.names[i], rast.file.ext))
    if (!is.null(aux.file.ext)) unzip(zip.file, files = paste0(var.names[i], aux.file.ext))
    var.rast <- raster(paste0(var.names[i], rast.file.ext))
    if (!compareRaster(var.rast, zones.map, stopiffalse = FALSE)) {
      if (verbosity >= 2) message(" - cropping to zones raster...")
      var.rast <- crop(var.rast, zones.map)
    }

    if (verbosity >= 2) message(" - calculating zonal stats...")
    if (class(zones.map) %in% c("raster", "RasterLayer"))
      zonal.stats[[i]] <- raster::zonal(var.rast, zones.map, fun = fun, ...)
    else if (class(zones.map) %in% c("SpatialPolygons", "SpatialPolygonsDataFrame"))
      zonal.stats[[i]] <- raster::extract(var.rast, zones.map, fun = fun, df = TRUE, ...)
    else stop("'zones.map' must be of class 'raster', 'RasterLayer', 'SpatialPolygons' or 'SpatialPolygonsDataFrame'")

    if (verbosity >= 2) message(" - deleting unzipped file...")
    if (delete.unzipped) {
      unlink(list.files()[grep(pattern = paste0(var.names[i], rast.file.ext), x = list.files())])
      if (!is.null(aux.file.ext)) unlink(list.files()[grep(pattern = paste0(var.names[i], aux.file.ext), x = list.files())])
    }  # end if delete
  }  # end for i

  if (verbosity >= 1) message("Converting results to data frame...")
  zonal.stats <- as.data.frame(zonal.stats)
  zonal.stats <- subset(zonal.stats, select = c(1, seq(2, ncol(zonal.stats), 2)))
  colnames(zonal.stats)[1] <- "zone"
  colnames(zonal.stats)[-1] <- var.names
  if (verbosity >= 1) message("Finished!")
  return(zonal.stats)
}

Mind that you need the raster R package installed for this, and a map of the spatial units (zones), either a raster or a SpatialPolygons* object, for which you want to extract the raster variables. Usage examples:

LGM.CCSM4.utm10 <- zonalFromZip(zip.file = "LGM/CCSM4.zip", zones.map = utm10, rast.file.ext = ".tif", aux.file.ext = NULL)
head(LGM.CCSM4.utm10)
 
WClim.utm10 <- zonalFromZip(zip.file = "bio_2-5m_bil.zip", zones.map = utm10, rast.file.ext = ".bil", aux.file.ext = ".hdr")
head(WClim.utm10)

Example for several .zip files within a folder at once (DEPRECATED – see the zonalsFromZips function in the section above instead):

for (f in list.files("LGM")) {  
# "LGM" is the folder containing the zip files to extract zonal stats from
  name <- paste("LGM", tools::file_path_sans_ext(f), "utm10", sep = ".")
  zonstat <- zonalFromZip(zip.file = paste("LGM", f, sep = "/"), zones.map = utm10, rast.file.ext = ".tif", fun = "mean")
  assign(name, zonstat)
}

Download several zip files automatically

I recently found out that a bunch of new climate scenarios had been made available on WorldClim, at least for past climates. While there may be more efficient ways to do this, here’s the R function I wrote to download several of them automatically, based on the URLs (link locations) of the .zip files:

downloadZips <- function(zip.urls, zip.names = NULL, dir.name = NULL, unzip = FALSE) {
# zip.names: names to give the downloaded zip files (if different from the original ones)
# dir.name: name of the directory folder in which to store the downloaded files
# unzip: logical, whether or not to unzip the files after downloading
  stopifnot(is.null(zip.names) | length(zip.names) == length(zip.urls))
  if (!is.null(dir.name)) {
    dir.create(dir.name)
    setwd(dir.name)
    on.exit(setwd("../"))
  }
  if (is.null(zip.names))  zip.names <- tools::file_path_sans_ext(basename(zip.urls))
  for (z in 1:length(zip.urls)) {
    message("\nDownloading zip ", z, " of ", length(zip.urls), " (", zip.names[z], ")...\n")
    zip.file <- paste(zip.names[z], "zip", sep = ".")
    download.file(zip.urls[z], zip.file, mode = "wb")  # binary ('wb') mode avoids corrupted zip files on Windows
    if (unzip) {
      dir.create(zip.names[z])
      message("Unzipping file to folder '", zip.names[z], "'...")
      unzip(zip.file, exdir = zip.names[z])
    }  # end if unzip
  }  # end for z
  message("Finished!")
}  # end downloadZips function

Usage examples (mind that these large files take quite a while to download):

# LGM:
downloadZips(zip.urls = c("http://biogeo.ucdavis.edu/data/climate/cmip5/lgm/cclgmbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/lgm/mrlgmbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/lgm/melgmbi_2-5m.zip"), zip.names = c("CCSM4", "MIROC_ESM", "MPI_ESM_P"), dir.name = "LGM")
 
# Mid Holocene:
downloadZips(zip.urls = c("http://biogeo.ucdavis.edu/data/climate/cmip5/mid/bcmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/ccmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/cnmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/hgmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/hemidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/ipmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/mrmidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/memidbi_2-5m.zip", "http://biogeo.ucdavis.edu/data/climate/cmip5/mid/mgmidbi_2-5m.zip"), zip.names = c("BCC_CSM1_1", "CCSM4", "CNRM_CM5", "HadGEM2_CC", "HadGEM2_ES", "IPSL_CM5A_LR", "MIROC_ESM", "MPI_ESM_P", "MRI_CGCM3"), dir.name = "MidHol")
 
# Last Inter-Glacial:
downloadZips(zip.urls = "http://biogeo.ucdavis.edu/data/climate/worldclim/1_4/grid/pst/lig/lig_30s_bio.zip", zip.names = "LIG_30s", dir.name = "LIG")


Probability of occurrence from presence-background data in a data frame

In an interesting recent paper, Royle et al. (2012) presented a critical analysis of Maxent and proposed a method for obtaining probability of occurrence (rather than a vague suitability index) based on maximum likelihood from presence-background (rather than presence-absence) species distribution data. They provided an R package, maxlike, which can apply this method to a set of presence point coordinates and a raster stack of predictor variables. However, this implies not only that you need point coordinates of your species observations, but also that sampling units are square pixels, whereas we often need to model distribution data recorded on larger and more irregular units — such as provinces, river basins, or UTM cells, for example.

The maxlike.df function is a modification of maxlike that can be applied to a data frame instead of spatial objects. I added argument covariates (to use instead of rasters), which requires a matrix or data.frame of the variables (one in each column, with the same names used in argument formula), and argument presences (to use instead of points) for a vector/column with the species data (of the same size and in the same order as the covariates, with 1 for presence and 0 or NA for background or no records — mind that MaxLike, like MaxEnt, needs both presence and background data, so they’re not strictly “presence-only”). Everything else in maxlike.df works the same way as in maxlike (Chandler & Royle 2013), so please read the maxlike documentation for further information on how the method is implemented and interpreted. Note that they recommend that the variables be standardized (e.g. vars = scale(vars)) before modelling, and that you should select which variables to include in each model.
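Following that recommendation, you could standardize the covariates before fitting, for example (hypothetical data frame layout, matching the usage example further down, with the variables in columns 3 to 6):

mydata[ , 3:6] <- scale(mydata[ , 3:6])  # centre and scale each variable (mean 0, standard deviation 1)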

Note also that Royle et al.‘s (2012) approach has been supported by Fitzpatrick et al. (2013), but criticized by Hastie & Fithian (2013).

UPDATE: the latest version of the maxlike package (0.1-7) now allows models to be calibrated with data frames, via arguments x (data.frame with environmental conditions at presence point locations) and z (data.frame with environmental conditions at background locations). The function below is therefore deprecated, and you should use the maxlike package instead.
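For reference, a call to the updated package could look something like the sketch below (untested here; the object and column names are the same hypothetical ones used in the usage example further down, and you should check ?maxlike for the exact argument behaviour):

# sketch only -- assumes maxlike >= 0.1-7 with the 'x' and 'z' data-frame arguments described above:
library(maxlike)
fit <- maxlike(formula = ~ var1 + var2 + var3 + var4,
               x = mydata[mydata$myspecies == 1, 3:6],  # conditions at the presence records
               z = mydata[ , 3:6],  # conditions at the background locations
               link = "logit")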

maxlike.df <- function (formula, rasters, points, presences, covariates, link = c("logit", "cloglog"), starts, hessian = TRUE, fixed, removeDuplicates = FALSE, savedata = FALSE, na.action = "na.omit", ...)
  {
  if (missing(rasters))  rasters <- covariates  # new line
  if (missing(points))  points <- presences  # new line
  if (identical(formula, ~1))
    stop("At least one continuous covariate must be specified in the formula")
  link <- match.arg(link)
  varnames <- all.vars(formula)
  call <- match.call()
  npts <- sum(points == 1)  # was nrow(points)
  cd.class <- class(rasters)[1]
  #if (cd.class != "rasterstack") stop("rasters must be a raster stack")
  #pt.class <- class(points)[1]
  #if (!pt.class %in% c("matrix", "data.frame")) stop("points must be a matrix or a data.frame")
  #if (ncol(points) != 2) stop("points must have 2 columns containing the x- and y- coordinates")
  pt.names <- colnames(points)
  #if (identical(cd.class, "rasterstack")) {
    cd.names <- names(rasters)
    npix <- nrow(rasters) # was  prod(dim(rasters)[1:2])
    id <- 1:nrow(rasters)  # new line
    cellID <- id[points == 1] # was cellFromXY(rasters, points)
    duplicates <- duplicated(cellID)
    if (removeDuplicates) {
      cellID <- unique(cellID)
      npts <- length(cellID)
      points.retained <- points[!duplicates]  # no comma before ']', as 'points' is a vector here rather than a two-column matrix/dataframe
    }
    x <- as.data.frame(rasters[points == 1, ])  # was as.data.frame(matrix(extract(rasters, cellID), npts))
    z <- rasters  # was as.data.frame(matrix(getValues(rasters), npix))
    names(x) <- names(z) <- cd.names
  #}
  if (!all(varnames %in% cd.names)) stop("at least 1 covariate in the formula is not in names(rasters).")
  X.mf <- model.frame(formula, x, na.action = na.action)
  X.mf.a <- attributes(X.mf)
  pts.removed <- integer(0)
  points.retained <- points
  if ("na.action" %in% names(X.mf.a)) {
    pts.removed <- X.mf.a$na.action
    npts.removed <- length(pts.removed)
    if (npts.removed > 0) {
      warning(paste(npts.removed, "points removed due to missing values"))
      points.retained <- points.retained[-pts.removed]  # I removed the comma before ']', as here 'points' is a vector rather than a two-column matrix/dataframe
    }
  }
  X <- model.matrix(formula, X.mf)
  Z.mf <- model.frame(formula, z, na.action = na.action)
  Z.mf.a <- attributes(Z.mf)
  pix.removed <- integer(0)
  if ("na.action" %in% names(Z.mf.a)) {
    pix.removed <- Z.mf.a$na.action
    npix.removed <- length(pix.removed)
  }
  Z <- model.matrix(formula, Z.mf)
  npars <- ncol(X)
  parnames <- colnames(X)
  if (!"(Intercept)" %in% parnames) 
    stop("The intercept must be estimated or fixed")
  if (missing(starts)) {
    starts <- rep(0, npars)
    names(starts) <- parnames
  }
  else names(starts) <- parnames
  if (identical(link, "logit")) {
    nll <- function(pars) {
      psix <- plogis(drop(X %*% pars))
      psiz <- sum(plogis(drop(Z %*% pars)))
      -1 * sum(log(psix/psiz))
    }
  }
  else if (identical(link, "cloglog")) {
    nll <- function(pars) {
      psix <- 1 - exp(-exp(drop(X %*% pars)))
      psiz <- sum(1 - exp(-exp(drop(Z %*% pars))))
      -1 * sum(log(psix/psiz))
    }
  }
  else stop("link function should be either 'logit' or 'cloglog'")
  is.fixed <- rep(FALSE, npars)
  if (!missing(fixed)) {
    if (length(fixed) != length(starts)) 
      stop("fixed should be a vector with the same length as the number of parameters to be estimated")
    if (sum(is.double(fixed)) < 1) 
      stop("fixed must contain at least one real value")
    is.fixed <- !is.na(fixed)
    if (sum(!is.fixed) < 1) 
      stop("you cannot fix all parameters in the model")
    npars <- sum(!is.fixed)
    nll.fix <- function(p) {
      p[is.fixed] <- fixed[is.fixed]
      do.call("nll", list(pars = p))
    }
    fm <- optim(starts, nll.fix, hessian = hessian, ...)
    fm$par[is.fixed] <- fixed[is.fixed]
  }
  else {
    fm <- optim(starts, nll, hessian = hessian, ...)
  }
  not.fixed <- !is.fixed
  par <- fm$par
  if (hessian) {
    vcTry <- try(solve(fm$hessian[not.fixed, not.fixed]))
    if (inherits(vcTry, "matrix")) {  # 'inherits' rather than identical(class(...), "matrix"), so this also works in R >= 4.0 where matrices also have class "array"
      vc <- matrix(0, length(par), length(par))
      vc[not.fixed, not.fixed] <- vcTry
      se <- sqrt(diag(vc))
    }
    else {
      vc <- matrix(NA, npars, npars)
      se <- rep(NA, npars)
    }
  }
  else {
    vc <- matrix(NA, npars, npars)
    se <- rep(NA, npars)
  }
  dimnames(vc) <- list(parnames, parnames)
  aic <- 2 * fm$value + 2 * npars
  fitted <- plogis(Z %*% par)
  fitted.values <- merge(data.frame(id = id), data.frame(id = rownames(fitted), fitted = fitted), all = TRUE)  # new line
  out <- list(Est = cbind(Est = par, SE = se), vcov = vc, AIC = aic, call = call, pts.removed = pts.removed, pix.removed = pix.removed, points.retained = points.retained, optim = fm, not.fixed = not.fixed, link = link, fitted.values = fitted.values$fitted)  # 'fitted.values' added
  if (class(rasters) %in% c("matrix", "data.frame"))  savedata = TRUE  # new line, otherwise 'predict' would fail
  if (savedata)
    out$rasters <- rasters
  class(out) <- c("maxlikeFit", "list")
  return(out)
}

# some changes to predict.maxlikeFit were also required to accommodate maxlike.df:
predict.maxlikeFit <- function (object, ...) {
  e <- coef(object)
  rasters <- object$rasters
  if (is.null(rasters)) {
    rasters <- try(get(as.character(object$call$rasters)))
    if (identical(class(rasters)[1], "try-error")) 
      stop("could not find the raster data")
    warning("raster data were not saved with object, using the data found in the workspace instead.")
  }
  link <- object$link
  cd.names <- names(rasters)
  cd.class <- class(rasters)[1]  # new line
  if (cd.class == "RasterStack") {  # new line
    npix <- prod(dim(rasters)[1:2])
    z <- as.data.frame(matrix(getValues(rasters), npix))
  }  # new line
  else if (cd.class %in% c("matrix", "data.frame")) {  # new line
    npix <- nrow(rasters)  # new line
    z <- rasters  # new line
  }  else stop("'rasters' must be either a RasterStack, a matrix or a data.frame")  # new line
  names(z) <- cd.names
  formula <- object$call$formula
  varnames <- all.vars(formula)
  if (!all(varnames %in% cd.names)) 
    stop("at least 1 covariate in the formula is not in rasters.")
  Z.mf <- model.frame(formula, z, na.action = "na.pass")
  Z.terms <- attr(Z.mf, "terms")
  Z <- model.matrix(Z.terms, Z.mf)
  eta <- drop(Z %*% coef(object))
  if (identical(link, "logit")) 
    psi.hat <- plogis(eta)
  else if (identical(link, "cloglog")) 
    psi.hat <- 1 - exp(-exp(eta))
  else stop("link function should be either 'logit' or 'cloglog'")
  if (cd.class == "RasterStack") {  # new line
    psi.mat <- matrix(psi.hat, dim(rasters)[1], dim(rasters)[2], byrow = TRUE)
    psi.raster <- raster(psi.mat)
    extent(psi.raster) <- extent(rasters)
    psi.raster
  }  # new line
  else if (cd.class == "data.frame") return(psi.hat)  # new line
}


Usage example for a data frame with the species presence data in column 2 and the variables (named var1, var2, var3 and var4) in columns 3 to 6:

myspecies.maxlike <- maxlike.df(formula = ~ var1 + var2 + var3 + var4, covariates = mydata[ , 3:6], presences = mydata[ , 2], method = "BFGS")

If you then want to explore the model with functions such as summary or predict, you need to install and load the maxlike package (which defines the appropriate class for this type of model). Even without it, the maxlike.df function already gives you some useful information about the model:

# get the model predictions applied to the analysed data:
myspecies.maxlike$fitted.values

# get the coefficient estimates and their standard errors:
myspecies.maxlike$Est

# get the model's AIC:
myspecies.maxlike$AIC

You can then evaluate maxlike.df predictions with functions from the modEvA package, as in the following examples:

library(modEvA)

AUC(obs = mydata$myspecies, pred = myspecies.maxlike$fitted.values)
HLfit(obs = mydata$myspecies, pred = myspecies.maxlike$fitted.values)
plotGLM(obs = mydata$myspecies, pred = myspecies.maxlike$fitted.values)

References

Chandler, R.B. & Royle, J.A. (2013) Package ‘maxlike’: Model species distributions by estimating the probability of occurrence using presence-only data. Version 0.1-4, CRAN.

Fitzpatrick, M.C., Gotelli, N.J. & Ellison, A.M. (2013) MaxEnt versus MaxLike: empirical comparisons with ant species distributions. Ecosphere 4, art55.

Hastie, T. & Fithian, W. (2013) Inference from presence-only data; the ongoing controversy. Ecography 36, 864-867.

Royle, J.A., Chandler, R.B., Yackulic, C. & Nichols, J.D. (2012) Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Methods in Ecology and Evolution 3: 545–554.

Distribution modelling based on presence points

The pres.point.mod function uses the dismo package to run a series of models (currently Bioclim, Domain and Maxent) for a set of presence point coordinates and a raster stack of predictor variables. Optionally, it will also apply these models to an extrapolation raster stack, and write prediction and extrapolation maps to disk. The pres.point.mod function is not included in a package, but it’s in the “Supporting Information” of Zanolla et al. (2018), so please cite this paper if you use the function.

Note that Bioclim and Domain use only the information on the presence points, while Maxent uses also a set of background (pseudo-absence) points that are randomly generated if not provided by the user. Please read Hijmans & Elith’s “Species distribution modeling with R” for more info on these modelling methods and their R implementation.

pres.point.mod <- function(obs.xy, var.stack, extrap.stack = NULL, mod.methods = c("bioclim", "domain", "maxent"), bg.points = NULL, result.rasters.folder = NULL, result.rasters.format = "GTiff", overwrite = FALSE, ...) {
  # version 1.5 (8 Oct 2013)
  # obs.xy: a matrix or data frame containing 2 columns, the first with the x and the second with the y coordinates of the species presence points to be modelled
  # var.stack: a raster stack of the predictor variables to use in modelling, in the same coordinate reference system as obs.xy
  # extrap.stack: optionally, a raster stack of variables (with the same names and coordinate reference system as var.stack) to which to extrapolate the models
  # mod.methods: modelling methods to use; currently "bioclim", "domain" and "maxent" are implemented
  # bg.points: an optional matrix/dataframe of x-y coordinates of the background points or pseudo-absences to be used by maxent (the other methods use only the presence points). If NULL, maxent will randomly generate 10000 points within var.stack
  # result.rasters.folder: optionally, the path to a folder (which will be created if not present) to save the rasters of model predictions to disk
  # result.rasters.format: format for the rasters to save in result.rasters.folder (type '?writeFormats' for options)
  # overwrite: logical, whether to overwrite raster files with the same name within the result.rasters.folder
  # ...: additional arguments for the 'maxent' function (see ?maxent for details); relevant arguments may be 'a' for user-defined background points, 'path' for the folder in which to save maxent output files, and 'removeDuplicates' to specify whether duplicate presence points within the same pixel should be ignored

  start.time <- proc.time()
  if (!is.element("dismo", installed.packages()[,1])) stop("You need to have the 'dismo' package installed to run 'pres.point.mod'.")
  require(dismo)
  if(!is.null(result.rasters.folder)) dir.create(result.rasters.folder, showWarnings = FALSE)
  jar.path <- paste(system.file(package = "dismo"), "/java/", sep = "")
  if ("maxent" %in% mod.methods & !file.exists(paste(jar.path, "maxent.jar", sep = "")))  stop("'maxent.jar' file not found in '", jar.path, "; please put (a copy of) it there, or run 'pres.point.mod' without 'maxent' in 'mod.methods'", sep = "")
  results <- vector("list")

  if("bioclim" %in% mod.methods) {
    message("Building Bioclim model...  >>  ", Sys.time())
    bioc.model.expr <- expression(bioclim(x = var.stack, p = obs.xy))
    bioc.model <- tryCatch(eval(bioc.model.expr), error = function(e) NULL)
    if(!is.null(bioc.model)) {
      results[["bioc.model"]] <- bioc.model
      message("Making Bioclim prediction raster...  >>  ", Sys.time())
      bioc.rast <- predict(bioc.model, var.stack)
      results[["bioc.rast"]] <- bioc.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = bioc.rast, filename = paste(result.rasters.folder, "Bioclim", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Bioclim extrapolated raster...  >>  ", Sys.time())
        bioc.extrap.rast <- predict(bioc.model, extrap.stack)
        results[["bioc.extrap.rast"]] <- bioc.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = bioc.extrap.rast, filename = paste(result.rasters.folder, "Bioclim_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null bioc model
    else message("...not! Bioclim model could not be obtained")
  }  # end if bioclim

  if("domain" %in% mod.methods) {
    message("Building Domain model...  >>  ", Sys.time())
    domain.model.expr <- expression(domain(x = var.stack, p = obs.xy))
    domain.model <- tryCatch(eval(domain.model.expr), error = function(e) NULL)
    if(!is.null(domain.model)) {
      results[["domain.model"]] <- domain.model
      message("Making Domain prediction raster...  >>  ", Sys.time())
      domain.rast <- predict(domain.model, var.stack)
      results[["domain.rast"]] <- domain.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = domain.rast, filename = paste(result.rasters.folder, "Domain", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Domain extrapolated raster...  >>  ", Sys.time())
        domain.extrap.rast <- predict(domain.model, extrap.stack)
        results[["domain.extrap.rast"]] <- domain.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = domain.extrap.rast, filename = paste(result.rasters.folder, "Domain_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null domain model
    else message("...not! Domain model could not be obtained")
  }  # end if domain

  if("maxent" %in% mod.methods) {
    message("Building Maxent model...  >>  ", Sys.time())
    maxent.model.expr <- expression(maxent(x = var.stack, p = obs.xy, a = bg.points, ...))
    maxent.model <- tryCatch(eval(maxent.model.expr), error = function(e) NULL)
    if(!is.null(maxent.model)) {
      results[["maxent.model"]] <- maxent.model
      message("Making Maxent prediction raster...  >>  ", Sys.time())
      maxent.rast <- predict(maxent.model, var.stack)
      results[["maxent.rast"]] <- maxent.rast
      if(!is.null(result.rasters.folder)) {
        message("Saving in result.rasters.folder...  >>  ", Sys.time())
        writeRaster(x = maxent.rast, filename = paste(result.rasters.folder, "Maxent", sep = "/"), format = result.rasters.format, overwrite = overwrite)
      }  # end if result.rasters.folder
      if(!is.null(extrap.stack)) {
        message("Making Maxent extrapolated raster...  >>  ", Sys.time())
        maxent.extrap.rast <- predict(maxent.model, extrap.stack)
        results[["maxent.extrap.rast"]] <- maxent.extrap.rast
        if(!is.null(result.rasters.folder)) {
          message("Saving in result.rasters.folder...  >>  ", Sys.time())
          writeRaster(x = maxent.extrap.rast, filename = paste(result.rasters.folder, "Maxent_extrapolated", sep = "/"), format = result.rasters.format, overwrite = overwrite)
        }  # end if result.rasters.folder
      }  # end if extrap.stack
    }  # end if !null maxent model
    else message("...not! Maxent model could not be obtained")
  }  # end if maxent

  end.time <- proc.time()
  duration <- (end.time - start.time)[3]
  if (duration < 60) {
    units <- " second(s)"
  } else if (duration < 3600) {
    duration <- duration / 60
    units <- " minute(s)"
  } else {
    duration <- duration / 3600
    units <- " hour(s)"
  }

  message("Finished!  >>  ", Sys.time(), " -- it took ", round(duration),  units)
  return(results)
}  # end pres.point.mod function


Please read the first lines of the function above for an explanation of what each parameter means.

Usage example:

mymodels <- pres.point.mod(obs.xy = mypoints, var.stack = predictor.stack, extrap.stack = var2050.stack, mod.methods = c("bioclim", "domain", "maxent"), bg.points = bg.points, result.rasters.folder = "myspecies_models", result.rasters.format = "GTiff", overwrite = FALSE, path = "myspecies_maxent_out")

pairs(mymodels$bioc.model)
response(mymodels$bioc.model)

plot(mymodels$bioc.rast)
plot(mymodels$domain.rast)
plot(mymodels$maxent.rast)
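The bg.points used in the example above can come from anywhere; one possible way to obtain them (an illustration, not a requirement of the function) is to draw random points from the non-NA cells of the predictor stack with dismo:

library(dismo)
bg.points <- randomPoints(predictor.stack, n = 10000)  # two-column matrix of x-y coordinates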

REFERENCES:

Zanolla M., Altamirano M., Carmona R., De La Rosa J., Souza-Egipsy V., Sherwood A., Tsiamis K., Barbosa A.M., Muñoz A.R. & Andreakis N. (2018) Assessing global range expansion in a cryptic species complex: insights from the red seaweed genus Asparagopsis (Florideophyceae). Journal of Phycology, 54: 12-24.

Convert geographical coordinates from degree-minute-second to decimal degrees

If you have latitude and/or longitude data in sexagesimal degrees, in the form degrees° minutes′ seconds″ hemisphere (e.g. 41° 34′ 10.956″ N or 8° 37′ 47.1036″ W, with or without spaces in between), the dms2dec function can convert them to decimal degrees, which are usually required for mapping. This function is not included in a package, but it’s in the “Supporting Information” of Zanolla et al. (2018), so please cite this paper if you use the function.

dms2dec <- function(dms, separators = c("º", "°", "\'", "\"")) {
  # version 1.0 (25 Sep 2013)
  # dms: a vector (or column) of latitude or longitude in degrees-minutes-seconds-hemisphere, e.g. 41° 34' 10.956" N (with or without spaces)
  # separators: the characters that are separating degrees, minutes and seconds in dms

  dms <- as.character(dms)
  dms <- gsub(pattern = " ", replacement = "", x = dms)
  for (s in separators) dms <- gsub(pattern = s, replacement = "_splitHere_", x = dms)

  splits <- strsplit(dms, split = "_splitHere_")
  n <- length(dms)
  deg <- min <- sec <- hem <- vector("character", n)

  for (i in 1:n) {
    deg[i] <- splits[[i]][1]
    min[i] <- splits[[i]][2]
    sec[i] <- splits[[i]][3]
    hem[i] <- splits[[i]][4]
  }

  dec <- as.numeric(deg) + (as.numeric(min) / 60) + (as.numeric(sec) / 3600)
  sign <- ifelse (hem %in% c("N", "E"), 1, -1)
  dec <- sign * dec
  return(dec)
}  # end dms2dec function 


Usage example, from a table called mydata:

[example table ‘mydata’, with ‘Latitude’ and ‘Longitude’ columns in degree-minute-second format]

mydata$latitude.decimal <- dms2dec(mydata$Latitude)

mydata$longitude.decimal <- dms2dec(mydata$Longitude)
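You can also check the function on a single value, e.g. the coordinate given as an example above:

dms2dec("41° 34' 10.956\" N")  # should return approximately 41.5697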

REFERENCES:

Zanolla M., Altamirano M., Carmona R., De La Rosa J., Souza-Egipsy V., Sherwood A., Tsiamis K., Barbosa A.M., Muñoz A.R. & Andreakis N. (2018) Assessing global range expansion in a cryptic species complex: insights from the red seaweed genus Asparagopsis (Florideophyceae). Journal of Phycology, 54: 12-24.

Upscale UTM 10 x 10 km cells to 20, 50 or 100 km

If you have a map and/or a data table of UTM 10 x 10 km cells with the corresponding UTM codes, and want to upscale them to 20 x 20, 50 x 50 or 100 x 100 km cells, you can use the UTM10upscale function to create cell codes that group the UTM codes accordingly. This function is not included in an R package and is used as a stand-alone.

UTM10upscale <- function(UTMcode, size) {
  # version 1.3 (14 May 2013)
  # UTMcode: a character vector of UTM 10x10 km codes (e.g. MC58 or 29SMC58)
  # size: the side length (in km) of the UTM cells for which to create codes; must be 20, 50 or 100
  if (missing(size)) stop ("Argument 'size' is missing, with no default.")
  if (!(size %in% c(20, 50, 100))) stop ("'size' must be a multiple of 10 and a divisor of 100 - i.e. 20, 50 or 100.")
  UTMcode <- as.character(UTMcode)
  nc <- unique(nchar(UTMcode))
  if (length(nc) != 1) stop ("All elements of UTMcode must have the same number of characters.")
  utm10letters <- substr(UTMcode, nc - 4, nc -2)
  if(size == 100)  upscaled.code <- utm10letters
  else {
    utm10x <- substr(UTMcode, nc - 1, nc - 1)  # penultimate digit of UTMcode
    utm10y <- substr(UTMcode, nc, nc)  # last digit of UTMcode
    utm10index <- 0 : 9
    if (size == 20)  upscaled.index <- rep(letters[1 : 5], each = 2)
    else if (size == 50) upscaled.index <- rep(LETTERS[1 : 2], each = 5)
    n <- length(UTMcode)
    upscaled.x <- upscaled.y <- vector("integer", n)
    for (i in 1 : n) {
      x <- which(utm10index == utm10x[i])
      y <- which(utm10index == utm10y[i])
      upscaled.x[i] <- upscaled.index[x]
      upscaled.y[i] <- upscaled.index[y]
    }  # end for i
    upscaled.code <- paste(utm10letters, upscaled.x, upscaled.y, sep = "")
    }  # end if size 100 else
  return(upscaled.code)
}  # end UTM10upscale function


Note that the resulting codes for 20×20 and for 50×50 km cells (namely the last two characters) are not “official” UTM codes, just ones that I made up. I used letters instead of numbers, with lower case for 20-km and upper case for 50-km cells, so that these codes cannot be confused with the UTM 10×10 km codes.

Then you just have to add a column with the new codes to your map’s table of attributes, and use e.g. a spatial R package or a GIS to dissolve the UTM 10 x 10 km cells using the values in this column. You can also use R’s aggregate function to summarize your data into the upscaled cells.

mydata$utm20 <- UTM10upscale(mydata$utm10, size = 20)
mydata$utm50 <- UTM10upscale(mydata$utm10, size = 50)
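And, as mentioned above, the aggregate function can then summarize your data into the upscaled cells, for example (assuming a hypothetical ‘myspecies’ presence/absence column in mydata):

presence50 <- aggregate(mydata$myspecies, by = list(utm50 = mydata$utm50), FUN = max)  # a 50-km cell gets 1 if any of its 10-km cells has a presence
head(presence50)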