Create species name codes

Often we need to work with data sets that have species names as column names. Species names are often longer than the maximum length allowed for field names, for example in the .dbf tables associated with maps in a GIS. We thus need to create codes for the species names, which should be short but also unambiguous (you should be able to recognize the species from its code) and unique (no two species may share the same code). The spCodes function, now included in the fuzzySim package (Barbosa, 2014), combines a specified number of characters from the genus and species (and optionally also the subspecies) names, and checks that the resulting codes are unique:

spCodes <- function(species, nchar.gen = 3, nchar.sp = 3, nchar.ssp = 0, sep.species = " ", sep.spcode = "") {
  # version 1.8 (7 May 2013)

  species <- as.character(species)
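  # fixed = TRUE so that a separator such as "." is matched literally, not as a regular expression
  splits <- strsplit(species, split = sep.species, fixed = TRUE)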
  gen <- sp <- vector("character", length(splits))
  for (i in seq_along(splits)) {
    gen[i] <- splits[[i]][1]
    sp[i] <- splits[[i]][2]
    if (is.na(sp[i]))  sp[i] <- ""
  }  # end for i

  abbrev <- function (string, nchar) {
    abv <- substr(string, start = 1, stop = nchar)
    return(abv)
  }  # end abbrev function
  gen.code <- abbrev(gen, nchar.gen)
  sp.code <- abbrev(sp, nchar.sp)
  spcode <- paste(gen.code, sp.code, sep = sep.spcode)

  if (nchar.ssp > 0) {
    ssp <- vector("character", length(splits))
    for (i in seq_along(splits)) {
      ssp[i] <- splits[[i]][3]
      if (is.na(ssp[i]))  ssp[i] <- ""
    }
    ssp.code <- abbrev(ssp, nchar.ssp)
    spcode <- paste(spcode, ssp.code, sep = sep.spcode)
  }

  if (length(unique(species)) != length(unique(spcode))) {
    data <- cbind(species, spcode)
    unique.data <- data[!duplicated(data[ , c("species")]), ]
    ordered.data <- unique.data[order(unique.data[, "spcode"]), ]
    duplicates <- duplicated(ordered.data[ , "spcode"])
    for (i in 1:length(duplicates)) {
      if (isTRUE(duplicates[i])) duplicates[i-1] <- TRUE
    }  # to include the original as well as the duplicate
    ordered.data.duplicates <- ordered.data[duplicates, ]
    print(ordered.data.duplicates)
    stop("Resulting species codes are not unique (see above); try a different combination of nchar.gen, nchar.sp (and nchar.ssp), or check that you have specified the correct sep.species.")
  }
  message("OK - no duplicated spcodes found.")
  return(spcode)
}  # end spCodes function


If you have a vector named “species” containing your species names, with the genus and the specific name separated by a single space, load the function above and then type:

spcode <- spCodes(species, nchar.gen = 3, nchar.sp = 3, nchar.ssp = 0, sep.species = " ", sep.spcode = "")

This will create species codes with the first 3 characters of the genus and the first 3 characters of the specific name (you can set any other numbers of characters, such as 1 + 5, i.e. nchar.gen = 1 and nchar.sp = 5). If the resulting spcodes are not unique (i.e., if the number of distinct species names differs from the number of distinct species codes), you will get an error message showing the species that share a code and asking you to try another combination of numbers of characters for the genus and specific name.
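To make the mechanism concrete, here is a minimal base R sketch of the abbreviation and uniqueness check described above. It is not the spCodes function itself: make_codes is a hypothetical helper written only for this illustration, and the species names are arbitrary examples chosen because two of them collide under the 3 + 3 scheme.

```r
# Hypothetical helper (not part of fuzzySim): genus + specific name abbreviation
make_codes <- function(species, n_gen = 3, n_sp = 3) {
  # split each binomial on a single literal space
  parts <- strsplit(species, split = " ", fixed = TRUE)
  gen <- vapply(parts, function(p) p[1], character(1))
  sp  <- vapply(parts, function(p) p[2], character(1))
  # first n_gen characters of the genus + first n_sp of the specific name
  paste0(substr(gen, 1, n_gen), substr(sp, 1, n_sp))
}

# Illustrative names only
species <- c("Carduelis carduelis", "Caretta caretta", "Lynx pardinus")

codes_3_3 <- make_codes(species, 3, 3)
print(codes_3_3)                   # "Carcar" "Carcar" "Lynpar"
print(anyDuplicated(codes_3_3) > 0)  # TRUE: two species share a code

# Taking one more character from the genus resolves the clash
codes_4_3 <- make_codes(species, 4, 3)
print(codes_4_3)                   # "Cardcar" "Carecar" "Lynxpar"
print(anyDuplicated(codes_4_3) > 0)  # FALSE
```

With real data you would call spCodes directly; the sketch only shows why a given combination of character counts can fail and how changing it can help.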

If your input species binomials are separated by a character other than a space (e.g. a dot or an underscore), you can specify this with the sep.species argument; but note that all species names must use the same separator (e.g., exactly one space) for the function to work correctly. By default, there will be no character separating the two parts of the spcode, but you can use the sep.spcode argument to define one (e.g., sep.spcode = "_").
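Two practical points about separators can be illustrated in base R, independently of spCodes. The names below are made up for the example, and the cleaning step is only one possible way to enforce a single space between genus and specific name.

```r
# 1. Inconsistent white space: collapse runs of spaces and trim the ends,
#    so that every name has exactly one space between its parts
messy <- c("Canis  lupus", " Vulpes vulpes ")
clean <- gsub("\\s+", " ", trimws(messy))
print(clean)  # "Canis lupus" "Vulpes vulpes"

# 2. Dot-separated names: strsplit() treats 'split' as a regular expression
#    by default, and "." then matches every character
dotted <- c("Canis.lupus", "Vulpes.vulpes")
regex_parts <- strsplit(dotted, split = ".")
print(regex_parts[[1]])  # only empty strings, the name is lost

# With fixed = TRUE the dot is taken literally
fixed_parts <- strsplit(dotted, split = ".", fixed = TRUE)
gen <- vapply(fixed_parts, `[`, character(1), 1)
sp  <- vapply(fixed_parts, `[`, character(1), 2)

# Join the two abbreviations with an underscore, as sep.spcode = "_" would
codes <- paste(substr(gen, 1, 3), substr(sp, 1, 3), sep = "_")
print(codes)  # "Can_lup" "Vul_vul"
```

If a dot separator gives you unexpected codes, checking how the names are being split is a reasonable first step.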

If you have a table called mydata with species names in columns 2 to 30, for example, you can turn them into species codes like this:

spcodes <- spCodes(names(mydata)[2:30], nchar.gen = 3, nchar.sp = 3, sep.species = " ", sep.spcode = "")

names(mydata)[2:30] <- spcodes
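When renaming columns it can help to check how the names were stored in the first place and to keep a lookup table from code to full name. The sketch below uses a toy table and builds the codes with base R rather than spCodes, so that it runs on its own; the table contents, the lookup object and the 10-character check (the commonly cited limit for .dbf field names) are assumptions of this example.

```r
# By default, data.frame() and read.csv() "repair" column names, turning
# spaces into dots; check.names = FALSE keeps the names as they are
print(names(data.frame(`Canis lupus` = 1)))                       # "Canis.lupus"
print(names(data.frame(`Canis lupus` = 1, check.names = FALSE)))  # "Canis lupus"

# Toy table with species in columns 2 to 3
mydata <- data.frame(site = 1:2,
                     `Canis lupus` = c(1, 0),
                     `Vulpes vulpes` = c(1, 1),
                     check.names = FALSE)

sp_cols <- 2:3
full_names <- names(mydata)[sp_cols]

# 3 + 3 codes built by hand (with real data, spCodes would do this step)
parts <- strsplit(full_names, split = " ", fixed = TRUE)
codes <- vapply(parts,
                function(p) paste0(substr(p[1], 1, 3), substr(p[2], 1, 3)),
                character(1))

# Sanity checks: unique codes, short enough for a .dbf field name
stopifnot(anyDuplicated(codes) == 0, all(nchar(codes) <= 10))

# Keep the correspondence so the full names can be recovered later
lookup <- data.frame(spcode = codes, species = full_names,
                     stringsAsFactors = FALSE)

names(mydata)[sp_cols] <- codes
print(names(mydata))  # "site" "Canlup" "Vulvul"
print(lookup)
```

If your column names already contain dots instead of spaces, either re-read the data with check.names = FALSE or pass sep.species = "." when creating the codes.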


REFERENCES:

Barbosa A.M. (2014) fuzzySim: Fuzzy similarity in species’ distributions. R package, version 1.0.
