It is common in ecology to search for statistical relationships between species’ occurrence and a set of predictor variables. However, when a large number of variables is analysed (compared to the number of observations), false findings may arise due to repeated testing. García (2003) recommended controlling the false discovery rate (FDR; Benjamini and Hochberg 1995) in ecological studies. Benjamini & Yekutieli (2001) adapted it to allow for dependency among tests statstics. The *p.adjust* function in R performs this and other corrections to the significance (*p*) values of variables under repeated testing. The *FDR* function, now included in the *fuzzySim* package (Barbosa, 2015), performs bivariate regressions (either linear or binary logistic) of a set of predictor variables on one response variable, or uses already-obtained *p* values for a set of predictor variables. It then calculates the FDR with *p.adjust*, and shows which variables should be retained for or excluded from further multivariate analysis according to their corrected *p* values (see, for example, Barbosa, Real & Vargas 2009). **Besides the p-values**, this function also reports the individual **AIC and BIC** of each variable.

The FDR function uses the Benjamini & Hochberg (“BH”) correction by default, but check the documentation of the *p.adjust* function for other available methods, namely “BY”, which allows for non-independent data (Benjamini & Yekutieli, 2001) and which is the default e.g. in function *multGLM*. Input data may be a matrix or data frame containing the response variable (for example, the presence-absence or abundance of a species) and the predictor variables; alternatively, you may already have performed the bivariate regressions and have a set of variables and corresponding *p* values which you want to correct with FDR — in this case, get a data frame with your variables’ names in the first column and their *p* values in the second column, and supply it as the *pvalues* argument to the FDR function (no need to provide *data*,* sp.cols* or* var.cols* in this case). **NOTE** that the **input data format for this function has changed** in mid May 2014, with former parameters *response* and *predictors* now replaced with *data*, *sp.cols* and *var.cols* (for coherence and compatibility with *multGLM*). Note also that, as of late October 2015, **argument model.type is deprecated**, as this information is included in argument

*family*– e.g., if you want

**linear models (LM)**, you can set

**. You can let the default**

*family = ‘gaussian’***figure out the family automatically.**

*family = ‘auto’*function (data = NULL, sp.cols = NULL, var.cols = NULL, pvalues = NULL, model.type = NULL, family = "auto", correction = "fdr", q = 0.05, verbose = TRUE, simplif = FALSE) { # version 3.6 (26 Apr 2016) if (length(sp.cols) > 1) stop("Sorry, FDR is currently implemented for only one response variable at a time, so 'sp.cols' must indicate only one column") if (!is.null(model.type)) warning("Argument 'model.type' is deprecated and now ignored, as this info is included in 'family' (e.g. 'gaussian' for LM, 'binomial' or 'poisson' for GLM).") model.type <- "GLM" n.init <- nrow(data) data <- data[is.finite(data[, sp.cols]), ] na.loss <- n.init - nrow(data) if (na.loss > 0) message(na.loss, " cases excluded due to missing or non-finite values.") if (family == "auto") { if (all(data[, sp.cols] %in% c(0, 1))) family <- "binomial" else if (all(data[, sp.cols] >= 0 && data[, sp.cols]%%1 == 0)) family <- "poisson" else family <- "gaussian" if (verbose) message("\nUsing generalized linear models of family '", family, "'.\n") } if (!(correction %in% p.adjust.methods)) stop("Invalid correction method.\nType 'p.adjust.methods' for available options.") response <- data[, sp.cols] predictors <- data[, var.cols] if (!is.null(pvalues)) { coeffs <- aic <- bic <- FALSE p.bivar <- pvalues[, 2] names(p.bivar) <- pvalues[, 1] } else { coeffs <- aic <- bic <- TRUE if (is.null(ncol(predictors))) stop("You need more than one predictor to calculate the FDR.") p.bivar <- coef.bivar <- aic.bivar <- bic.bivar <- vector("numeric", length = ncol(predictors)) for (i in 1:length(p.bivar)) { model <- glm(response ~ predictors[, i], family = family) p.bivar[i] <- anova(model, test = "Chi")[, "Pr(>Chi)"][2] coef.bivar[i] <- model[["coefficients"]][2] aic.bivar[i] <- extractAIC(model, k = 2)[2] bic.bivar[i] <- extractAIC(model, k = log(nobs(model)))[2] if (is.na(p.bivar[i])) message("A p-value could not be calculated for var.col number", i) if (is.na(aic.bivar[i])) message("AIC could not be calculated for var.col number", i) if (is.na(aic.bivar[i])) message("BIC could not be calculated for var.col number", i) } rm(i) } if (coeffs) { results <- data.frame(cbind(coef.bivar, aic.bivar, bic.bivar, p.bivar), row.names = names(predictors)) names(results) <- c("bivariate.coeff", "AIC", "BIC", "p.value") results <- results[order(results[, "p.value"]), ] results[, "p.adjusted"] <- p.adjust(results[, "p.value"], method = correction) } else { results <- data.frame(AIC = aic.bivar, BIC = bic.bivar, p.value = p.bivar, row.names = pvalues[, 1]) results <- results[order(results[, "p.value"]), ] results[, "p.adjusted"] <- p.adjust(results[, "p.value"], method = correction) } p.adjusted <- NULL if (simplif) return(results) exclude <- subset(results, p.adjusted > q) select <- subset(results, p.adjusted <= q) if (verbose) { message("Bivariate p-values adjusted with '", correction, "' correction;\n", nrow(exclude), " variable(s) excluded, ", nrow(select), " selected (with q = ", q, ")\n") } list(exclude = exclude, select = select) }

[presented with Pretty R]

**Usage example:**

FDR(data = mydata, sp.cols= 2, var.cols = 5:17)

FDR(data = mydata, sp.cols= 2, var.cols = 5:17, correction = “BY”)

(install *fuzzySim* and get help file *?FDR* for reproducible examples with sample data)

**References:**

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. *Methods in Ecology and Evolution*, 6: 853-858

Barbosa A.M., Real R. & Vargas J.M (2009) Transferability of environmental favourability models in geographic space: The case of the Iberian desman (*Galemys pyrenaicus*) in Portugal and Spain. *Ecological Modelling* 220: 747-754

Benjamini Y. & Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. *Journal of the Royal Statistical Society, Series B* 57: 289-300

Benjamini Y. & Yekutieli D. (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29: 1165-1188.

García L.V. (2003) Controlling the false discovery rate in ecological research. *Trends in Ecology and Evolution* 18: 553-554