# Variation partitioning

The varPart function, now included in the modEvA package (Barbosa et al. 2014), performs variation partitioning (Borcard et al. 1992) among two factors (e.g. Ribas et al. 2006) or three factors (e.g. Real et al. 2003) for either multiple linear regression models (LM) or generalized linear models (GLM).

For linear models, the inputs to varPart are the coefficients of determination (R-squared values) of the linear regressions of the target variable on all the variables in the model, on the variables related to each particular factor, and (when there are three factors) on the variables related to each pair of factors. The outputs are the amounts of variance explained exclusively by each factor, the amounts explained exclusively by the overlapping effects of each pair of factors, and, when there are three factors, the amount explained by the overlap of all three (e.g. Real et al. 2003). The amount of variation not explained by the complete model is also provided.
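As a minimal sketch of how these inputs might be obtained for two factors, the code below fits three linear models to R's built-in mtcars data and extracts their non-adjusted R-squared values. The grouping of predictors into an "Engine" and a "Body" factor, and the helper name r2, are assumptions made purely for illustration. The arithmetic at the end is the standard two-factor partitioning.

```r
# Hypothetical grouping of mtcars predictors into two factors
# (chosen only for illustration):
#   factor A ("Engine"): disp, hp
#   factor B ("Body"):   wt, qsec

# Hypothetical helper: non-adjusted R-squared of a linear model
r2 <- function(f) summary(lm(f, data = mtcars))$r.squared

r2_A  <- r2(mpg ~ disp + hp)              # factor A alone
r2_B  <- r2(mpg ~ wt + qsec)              # factor B alone
r2_AB <- r2(mpg ~ disp + hp + wt + qsec)  # complete model

# Standard two-factor partitioning arithmetic
pure_A  <- r2_AB - r2_B             # explained exclusively by A
pure_B  <- r2_AB - r2_A             # explained exclusively by B
overlap <- r2_AB - pure_A - pure_B  # shared by A and B
unexpl  <- 1 - r2_AB                # not explained by the complete model

round(c(pure_A = pure_A, pure_B = pure_B, overlap = overlap, unexpl = unexpl), 3)

# With varPart defined, the same inputs would be passed as:
# varPart(A = r2_A, B = r2_B, AB = r2_AB, A.name = "Engine", B.name = "Body")
```

The four components add up to 1 by construction, because the pure and overlapping fractions together equal the R-squared of the complete model.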

Generalized linear models (GLMs), such as logistic regression or the favourability function, do not yield true R-squared values. Instead, inputs can be the squared coefficients of (Spearman’s) correlation between the predictions given by each factor or pair of factors and the predictions of the complete model (e.g. Muñoz & Real 2006), the R-squared values of the corresponding logit (y) functions (Real et al. 2013), or an adjusted R-squared (De Araújo et al. 2014). Consequently, the output values are not the total amounts of variance (of the target variable) explained by factors and overlaps, but rather their proportional contributions to the total variation explained by the model. The amount of unexplained variation is not available for GLMs.
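The squared Spearman correlation option could be sketched as follows for two factors. The data are simulated, and the variable names, the grouping of predictors into factors, and the helper pred are all assumptions made for illustration, not part of modEvA.

```r
set.seed(1)  # simulated data, for illustration only
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 - 0.5 * d$x3))

# Hypothetical factors: A = {x1, x2}, B = {x3, x4}

# Hypothetical helper: predicted probabilities of a logistic regression
pred <- function(f) predict(glm(f, family = binomial, data = d), type = "response")

p_A  <- pred(y ~ x1 + x2)            # factor A alone
p_B  <- pred(y ~ x3 + x4)            # factor B alone
p_AB <- pred(y ~ x1 + x2 + x3 + x4)  # complete model

# Squared Spearman correlation of each factor's predictions
# with the predictions of the complete model
A  <- cor(p_A, p_AB, method = "spearman")^2
B  <- cor(p_B, p_AB, method = "spearman")^2
AB <- 1  # the complete model correlates perfectly with itself

c(A = A, B = B, AB = AB)

# With varPart defined, these would be passed as:
# varPart(A = A, B = B, AB = AB, plot.unexpl = FALSE)
```

Because the input for the complete model is 1 here, the resulting fractions describe shares of the explained variation only, which is why the unexplained portion is not meaningful for GLMs.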

```r
varPart <- function(A, B, C = NA, AB, AC = NA, BC = NA, ABC = NA,
                    model.type = NULL, A.name = "Factor A", B.name = "Factor B",
                    C.name = "Factor C", plot = TRUE, plot.digits = 3, cex.names = 1.5,
                    cex.values = 1.2, main = "", cex.main = 2, plot.unexpl = TRUE) {

  # version 1.7 (17 Mar 2017)

  if (!is.null(model.type)) message("NOTE: Argument 'model.type' is no longer used.")

  # decide between two- and three-factor partitioning from the inputs provided
  partials <- c(A, B, C, AB, BC, AC, ABC)
  if (all(is.finite(partials[c(1:2, 4)])) && all(is.na(partials[c(3, 5:7)])))
    twofactors <- TRUE
  else if (all(is.finite(partials)))
    twofactors <- FALSE
  else stop("You must provide numeric values for either A, B and AB (for variation partitioning among two factors) or A, B, C, AB, BC, AC and ABC (for variation partitioning among three factors). See Details.")

  if (!all(na.omit(partials) >= 0 & na.omit(partials) <= 1)) stop("Values must be between 0 and 1.")

  totalexpl <- ifelse(twofactors, AB, ABC)
  unexpl <- 1 - totalexpl

  if (twofactors) {
    Apure <- totalexpl - B
    Bpure <- totalexpl - A
    ABoverlap <- totalexpl - Apure - Bpure
    output.names <- c(paste("Pure", A.name), paste("Pure", B.name),
                      paste0("Pure ", A.name, "_", B.name, " overlap"),
                      "Unexplained")
    results <- data.frame(c(Apure, Bpure, ABoverlap, unexpl),
                          row.names = output.names)

  } else { # end if 2 factors

    Apure <- totalexpl - BC
    Bpure <- totalexpl - AC
    Cpure <- totalexpl - AB
    ABoverlap <- totalexpl - Apure - Bpure - C
    BCoverlap <- totalexpl - Bpure - Cpure - A
    ACoverlap <- totalexpl - Apure - Cpure - B
    ABCoverlap <- totalexpl - Apure - Bpure - Cpure - ABoverlap - BCoverlap - ACoverlap
    output.names <- c(paste("Pure", A.name),
                      paste("Pure", B.name),
                      paste("Pure", C.name),
                      paste0("Pure ", A.name, "_", B.name, " overlap"),
                      paste0("Pure ", B.name, "_", C.name, " overlap"),
                      paste0("Pure ", A.name, "_", C.name, " overlap"),
                      paste0(A.name, "_", B.name, "_", C.name, " overlap"),
                      "Unexplained")
    results <- data.frame(c(Apure, Bpure, Cpure, ABoverlap, BCoverlap,
                            ACoverlap, ABCoverlap, unexpl),
                          row.names = output.names)
  }  # end else

  colnames(results) <- "Proportion"

  if (plot) {  # adapted from Daniel's http://stackoverflow.com/questions/1428946/venn-diagrams-with-r

    # draw a circle of radius r centred at (x, y)
    circle <- function(x, y, r) {
      ang <- seq(0, 2 * pi, length = 100)
      xx <- x + r * cos(ang)
      yy <- y + r * sin(ang)
      polygon(xx, yy)
    }  # end circle function (by Daniel)

    Apure <- round(Apure, plot.digits)  # shorten values for plotting
    Bpure <- round(Bpure, plot.digits)
    ABoverlap <- round(ABoverlap, plot.digits)
    if (!twofactors) {
      Cpure <- round(Cpure, plot.digits)
      BCoverlap <- round(BCoverlap, plot.digits)
      ACoverlap <- round(ACoverlap, plot.digits)
      ABCoverlap <- round(ABCoverlap, plot.digits)
    }

    if (twofactors) {
      plot(0, 0, ylim = c(-1, 10), xlim = c(-1, 7), type = "n", axes = FALSE,
           ylab = "", xlab = "", main = main, cex.main = cex.main)
      circle(3, 3, 3)
      circle(3, 6, 3)
      text(x = c(3, 3), y = c(9.5, -0.5), labels = c(A.name, B.name),
           cex = cex.names)
      text(x = c(3, 3, 3), y = c(7, 4.75, 2), c(Apure, ABoverlap, Bpure),
           cex = cex.values)

    } else { # end if 2 factors

      plot(0, 0, ylim = c(-1, 10), xlim = c(-1, 10), type = "n", axes = FALSE,
           ylab = "", xlab = "", main = main, cex.main = cex.main)
      circle(3, 6, 3)
      circle(6, 6, 3)
      circle(4.5, 3, 3)
      text(x = c(2.5, 6.5, 4.5), y = c(9.5, 9.5, -0.5),
           labels = c(A.name, B.name, C.name), cex = cex.names, adj = c(0.5, 0.5, 0))
      text(x = c(1.8, 7.2, 4.5, 4.5, 2.8, 6.2, 4.5),
           y = c(6.6, 6.6, 2, 7, 4, 4, 5),
           labels = c(Apure, Bpure, Cpure, ABoverlap, ACoverlap, BCoverlap, ABCoverlap),
           cex = cex.values)
    } # end if 2 factors else

    if (plot.unexpl) {
      rect(-1, -1, 10, 10)
      text(x = -0.9, y = -0.2, labels = paste0("Unexplained\n", round(unexpl, plot.digits)),
           adj = 0, cex = cex.values)
    }

  }  # end if plot
  results
}
```


Copy the function above, paste it into R and press enter. The example below shows how to use it; the results are returned numerically and also plotted in a diagram:

```r
# if you have a linear model (LM), use (non-adjusted) R-squared values
# for each factor and for their combinations as inputs:

varPart(A = 0.456, B = 0.315, C = 0.281, AB = 0.051, BC = 0.444,
        AC = 0.569, ABC = 0.624, A.name = "Spatial", B.name = "Human",
        C.name = "Environmental", main = "Small whale")

# if you have a generalized linear model (GLM),
# you can use squared correlation coefficients
# of the predictions of each factor with those of the complete model:

varPart(A = (-0.005)^2, B = 0.698^2, C = 0.922^2, AB = 0.696^2,
        BC = 0.994^2, AC = 0.953^2, ABC = 1, A.name = "Topographic",
        B.name = "Climatic", C.name = "Geographic", main = "Big bird",
        plot.unexpl = FALSE)
```

You can change the number of digits in the plotted values with the argument plot.digits. You can also change the character sizes of the plot’s main title, circle names, and values, using arguments cex.main, cex.names and cex.values. Note that these results derive from arithmetic operations between your input values, and they always sum up to 1; if your input is incorrect, the results will be incorrect as well.
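To illustrate that last point, here is a small sketch of the three-factor arithmetic on made-up inputs. It assumes the standard partitioning formulas; partition3 is a hypothetical helper written for this example, not part of modEvA, and all numbers are invented. The second call deliberately uses an inconsistent AB value (smaller than A alone): the components still sum to 1, but some of them become negative.

```r
# Hypothetical helper (not part of modEvA): standard three-factor partitioning
partition3 <- function(A, B, C, AB, BC, AC, ABC) {
  Apure <- ABC - BC
  Bpure <- ABC - AC
  Cpure <- ABC - AB
  ABov  <- ABC - Apure - Bpure - C
  BCov  <- ABC - Bpure - Cpure - A
  ACov  <- ABC - Apure - Cpure - B
  ABCov <- ABC - Apure - Bpure - Cpure - ABov - BCov - ACov
  c(Apure = Apure, Bpure = Bpure, Cpure = Cpure,
    ABoverlap = ABov, BCoverlap = BCov, ACoverlap = ACov,
    ABCoverlap = ABCov, Unexplained = 1 - ABC)
}

# Invented, internally consistent inputs
ok <- partition3(A = 0.30, B = 0.25, C = 0.20,
                 AB = 0.45, BC = 0.40, AC = 0.42, ABC = 0.55)

# Same inputs, except AB is (wrongly) smaller than A alone
bad <- partition3(A = 0.30, B = 0.25, C = 0.20,
                  AB = 0.10, BC = 0.40, AC = 0.42, ABC = 0.55)

sum(ok)       # components add up to 1
sum(bad)      # still add up to 1, despite the inconsistent input
any(bad < 0)  # but some components are negative
```

A sum of 1 is therefore not evidence that the inputs were correct; checking that each combined value is at least as large as its parts is a more useful sanity check.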

References

Barbosa A.M., Brown J.A., Jiménez-Valverde A. & Real R. (2014) modEvA – an R package for model evaluation and analysis. R package, version 0.1.

Borcard D., Legendre P. & Drapeau P. (1992) Partialling out the spatial component of ecological variation. Ecology 73: 1045-1055.

De Araújo, C.B., Marcondes-Machado, L.O. & Costa, G.C. (2014) The importance of biotic interactions in species distribution models: a test of the Eltonian noise hypothesis using parrots. Journal of Biogeography 41: 513-523 (DOI: 10.1111/jbi.12234)

Muñoz, A.-R. & Real, R. (2006) Assessing the potential range expansion of the exotic monk parakeet in Spain. Diversity and Distributions 12: 656–665.

Real, R., Barbosa, A.M., Porras, D., Kin, M.S., Márquez, A.L., Guerrero, J.C., Palomo, L.J., Justo, E.R. & Vargas, J.M. (2003) Relative importance of environment, human activity and spatial situation in determining the distribution of terrestrial mammal diversity in Argentina. Journal of Biogeography 30: 939-947.

Real R., Romero D., Olivero J., Estrada A. & Márquez A.L. (2013) Estimating how inflated or obscured effects of climate affect forecasted species distribution. PLoS ONE 8: e53646. doi:10.1371/journal.pone.0053646

Ribas, A., Barbosa, A.M., Casanova, J.C., Real, R., Feliu, C. & Vargas, J.M. (2006) Geographical patterns of the species richness of helminth parasites of moles (Talpa spp.) in Spain: separating the effect of sampling effort from those of other conditioning factors. Vie et Milieu 56: 1-8.