# Analyse multicollinearity in a dataset, including the variance inflation factor (VIF)

The multicol function, now included in the fuzzySim package (Barbosa, 2015), calculates the degree of multicollinearity in a set of numeric variables using three closely related measures: R-squared, tolerance and the variance inflation factor (VIF). The VIF is increasingly used in ecology and a range of other research areas, and several R packages already calculate it, but their results can be strikingly different (Estrada & Barbosa, submitted). The multicol function below calculates multicollinearity statistics directly on the independent variables: each variable is regressed on all the others, and the R-squared of that regression gives the tolerance (1 - R-squared) and the VIF (1 / tolerance). It produces the expected results and has not (so far) clashed with other packages. Only ‘independent’ (predictor, right-hand side) variables should be entered, as the result obtained for each variable depends on all the other variables present in the analysed data set.
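
To make the three measures concrete, and to show why the set of variables matters, here is a minimal sketch on the built-in trees data. The helper name collin_stats is hypothetical (it is not part of fuzzySim); it simply derives the three statistics from one auxiliary regression.

```r
# Hypothetical helper (not part of fuzzySim): R-squared, tolerance and VIF
# for one variable, given its regression on the other variables.
collin_stats <- function(model) {
  r2 <- summary(model)$r.squared
  c(Rsquared = r2, Tolerance = 1 - r2, VIF = 1 / (1 - r2))
}

# Girth regressed on both other variables in 'trees' ...
full <- collin_stats(lm(Girth ~ Height + Volume, data = trees))

# ... and on Height alone: same variable, different companions.
reduced <- collin_stats(lm(Girth ~ Height, data = trees))

print(rbind(full, reduced))
```

The two rows differ because the statistics for Girth depend on which other variables are in the data set, which is why only the predictors of interest should be passed to multicol.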

```
multicol <- function(vars) {
  # version 0.3 (9 Oct 2014)
  # one row per variable, one column per statistic
  result <- matrix(NA, nrow = ncol(vars), ncol = 3)
  rownames(result) <- colnames(vars)
  colnames(result) <- c("Rsquared", "Tolerance", "VIF")
  for (v in 1:ncol(vars)) {
    v.name <- colnames(vars)[v]
    other.v.names <- colnames(vars)[-v]
    # regress variable v on all the other variables
    mod.formula <- as.formula(paste(v.name, "~", paste(other.v.names, collapse = "+")))
    mod <- lm(mod.formula, data = vars)
    R2 <- summary(mod)$r.squared
    result[v, "Rsquared"] <- R2
    result[v, "Tolerance"] <- 1 - R2
    result[v, "VIF"] <- 1 / (1 - R2)
  }
  return(result)
}
```
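
As an optional sanity check (not part of the original function), the regression-based VIF can be compared with a standard identity: for numeric variables, the VIFs equal the diagonal of the inverse of their correlation matrix. The sketch below recomputes the regression-based values inline so that it runs on its own; the names vif_inv and vif_reg are introduced here purely for illustration.

```r
# VIF from the inverse of the correlation matrix of the 'trees' variables
vif_inv <- setNames(diag(solve(cor(trees))), colnames(trees))

# VIF from regressing each variable on all the others (the approach used above)
vif_reg <- sapply(colnames(trees), function(v) {
  others <- setdiff(colnames(trees), v)
  r2 <- summary(lm(reformulate(others, response = v), data = trees))$r.squared
  1 / (1 - r2)
})

# The two approaches should agree up to numerical tolerance
all.equal(vif_inv, vif_reg)
```

If multicol is loaded, the "VIF" column of multicol(trees) should agree with both vectors.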


Let’s try some examples with a couple of datasets available in R (trees and OrchardSprays):

```
multicol(trees)

# you'll get a warning and some NA results if any of the variables is not numeric:
multicol(OrchardSprays)

# but you can select a subset of columns to calculate 'multicol' for:
multicol(OrchardSprays[ , 1:3])
```
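
Rather than picking columns by position, one option is to keep only the numeric columns before calling the function. This is a minimal base R sketch; num_cols and numeric_vars are names introduced here for illustration, and the final call assumes multicol has already been defined or loaded.

```r
# Flag the numeric columns of the data frame
num_cols <- vapply(OrchardSprays, is.numeric, logical(1))

# Keep only those columns (drop = FALSE keeps a data frame even for one column)
numeric_vars <- OrchardSprays[ , num_cols, drop = FALSE]
names(numeric_vars)

# Run multicol on the numeric subset, if the function is available
if (exists("multicol")) multicol(numeric_vars)
```

For OrchardSprays this drops the treatment factor, so it selects the same columns as the 1:3 subset above.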

REFERENCES

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853-858.

Estrada A. & Barbosa A.M. (submitted) Calculating the Variance Inflation Factor in R: cautions and recommendations.