Analyse multicollinearity in a dataset, including the variance inflation factor (VIF)

The multicol function, now included in the fuzzySim package (Barbosa, 2015), calculates the degree of multicollinearity in a set of numeric variables, using three closely related measures: R-squared, tolerance and the variance inflation factor (VIF). The latter is being increasingly used in ecology and a range of other research areas, and several packages already calculate the VIF in R, but their results can be strikingly different (Estrada & Barbosa, submitted). The multicol function below calculates multicollinearity statistics directly on the independent variables, producing the expected results and without (so far) clashing with other packages. Only ‘independent’ (predictor, right hand side side) variables should be entered, as the result obtained for each variable depends on all the other variables present in the analysed data set.

multicol <- function(vars) {
  # version 0.3 (9 Oct 2014)
result <- matrix(NA, nrow = ncol(vars), ncol = 3)
  rownames(result) <- colnames(vars)
  colnames(result) <- c("Rsquared", "Tolerance", "VIF")
  for (v in 1:ncol(vars)) {
    v.name <- colnames(vars)[v]
    other.v.names <- colnames(vars)[-v]
    mod.formula <- as.formula(paste(v.name, "~", paste(other.v.names, collapse = "+")))
    mod <- lm(mod.formula, data = vars)
    R2 <- summary(mod) $ r.squared
    result[v, "Rsquared"] <- R2
    result[v, "Tolerance"] <- 1 - R2
    result[v, "VIF"] <- 1 / (1 - R2)
  }
  return(result)
}

[presented with Pretty R]

Let’s try some examples with a couple of datasets available in R (trees and OrchardSprays):

multicol(trees)
 
# you'll get a warning and some NA results if any of the variables is not numeric:
multicol(OrchardSprays)
 
# but you can define a subset of columns of 'data' to calculate 'multicol' for:
multicol(OrchardSprays[ , 1:3])

REFERENCES

Barbosa A.M. (2015) fuzzySim: applying fuzzy logic to binary similarity indices in ecology. Methods in Ecology and Evolution, 6: 853-858

Estrada A. & Barbosa A.M. (submitted) Calculating the Variance Inflation Factor in R: cautions and recommendations.

Advertisements

Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s