Plot outliers and their values

The ‘plot_outliers’ function below draws a boxplot and a scatterplot of a numeric variable x, and labels each outlier with its value (labels are currently not offset, so they may overlap). For relatively small datasets, it can be a quick way to tell which outliers look reasonable and which are likely the result of transcription or measurement error and should therefore be corrected or discarded.

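plot_outliers <- function(x, val_col = "blue", ...) {
  # save the graphical parameters and restore them on exit, even after an error
  par_in <- par(no.readonly = TRUE)
  on.exit(par(par_in))
  par(mfrow = c(1, 2))
  # left panel: boxplot; bp$out holds the values flagged as outliers
  bp <- boxplot(x, ...)
  out <- bp$out
  message(length(out), " outliers detected")
  # label each outlier with its value (not offset, so labels may overlap)
  if (length(out) > 0) text(x = 0.5, y = out, labels = round(out, 2), adj = 0, col = val_col)
  # right panel: values in index order, with the same labels at the left edge
  plot(x, pch = 20)
  if (length(out) > 0) text(x = 0.5, y = out, labels = round(out, 2), adj = 0, col = val_col)
}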
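
The labelled values are whatever ‘boxplot’ returns in its out component. By default these are the points lying more than 1.5 times the interquartile range beyond the hinges (the range argument of ‘boxplot’ changes this). As a small sketch, the same values can be listed without drawing anything, using ‘boxplot.stats’ from the grDevices package; the object names here are only illustrative:

```r
x <- iris$Sepal.Width

# outliers according to the default rule (coef = 1.5)
out <- boxplot.stats(x)$out

# the same component as returned by boxplot(), without plotting
bp_out <- boxplot(x, plot = FALSE)$out

# both approaches should flag the same values
all.equal(out, bp_out)
```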

Usage examples:

plot_outliers(iris$Sepal.Width)

Additional arguments for the ‘boxplot’ function can be provided, e.g.

plot_outliers(airquality$Ozone, notch = TRUE)
plot_outliers(airquality$Wind, col = "darkgreen", main = "wind")
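
Once an outlier looks suspicious, it helps to know which observation it came from. The lines below are a minimal sketch of one way to do this, assuming the outliers are taken from ‘boxplot.stats’ with its default settings; the object names (out, rows) are illustrative and not part of ‘plot_outliers’:

```r
x <- airquality$Ozone

# values flagged as outliers (missing values are ignored)
out <- boxplot.stats(x)$out

# positions of all observations equal to an outlying value
rows <- which(x %in% out)

# inspect the corresponding records before correcting or discarding them
airquality[rows, c("Month", "Day", "Ozone")]
```

Note that ‘%in%’ returns every observation equal to a flagged value, so tied outliers are all listed.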

This function is used in an article which we hope to submit soon.
