This function uses a univariate approach to outlier detection, so each column is examined on its own. A value counts as an outlier if it lies 2 or more standard deviations from its column's mean. For each column that contains outliers, the function records the row indices of those outliers and the total number of outliers in that column.
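The rule described above can be sketched in base R. This is a minimal sketch, not the package's implementation. The helper name `find_bad_apples_sketch`, the `n_sd` parameter, the use of R's sample standard deviation `sd()`, and the comparison `abs(x - mean(x)) >= 2 * sd(x)` are all assumptions. It returns a plain data frame with an `indices` list-column of integer vectors, rather than the grouped tibble that `find_bad_apples()` returns.

```r
# Hypothetical sketch of the 2-SD univariate rule; not the package's code.
# Assumes: sample SD via sd(), and a value is an outlier when
# abs(x - mean(x)) >= n_sd * sd(x).
find_bad_apples_sketch <- function(df, n_sd = 2) {
  variable <- character(0)
  total_outliers <- integer(0)
  indices <- list()
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.numeric(x)) next  # skip non-numeric columns
    # Distance from the column mean, in standard deviations.
    # A constant column gives sd = 0, so z is NaN and nothing is flagged.
    z <- abs(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    idx <- which(z >= n_sd)  # which() drops NA/NaN comparisons
    if (length(idx) > 0) {
      variable <- c(variable, col)
      total_outliers <- c(total_outliers, length(idx))
      indices[[length(indices) + 1]] <- idx
    }
  }
  # I() keeps 'indices' as a list-column, one integer vector per row
  data.frame(variable = variable, total_outliers = total_outliers,
             indices = I(indices), stringsAsFactors = FALSE)
}

# Usage on data shaped like the Examples section: one 10 among 1s per column
df <- data.frame(A = c(1, 1, 1, 10, rep(1, 26)), B = c(rep(1, 29), 10))
res <- find_bad_apples_sketch(df)
res$variable        # "A" "B"
res$total_outliers  # 1 1
res$indices[[1]]    # 4
res$indices[[2]]    # 30
```

Collecting the per-column results in plain vectors and building the data frame once at the end avoids `rbind()` on list-columns.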

Note: This function works best for small datasets with unimodal variable distributions.
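A short base-R illustration of why the note matters, under the same assumed rule (`abs(x - mean(x)) >= 2 * sd(x)` with R's sample `sd()`). The helper `flag_2sd` is hypothetical and the vectors are made up for illustration. With two well-separated modes, the standard deviation is inflated by the gap between them, so a lone value sitting between the modes is not flagged. For large samples, the rule flags about 4.6% of points even when the data are exactly normal, because `2 * pnorm(-2)` is about 0.0455; this may be one reason the note favors small datasets.

```r
# Hypothetical helper applying the assumed 2-SD rule to one vector
flag_2sd <- function(x, n_sd = 2) which(abs(x - mean(x)) / sd(x) >= n_sd)

# Bimodal data: 15 values at 0, 15 at 10, and one odd value (5) between them.
# The SD (about 5) is driven by the gap between modes, so max |z| is about 1.
bimodal <- c(rep(0, 15), rep(10, 15), 5)
flag_2sd(bimodal)  # integer(0): nothing flagged, not even the lone 5

# Unimodal data with one spike: the spike at position 30 is flagged
flag_2sd(c(rep(1, 29), 10))  # 30

# Under a normal distribution, the expected share of points beyond 2 SD
2 * pnorm(-2)      # about 0.0455, i.e. roughly 455 flags per 10,000 rows
```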

find_bad_apples(df)

Arguments

df

A dataframe containing numeric data

Value

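A tibble, grouped by variable and total_outliers, with one row per column that contains outliers and three columns: 'variable' (the data frame column name), 'total_outliers' (the number of outliers in that column), and 'indices' (a list-column; each element is a tibble holding the row indices of that column's outliers)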

Examples

df <- data.frame(
  'A' = c(1, 1, 1, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  'B' = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10)
)
find_bad_apples(df)
#> # A tibble: 2 x 3
#> # Groups:   variable, total_outliers [2]
#>   variable total_outliers indices
#>   <chr>             <dbl> <list>
#> 1 A                     1 <tibble [1 × 1]>
#> 2 B                     1 <tibble [1 × 1]>