Find bad apples

This function uses a univariate approach to outlier detection: each column is checked on its own, and a value counts as an outlier if it lies 2 or more standard deviations from that column's mean. For each column that contains outliers, the function returns the row indices of the outliers and the total number of outliers in that column.

Note: This function works best for small datasets with unimodal variable distributions.
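
The rule described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `flag_outliers`, its `threshold` argument, and the use of the sample standard deviation from `sd()` are assumptions made for the example.

```r
# Hypothetical helper (not part of the package): returns the positions of
# values that lie at least `threshold` standard deviations from the mean.
# Assumes the sample standard deviation, as computed by sd().
flag_outliers <- function(x, threshold = 2) {
  z <- abs(x - mean(x)) / sd(x)
  which(z >= threshold)
}

# Same data as in the Examples section: one extreme value per column.
df <- data.frame(A = c(rep(1, 3), 10, rep(1, 26)),
                 B = c(rep(1, 29), 10))

# Apply the rule to each column independently (the univariate approach).
outlier_rows <- lapply(df, flag_outliers)

outlier_rows$A  # row 4 is flagged
outlier_rows$B  # row 30 is flagged
```

Because each column is tested separately, a row can be flagged in one column and not in another.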

find_bad_apples(df)

Arguments

df	A dataframe containing numeric data

Value

A dataframe with one row per column that contains outliers, and columns for 'variable' (dataframe column name), 'total_outliers' (number of outliers in the column), and 'indices' (a list column holding the row indices of the outliers).

Examples

df <- data.frame('A' = c(1, 1, 1, 10, 1, 1, 1, 1, 1, 1,
                         1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                         1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
                 'B' = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                         1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                         1, 1, 1, 1, 1, 1, 1, 1, 1, 10))

find_bad_apples(df)
#> # A tibble: 2 x 3
#> # Groups:   variable, total_outliers [2]
#>   variable total_outliers indices         
#>   <chr>             <dbl> <list>          
#> 1 A                     1 <tibble [1 × 1]>
#> 2 B                     1 <tibble [1 × 1]>
