Source: R/fast_outliers.R (documentation: fast_outlier_id.Rd)
Analyzes the values of the given columns of a dataframe and identifies outliers using either the Z-score method or the interquartile range (IQR) method. Returns a dataframe with one row per analyzed column, containing the column name, a list with the index positions of the outliers, and the percentage of total counts considered outliers. Modifies an existing dataframe, with missing values imputed based on the chosen method.
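The two detection rules named above can be sketched in base R as follows. This is not the package's implementation: `outlier_positions()` and `summarise_outliers()` are hypothetical helpers, and the cut-offs (|z| > 3 for the Z-score rule, fences at 1.5 * IQR beyond the quartiles for the IQR rule) are common conventions assumed here, not values taken from `fast_outlier_id()`.

```r
# Hypothetical helper (not from the package): row positions of outliers in
# one numeric vector. z_cut = 3 and k = 1.5 are assumed, conventional cut-offs.
outlier_positions <- function(x, method = c("z-score", "iqr"), z_cut = 3, k = 1.5) {
  method <- match.arg(method)
  ok <- !is.na(x)  # missing values are never flagged
  if (method == "z-score") {
    # Distance from the mean in units of standard deviation
    z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    flagged <- ok & abs(z) > z_cut
  } else {
    # Values beyond k * IQR below the first or above the third quartile
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    flagged <- ok & (x < q[1] - k * iqr | x > q[2] + k * iqr)
  }
  which(flagged)  # which() drops NA, e.g. when sd is 0
}

# Hypothetical helper: one summary row per column, mirroring the description
# (column name, outlier index positions, percentage of outliers).
summarise_outliers <- function(data, cols = names(data), method = "z-score") {
  idx <- lapply(cols, function(col) outlier_positions(data[[col]], method = method))
  counts <- vapply(idx, length, integer(1))
  out <- data.frame(
    column_name   = cols,
    no_outliers   = counts,
    perc_outliers = 100 * counts / nrow(data),  # assumed to be a percentage
    stringsAsFactors = FALSE
  )
  out$outlier_index <- idx  # list column of row positions
  out
}
```

For example, `summarise_outliers(data.frame(a = c(1:20, 100)), method = "iqr")` flags row 21 of column `a`. The Z-score rule assumes roughly symmetric data; the IQR rule relies on quartiles and is less affected by the outliers themselves.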
fast_outlier_id(data, cols = "All", method = "z-score", threshold_low_freq = 0.05)
Argument | Description |
---|---|
data | dataframe - Dataframe to be analyzed. |
cols | list - List containing the columns to be analyzed. Defaults to "All", i.e. every column. |
method | string - Method used to identify outliers: the Z-score method or the interquartile range (IQR) method. Defaults to "z-score". |
threshold_low_freq | double - Threshold used to evaluate outliers in categorical columns. Defaults to 0.05. |
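One plausible reading of `threshold_low_freq` is that a category whose relative frequency falls below the threshold is treated as an outlier. The sketch below assumes exactly that; `low_freq_positions()` is a hypothetical helper, and the package may apply the threshold differently (for example, to counts rather than shares).

```r
# Hypothetical helper: row positions whose category is rarer than the
# threshold. Assumes the threshold is a share of the non-missing values.
low_freq_positions <- function(x, threshold_low_freq = 0.05) {
  freq <- prop.table(table(x))  # table() excludes NA by default
  rare <- names(freq)[as.vector(freq) < threshold_low_freq]
  which(!is.na(x) & as.character(x) %in% rare)
}
```

With 50 "a", 48 "b", one "c" and one "d", the default threshold of 0.05 flags only the "c" and "d" rows, since each accounts for 1% of the values.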
Value: a dataframe (tibble) with one row per analyzed column.
#> Warning: the condition has length > 1 and only the first element will be used
#> Warning: the condition has length > 1 and only the first element will be used
#> # A tibble: 2 x 8
#>   column_name type  no_nans perc_nans outlier_method no_outliers perc_outliers
#>   <chr>       <chr>   <int>     <dbl> <list>               <int> <list>
#> 1 Sepal.Leng~ nume~       0         0 <chr [1]>                6 <dbl [1]>
#> 2 Sepal.Width nume~       0         0 <chr [1]>                5 <dbl [1]>
#> # ... with 1 more variable: outlier_values <list>
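The two warnings in the output are what base R emits when `if ()` receives a condition with more than one element; since R 4.2.0 the same situation is an error. A frequent trigger in functions with a `cols = "All"` default is a test such as `if (cols == "All")` evaluated on a vector of column names, although whether that is the cause here is an assumption. A length-safe check, using a hypothetical `resolve_cols()` helper, could look like this:

```r
# Hypothetical helper: resolve a `cols` argument that may be the string "All".
resolve_cols <- function(data, cols = "All") {
  # identical() always returns a single TRUE/FALSE, so `if` never sees a
  # vector condition, unlike `cols == "All"` when cols has several elements.
  if (identical(cols, "All")) {
    return(names(data))
  }
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown columns: ", paste(missing_cols, collapse = ", "))
  }
  cols
}
```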