Analyzes the values of the given columns in a dataframe and identifies outliers using either the Z-score algorithm or the interquartile range algorithm. Returns a dataframe containing the following columns: column name, a list containing the outliers' index positions, and the percentage of total counts considered outliers. Modifies an existing dataframe, with missing values imputed based on the chosen method.
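As a rough illustration of the two detection rules named above, the base R sketch below flags outliers in a numeric vector. The helper names and the cutoffs (3 standard deviations, 1.5 times the interquartile range) are assumptions chosen for illustration; they are common conventions, not values confirmed by this package.

```r
# Illustrative helpers only; these names are hypothetical and not part of the package.

# Z-score rule: flag values whose absolute z-score exceeds a cutoff.
# The cutoff of 3 is an assumed convention.
zscore_outlier_idx <- function(x, cutoff = 3) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  which(abs(z) > cutoff)
}

# Interquartile range rule: flag values below Q1 - k * IQR or above Q3 + k * IQR.
# The multiplier k = 1.5 is an assumed convention.
iqr_outlier_idx <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  which(x < q[1] - k * spread | x > q[2] + k * spread)
}

# Twenty identical values and one extreme value at position 21.
x <- c(rep(10, 20), 100)
zscore_outlier_idx(x)
iqr_outlier_idx(x)
```

Both helpers return index positions, which mirrors the "outlier's index position" column described above. The share of outliers can then be computed as `length(idx) / length(x)`.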

fast_outlier_id(
  data,
  cols = "All",
  method = "z-score",
  threshold_low_freq = 0.05
)

Arguments

data

dataframe - Dataframe to be analyzed

cols

list - List containing the names of the columns to be analyzed. Defaults to "All".

method

string - Indicates which method is used to identify outliers. The methods available are Z-score ("z-score", the default) and interquartile range.

threshold_low_freq

double - Threshold for evaluating outliers in categorical columns. Defaults to 0.05.
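One plausible reading of threshold_low_freq is that a category counts as an outlier when its relative frequency falls below the threshold. The sketch below shows that interpretation in base R. The helper name and the strict "less than" comparison are assumptions, not the package's confirmed behavior.

```r
# Hypothetical helper, not the package's implementation.
# Returns the category labels whose relative frequency is below the threshold
# (strict comparison is an assumption).
low_freq_levels <- function(x, threshold_low_freq = 0.05) {
  freq <- prop.table(table(x))
  names(freq)[as.vector(freq) < threshold_low_freq]
}

# "green" makes up 2% of the values, which is below the 5% threshold.
colours <- c(rep("red", 60), rep("blue", 38), rep("green", 2))
low_freq_levels(colours)
```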

Value

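dataframe - A tibble with one row per analyzed column and the columns column_name, type, no_nans, perc_nans, outlier_method, no_outliers, perc_outliers and outlier_values.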

Examples

fast_outlier_id(data = iris, cols = c("Sepal.Length", "Sepal.Width"), method = "z-score")
#> Warning: the condition has length > 1 and only the first element will be used
#> Warning: the condition has length > 1 and only the first element will be used
#> # A tibble: 2 x 8
#>   column_name type  no_nans perc_nans outlier_method no_outliers perc_outliers
#>   <chr>       <chr>   <int>     <dbl> <list>               <int> <list>
#> 1 Sepal.Leng~ nume~       0         0 <chr [1]>                6 <dbl [1]>
#> 2 Sepal.Width nume~       0         0 <chr [1]>                5 <dbl [1]>
#> # ... with 1 more variable: outlier_values <list>