Analyzes the values of the given columns in a dataframe and identifies outliers using either the Z-score method or the interquartile range (IQR) method. Returns a dataframe with one row per analyzed column, containing the column name, a list with the outliers' index positions, and the percentage of the total count considered outliers.

fast_outlier_id(
  data,
  cols = "All",
  method = "z-score",
  threshold_low_freq = 0.05
)

Arguments

data	dataframe - Dataframe to be analyzed.
cols	list - List of the columns to be analyzed. Defaults to "All".
method	string - Method used to identify outliers: Z-score (the default, "z-score") or interquartile range.
threshold_low_freq	double - Threshold for evaluating outliers in categorical columns. Defaults to 0.05.
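
The role of threshold_low_freq for categorical columns can be pictured with a minimal sketch. It assumes that a category counts as an outlier when its relative frequency falls strictly below the threshold; the page does not spell out the exact rule, and low_freq_levels is a hypothetical helper, not part of the package.

```r
# Hypothetical helper for illustration; not part of the package.
# Assumption: a category is "low frequency" when its share of the
# observations is strictly below threshold_low_freq.
low_freq_levels <- function(x, threshold_low_freq = 0.05) {
  freq <- prop.table(table(x))            # relative frequency of each category
  names(freq)[as.vector(freq) < threshold_low_freq]
}

# 100 observations: "a" and "b" dominate, "c" and "d" appear once each (1%).
x <- c(rep("a", 50), rep("b", 48), "c", "d")
low_freq_levels(x)
#> [1] "c" "d"
```

Raising the threshold flags more categories; lowering it toward 0 flags none.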

Value

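dataframe - One row per analyzed column. In the example below it is returned as a tibble with the columns column_name, type, no_nans, perc_nans, outlier_method, no_outliers, perc_outliers and outlier_values.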

Examples

fast_outlier_id(data = iris, cols =  c("Sepal.Length", "Sepal.Width"), method = "z-score")
#> Warning: the condition has length > 1 and only the first element will be used
#> Warning: the condition has length > 1 and only the first element will be used
#> # A tibble: 2 x 8
#>   column_name type  no_nans perc_nans outlier_method no_outliers perc_outliers
#>   <chr>       <chr>   <int>     <dbl> <list>               <int> <list>       
#> 1 Sepal.Leng~ nume~       0         0 <chr [1]>                6 <dbl [1]>    
#> 2 Sepal.Width nume~       0         0 <chr [1]>                5 <dbl [1]>    
#> # ... with 1 more variable: outlier_values <list>
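
For readers who want to see what the two numeric methods do, here is a minimal base R sketch of the usual Z-score and interquartile range rules. The helper names, the Z-score cutoff of 2 and the IQR multiplier of 1.5 are assumptions made for illustration; the package's internal implementation is not shown on this page. With a cutoff of 2, the Z-score rule flags 6 values of Sepal.Length and 5 of Sepal.Width, which agrees with the no_outliers column in the output above.

```r
# Hypothetical helpers for illustration; not part of the package.

# Z-score rule: flag values more than `cutoff` standard deviations
# away from the mean. The cutoff of 2 is an assumption.
z_score_outliers <- function(x, cutoff = 2) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  which(abs(z) > cutoff)                  # index positions of the outliers
}

# IQR rule: flag values outside [Q1 - k * IQR, Q3 + k * IQR].
# The multiplier k = 1.5 is the conventional choice, assumed here.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  which(x < q[1] - k * spread | x > q[2] + k * spread)
}

z_idx   <- z_score_outliers(iris$Sepal.Width)
iqr_idx <- iqr_outliers(iris$Sepal.Width)

iris$Sepal.Width[z_idx]                   # values flagged by the Z-score rule
iris$Sepal.Width[iqr_idx]                 # values flagged by the IQR rule
100 * length(z_idx) / nrow(iris)          # percentage of rows flagged
```

The Z-score rule assumes roughly symmetric data, since it relies on the mean and standard deviation; the IQR rule depends only on quartiles, so it is less affected by the extreme values it is trying to detect.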
