A function that identifies outliers in a data frame and summarizes their count and range, based on the identification method the user chooses.

Usage

outlier_identifier(
  dataframe,
  columns = NULL,
  identifier = "IQR",
  return_df = FALSE
)

Arguments

dataframe: The target data frame (data.frame) on which the function operates.
columns: A vector of the columns to check. Defaults to NULL, in which case the function checks all columns.
identifier: The method used to identify outliers. Defaults to "IQR".
return_df: If TRUE, the output is the data frame (data.frame) itself, with its rows marked by whether they contain an outlier. Defaults to FALSE.
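The IQR method is not spelled out on this page. The example output below is consistent with Tukey's fences (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR), with quartiles taken from R's default quantile() (type 7). The following is a minimal sketch under that assumption only; iqr_outlier_flags() and its k argument are hypothetical and not part of this package.

```r
# Hypothetical helper, assuming Tukey's fences with multiplier k = 1.5.
# Returns TRUE where a value lies outside [Q1 - k * IQR, Q3 + k * IQR];
# NA inputs yield NA.
iqr_outlier_flags <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

sepal_length <- c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1)
flags <- iqr_outlier_flags(sepal_length)
sum(flags)           # 2
sepal_length[flags]  # 50 54
```

For these values Q1 = 5.05 and Q3 = 5.4, so the upper fence is 5.925 and only 50 and 54 are flagged, which agrees with outlier_count = 2 for SepalLengthCm in the example.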

Value

If return_df = FALSE, a data frame (data.frame) summarizing the outliers identified by the chosen method. If return_df = TRUE, a data frame (data.frame) with an additional column indicating whether each row contains an outlier.
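The exact layout of the return_df = TRUE output is not shown on this page. As a rough sketch of the idea only, assuming the 1.5 × IQR rule (Tukey's fences) and a single logical column that is TRUE when any checked numeric column of that row is flagged, one could write the following; flag_outlier_rows(), iqr_outlier_flags() and the column name has_outlier are all hypothetical.

```r
# Hypothetical helper, assuming Tukey's fences (1.5 * IQR, quantile type 7).
iqr_outlier_flags <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Sketch of return_df = TRUE: append a logical column marking rows that
# contain at least one outlier. "has_outlier" is a made-up column name.
flag_outlier_rows <- function(dataframe, columns = NULL) {
  if (is.null(columns)) {
    # Assumption: only numeric columns are checked when columns is NULL
    columns <- names(dataframe)[vapply(dataframe, is.numeric, logical(1))]
  }
  per_column <- lapply(dataframe[columns], function(col) {
    flags <- iqr_outlier_flags(col)
    flags[is.na(flags)] <- FALSE  # treat missing values as non-outliers
    flags
  })
  # Element-wise OR across columns: TRUE if any column flags the row
  dataframe$has_outlier <- Reduce(`|`, per_column)
  dataframe
}

df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm  = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm  = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)
flagged <- flag_outlier_rows(df)
which(flagged$has_outlier)  # 3 6 7 11
```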

Examples

library(tidyverse)

df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm  = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm  = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

outlier_identifier(df)
#>                    SepalLengthCm SepalWidthCm PetalWidthCm
#> outlier_count                  2            1            1
#> outlier_percentage        18.18%        9.09%        9.09%
#> mean                       13.63         3.19         0.77
#> median                       5.1          1.5          0.4
#> std                        18.99         5.59         1.41
#> lower_range                 <NA>         <NA>         <NA>
#> upper_range              (50,54)           20            5
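Assuming the IQR method uses Tukey's fences with R's default quartiles, the outlier_count, outlier_percentage, mean, median and std rows of the summary above can be reproduced in base R. This is a sketch, not the package's implementation; summarise_iqr_outliers() and iqr_outlier_flags() are hypothetical, and the lower_range and upper_range rows are left out.

```r
# Hypothetical helper, assuming Tukey's fences (1.5 * IQR, quantile type 7).
iqr_outlier_flags <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Sketch of the return_df = FALSE summary (range rows omitted).
# Values are stored as character so counts and "%" strings share a column.
summarise_iqr_outliers <- function(dataframe, columns = NULL) {
  if (is.null(columns)) {
    columns <- names(dataframe)[vapply(dataframe, is.numeric, logical(1))]
  }
  stats_matrix <- vapply(columns, function(col) {
    x <- dataframe[[col]]
    n_out <- sum(iqr_outlier_flags(x), na.rm = TRUE)
    c(
      outlier_count      = as.character(n_out),
      outlier_percentage = sprintf("%.2f%%", 100 * n_out / sum(!is.na(x))),
      mean               = as.character(round(mean(x, na.rm = TRUE), 2)),
      median             = as.character(stats::median(x, na.rm = TRUE)),
      std                = as.character(round(stats::sd(x, na.rm = TRUE), 2))  # sample sd (n - 1)
    )
  }, character(5))
  as.data.frame(stats_matrix, stringsAsFactors = FALSE)
}

df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm  = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm  = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)
summarise_iqr_outliers(df)
```

The std row uses the sample standard deviation; for SepalLengthCm, sd() gives about 18.99, in line with the example output.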