A function that identifies outliers and summarizes their count and range, based on the method the user chooses
outlier_identifier.Rd
A function that identifies outliers and summarizes their count and range, based on the method the user chooses.
Arguments
- dataframe
The target data frame (data.frame) on which the function is performed.
- columns
A vector of the target columns on which the function should be performed. Default is NULL, in which case the function checks all columns.
- identifier
The method of identifying outliers.
- return_df
Set to TRUE to return the data frame (data.frame) with the rows that contain outliers identified, instead of the summary.
Value
If return_df = FALSE, a data frame (data.frame) summarizing the outliers identified by the chosen method. If return_df = TRUE, a data frame (data.frame) with an additional column indicating whether each row contains an outlier.
Examples
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
#> ✔ tibble 3.1.6 ✔ dplyr 1.0.7
#> ✔ tidyr 1.2.0 ✔ stringr 1.4.0
#> ✔ readr 2.1.2 ✔ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
df <- data.frame(SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
                 SepalWidthCm = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
                 PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5))
outlier_identifier(df)
#>                    SepalLengthCm SepalWidthCm PetalWidthCm
#> outlier_count                  2            1            1
#> outlier_percentage        18.18%        9.09%        9.09%
#> mean                       13.63         3.19         0.77
#> median                       5.1          1.5          0.4
#> std                        18.99         5.59         1.41
#> lower_range                 <NA>         <NA>         <NA>
#> upper_range              (50,54)           20            5
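
# The page does not say which rule outlier_identifier() applies by default.
# As a rough illustration only, the sketch below assumes the common
# 1.5 * IQR rule and uses base R. iqr_outliers() is a hypothetical helper,
# not part of the package, and it may differ from the package's own logic.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# flags values that fall outside the k * IQR fences of a numeric vector.
iqr_outliers <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Same data as in the example above.
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm  = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm  = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

# Logical matrix: one column per variable, TRUE where a value is flagged.
flags <- sapply(df, iqr_outliers)

# Number of flagged values per column.
outlier_count <- colSums(flags)

# Row-level indicator, similar in spirit to what return_df = TRUE describes:
# TRUE when any column of the row holds a flagged value.
df_flagged <- cbind(df, outlier = rowSums(flags) > 0)

print(outlier_count)
print(df_flagged[df_flagged$outlier, ])
```

# Under this assumption the per-column counts agree with the outlier_count
# row shown above, which suggests, but does not confirm, that the default
# method is IQR-based.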