Generate an outlier-free dataset by replacing outliers with the column mean or median, or by removing every row that contains an outlier.
trim_outliers.Rd
Arguments
- dataframe
The data frame on which the function operates.
- columns
The columns to check for outliers. Default is None, in which case the function checks all columns.
- identifier
The method used to identify outliers, for example "Z_score".
- method
The method used to handle outliers.
  - if "trim": remove every row that contains an outlier
  - if "median": replace outliers with the median value
  - if "mean": replace outliers with the mean value
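The behaviour described by identifier and method can be sketched in base R. This is only an illustration, not the package's implementation: the helper names is_z_outlier and handle_outliers are hypothetical, the cutoff of 3 on the absolute z-score is an assumption, and so is computing the replacement mean or median from all values in the column, outliers included.

```r
# Hypothetical helpers for illustration; not the code behind trim_outliers().

# Assumption: a value is an outlier when its absolute z-score, computed with
# the column mean and sample standard deviation, exceeds `threshold`.
is_z_outlier <- function(x, threshold = 3) {
  abs(x - mean(x)) / sd(x) > threshold
}

handle_outliers <- function(df, columns = names(df),
                            method = c("trim", "median", "mean")) {
  method <- match.arg(method)
  # One logical vector per checked column, TRUE where a value is flagged.
  flags <- lapply(df[columns], is_z_outlier)

  if (method == "trim") {
    # Drop every row that is flagged in at least one checked column.
    drop <- Reduce(`|`, flags)
    return(df[!drop, , drop = FALSE])
  }

  # Assumption: the replacement is the mean/median of the whole column.
  center <- if (method == "median") median else mean
  for (col in columns) {
    df[[col]][flags[[col]]] <- center(df[[col]])
  }
  df
}

toy <- data.frame(w = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1))
handle_outliers(toy, method = "trim")    # removes the row holding 20
handle_outliers(toy, method = "median")  # replaces 20 with the column median
```

With "trim" the number of rows shrinks, while "median" and "mean" keep the shape of the data frame and only overwrite the flagged cells.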
Examples
library(tidyverse)
df = as.data.frame(tibble(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
))
trim_outliers(df, identifier = 'Z_score', method = 'trim')
#>    SepalLengthCm SepalWidthCm PetalWidthCm
#> 1            5.1          1.4          0.2
#> 2            4.9          1.4          0.2
#> 4            5.5          2.0          0.3
#> 5            5.1          0.7          0.4
#> 6           50.0          1.6          0.5
#> 7           54.0          1.2          0.5
#> 8            5.0          1.4          0.6
#> 9            5.2          1.8          0.4
#> 10           5.3          1.5          0.2
#> 11           5.1          2.1          5.0
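Only row 3 is dropped in the output above, even though 50, 54 and 5 also look extreme. One way to see why is to compute the z-scores directly. The sketch below assumes that "Z_score" flags values whose absolute z-score exceeds 3, using the sample standard deviation. That cutoff is not stated in this documentation, but it is consistent with the output shown. The variable names max_abs_z and flagged_rows are illustrative only.

```r
# Rebuild the example data with base R only.
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm  = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm  = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

# Largest absolute z-score in each column (sd() uses the n - 1 denominator).
max_abs_z <- sapply(df, function(x) max(abs(x - mean(x)) / sd(x)))
round(max_abs_z, 3)

# Rows where any column exceeds the assumed cutoff of 3.
# scale() centers each column and divides by its sample standard deviation.
flagged_rows <- which(apply(abs(scale(df)) > 3, 1, any))
flagged_rows
```

Under this assumption, 20 in SepalWidthCm is the only value with an absolute z-score above 3 (about 3.01). The two large SepalLengthCm values inflate that column's standard deviation, so neither is flagged, and 5 in PetalWidthCm falls just short of the cutoff.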