
Generates an outlier-free dataset by replacing outliers with the mean or median, or by removing every row that contains an outlier.

Usage

trim_outliers(dataframe, columns = NULL, identifier = "IQR", method = "trim")

Arguments

dataframe

The dataframe to process.

columns

The columns to check for outliers. Default is NULL, in which case the function checks all columns.

identifier

The method used to identify outliers. Default is "IQR"; the example below uses "Z_score".
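As a rough illustration of what the two identifier names usually refer to, here is a minimal sketch in base R. The helper names is_outlier_iqr and is_outlier_z are hypothetical, and the cutoffs (1.5 times the IQR, and an absolute Z-score above 3) are conventional choices assumed here; this page does not state which cutoffs trim_outliers uses.

```r
# Hypothetical helpers for illustration only; not part of the package.

# IQR rule: flag values more than k * IQR outside the quartiles.
# k = 1.5 is the conventional choice (an assumption here).
is_outlier_iqr <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Z-score rule: flag values more than `cutoff` standard deviations
# from the mean. cutoff = 3 is a common convention (an assumption here).
is_outlier_z <- function(x, cutoff = 3) {
  abs(x - mean(x)) / sd(x) > cutoff
}

x <- c(1:19, 100)          # one obviously extreme value at position 20
iqr_hits <- which(is_outlier_iqr(x))
z_hits <- which(is_outlier_z(x))
```

On very short vectors the Z-score rule can never fire, because a single extreme value also inflates the standard deviation; the IQR rule is less sensitive to that effect.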

method

The method of dealing with outliers.

- "trim": remove every row that contains an outlier.
- "median": replace outliers with median values.
- "mean": replace outliers with mean values.
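The three options can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: handle_outliers_sketch is a hypothetical helper, the outlier flags are supplied by hand as a logical matrix, and the replacement value is assumed to be computed from the whole column (the page does not say whether flagged values are excluded first).

```r
# Hypothetical helper for illustration only; not part of the package.
# `flagged` is a logical matrix with the same shape and column names as `df`.
handle_outliers_sketch <- function(df, flagged, method = "trim") {
  if (method == "trim") {
    # Drop every row that has at least one flagged value.
    return(df[!apply(flagged, 1, any), , drop = FALSE])
  }
  # Assumption: the replacement is computed from the whole column,
  # flagged values included.
  fill <- switch(method,
                 mean = mean,
                 median = median,
                 stop("unknown method: ", method))
  for (col in names(df)) {
    df[flagged[, col], col] <- fill(df[[col]])
  }
  df
}

toy <- data.frame(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, 14))
flagged <- cbind(a = c(FALSE, FALSE, FALSE, FALSE, TRUE),
                 b = rep(FALSE, 5))

trimmed <- handle_outliers_sketch(toy, flagged, "trim")    # 4 rows remain
med <- handle_outliers_sketch(toy, flagged, "median")      # 100 becomes 3
avg <- handle_outliers_sketch(toy, flagged, "mean")        # 100 becomes 22
```

Trimming shrinks the dataset, while the two replacement options keep every row at the cost of altering the flagged values.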

Value

A dataframe with the outliers handled according to method: rows containing outliers are removed for "trim", and outlier values are replaced for "median" or "mean".

Examples

library(tidyverse)

# Toy data with a few extreme values in each column
df <- as.data.frame(tibble(SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
                           SepalWidthCm = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
                           PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)))

# Identify outliers by Z-score and drop every row that contains one
trim_outliers(df, identifier = "Z_score", method = "trim")
#>    SepalLengthCm SepalWidthCm PetalWidthCm
#> 1            5.1          1.4          0.2
#> 2            4.9          1.4          0.2
#> 4            5.5          2.0          0.3
#> 5            5.1          0.7          0.4
#> 6           50.0          1.6          0.5
#> 7           54.0          1.2          0.5
#> 8            5.0          1.4          0.6
#> 9            5.2          1.8          0.4
#> 10           5.3          1.5          0.2
#> 11           5.1          2.1          5.0
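Only row 3 is dropped in the output above, even though 50, 54 and 5.0 look extreme. One plausible explanation, sketched below in base R, is that the Z-score is computed with the sample standard deviation and a value counts as an outlier when its absolute Z-score exceeds 3. Both points are assumptions; the page does not document the cutoff.

```r
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 50, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm  = c(1.4, 1.4, 20, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm  = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

# Absolute Z-score of every value, column by column
# (sd() is the sample standard deviation).
z <- sapply(df, function(x) abs(x - mean(x)) / sd(x))

# Largest Z-score per column, to compare against the assumed cutoff of 3.
max_z <- apply(z, 2, max)

# Rows with at least one value above the assumed cutoff.
flagged_rows <- which(apply(z > 3, 1, any))
```

Under these assumptions flagged_rows contains only row 3. With 11 observations a single extreme value inflates the standard deviation so much that its own Z-score can barely pass 3, which is why the other large values survive.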