Find outliers in data.
soc_get_outliers.Rd
Returns the rows of a dataset that are outliers with respect to the values of one variable. Outliers are identified using one of the following methods:
Interquartile Range (IQR) Method: Values less than Q1 - 1.5 x IQR or greater than Q3 + 1.5 x IQR, where IQR = Q3 - Q1, are identified as outliers.
Mean and Standard Deviation Method: Values less than mean - k x standard_deviation or greater than mean + k x standard_deviation are identified as outliers.
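The two rules above can be sketched on a plain numeric vector as follows. This is only an illustration, not the package source: the helper names `iqr_outlier_flags` and `sd_outlier_flags` are hypothetical, the quartiles are assumed to follow R's default `quantile()` definition, and the `wages` values are made up for the example.

```r
# Hypothetical helpers illustrating the two rules; not part of the package.

# IQR rule: flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
iqr_outlier_flags <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}

# Mean and standard deviation rule: flag values outside [mean - k * sd, mean + k * sd].
sd_outlier_flags <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x < m - k * s | x > m + k * s
}

# Made-up values with one obviously extreme entry.
wages <- c(10, 12, 11, 13, 12, 11, 100)

iqr_hits <- wages[iqr_outlier_flags(wages)]
sd_hits_k2 <- wages[sd_outlier_flags(wages, k = 2)]
sd_hits_k3 <- wages[sd_outlier_flags(wages, k = 3)]
```

With these values the IQR rule and the SD rule with k = 2 both flag only 100, while k = 3 flags nothing. In a sample of n values, no point can lie more than (n - 1) / sqrt(n) sample standard deviations from the mean, so a large k never flags anything in very small datasets.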
Arguments
- df
Dataframe in which outliers are to be identified.
- col
Column vector from the dataframe whose values are used to identify outliers.
- method
Name of the outlier identification method to use: "IQR" for the IQR method or "SD" for the mean and standard deviation method.
- thresh
The value of k in the Mean and Standard Deviation Method formula above
Value
A dataframe that is a subset of the original dataframe, containing only the rows corresponding to outliers.
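To make the documented interface and return value concrete, here is a minimal stand-in that takes the same four arguments and returns the matching rows. It is a sketch under assumptions, not the implementation of `soc_get_outliers`: the name `get_outliers_sketch`, the default `thresh = 3`, the handling of missing values, and the `players` data are all invented for illustration.

```r
# Hypothetical stand-in mirroring the documented arguments; not the package source.
get_outliers_sketch <- function(df, col, method, thresh = 3) {
  if (method == "IQR") {
    q <- quantile(col, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    lower <- q[1] - 1.5 * iqr
    upper <- q[2] + 1.5 * iqr
  } else if (method == "SD") {
    m <- mean(col, na.rm = TRUE)
    s <- sd(col, na.rm = TRUE)
    lower <- m - thresh * s
    upper <- m + thresh * s
  } else {
    stop('method must be "IQR" or "SD"')
  }
  # which() drops NA comparisons, so rows with missing values are never returned.
  df[which(col < lower | col > upper), , drop = FALSE]
}

# Made-up data: one wage is far above the rest.
players <- data.frame(
  age = c(18, 20, 20, 23, 25, 31),
  Wages_Euros = c(300000, 575000, 150000, 200000, 250000, 5000000)
)

# Pass the column itself (a numeric vector), as the col argument describes.
result <- get_outliers_sketch(players, players$Wages_Euros, "IQR")
```

Here `result` keeps both columns of `players` and contains the single row with a wage of 5000000, which is the shape the Value section describes: a subset of the input dataframe.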
Examples
library(dplyr)
small_data <- data.frame(age = c(18, 20, 20), Wages_Euros = c(300000, 575000, 150000))
# Pass the column itself (a numeric vector), not its name as a string
soc_get_outliers(small_data, small_data$Wages_Euros, "SD", 3)
#> [1] age Wages_Euros
#> <0 rows> (or 0-length row.names)