
Returns the outliers in a dataset based on the values of one variable. Outliers are identified using either of the following methods:

  1. Interquartile Range (IQR) Method: Identifies as outliers all values less than Q1 - 1.5 x IQR or greater than Q3 + 1.5 x IQR, where IQR = Q3 - Q1.

  2. Mean and Standard Deviation Method: Identifies as outliers all values less than mean - k times standard_deviation or greater than mean + k times standard_deviation.
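The two rules above can be sketched in base R as follows. This is an illustration only, not the implementation of soc_get_outliers: the helper name is_outlier_sketch and the sample data are hypothetical, and the quartiles are assumed to come from stats::quantile with its default settings, which may differ from what the package uses.

```r
# Hypothetical helper (not part of the package): returns a logical vector
# marking which values of the numeric vector x are outliers.
is_outlier_sketch <- function(x, method = "SD", thresh = 3) {
  if (method == "IQR") {
    # Q1 and Q3 from the default quantile algorithm (an assumption)
    q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
  } else if (method == "SD") {
    m <- mean(x, na.rm = TRUE)
    s <- stats::sd(x, na.rm = TRUE)
    # thresh plays the role of k in the formula
    x < m - thresh * s | x > m + thresh * s
  } else {
    stop("method must be \"IQR\" or \"SD\"")
  }
}

# Made-up data with one extreme wage
wages <- data.frame(
  id = 1:8,
  Wages_Euros = c(100, 110, 120, 130, 140, 150, 160, 5000)
)

# Keep only the rows flagged as outliers
wages[is_outlier_sketch(wages$Wages_Euros, method = "IQR"), ]
wages[is_outlier_sketch(wages$Wages_Euros, method = "SD", thresh = 2), ]
```

In this sample both calls keep only the row with the wage of 5000. With k = 3 the standard deviation rule flags nothing here, because in a sample of n values no point can lie more than (n - 1) / sqrt(n) standard deviations from the mean (about 2.47 for n = 8). Small datasets therefore need a lower k, or the IQR method.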

Usage

soc_get_outliers(df, col, method = "SD", thresh = 3)

Arguments

df

Dataframe in which outliers are to be identified.

col

Column vector in the dataframe on which outlier identification is based.

method

Name of the outlier identification method to use: "IQR" for the IQR method or "SD" for the mean and standard deviation method. Defaults to "SD".

thresh

The value of k in the Mean and Standard Deviation Method formula above. Defaults to 3.

Value

A dataframe that is a subset of the original dataframe, containing only the rows that correspond to outliers.

Examples

library(dplyr)
small_data <- data.frame(age = c(18, 20, 20), Wages_Euros = c(300000, 575000, 150000))
soc_get_outliers(small_data,"Wages_Euros","SD",3)
#> Warning: argument is not numeric or logical: returning NA
#> Warning: NAs introduced by coercion
#> [1] age         Wages_Euros
#> <0 rows> (or 0-length row.names)