Identifies multicollinearity among explanatory variables using Pearson's correlation coefficient and, for each correlated pair, suggests the variable with the higher VIF score for elimination.
col_identify.Rd
Returns a data frame containing Pearson's correlation coefficient and the VIF score for the explanatory variables suggested for elimination. An empty data frame means no multicollinearity was detected.
Arguments
- df
An input dataframe
- X
Explanatory variables (character vector of column names)
- y
Response variable (a single column name, given as a character string)
- corr_min
(optional) A decimal number used as the lower threshold for selecting a pair, expressed as a Pearson coefficient value. Defaults to -0.8.
- corr_max
(optional) A decimal number used as the upper threshold for selecting a pair, expressed as a Pearson coefficient value. Defaults to 0.8.
- vif_limit
(optional) A decimal number used as the threshold on the VIF score. Defaults to 4.
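The role of corr_min and corr_max can be illustrated with a short base R sketch. This is not the package's implementation: flag_pairs is a hypothetical helper, and treating the thresholds as inclusive (<= and >=) is an assumption, since the documentation does not say how boundary values are handled.

```r
# Hypothetical helper, not part of the package: list variable pairs whose
# Pearson correlation lies at or beyond corr_min / corr_max.
flag_pairs <- function(df, X, corr_min = -0.8, corr_max = 0.8) {
  cm <- cor(df[, X, drop = FALSE], method = "pearson")
  # Look only above the diagonal so each unordered pair appears once.
  # Inclusive comparisons are an assumption made for this sketch.
  idx <- which(upper.tri(cm) & (cm <= corr_min | cm >= corr_max),
               arr.ind = TRUE)
  data.frame(
    variable1   = rownames(cm)[idx[, "row"]],
    variable2   = colnames(cm)[idx[, "col"]],
    correlation = cm[idx],
    stringsAsFactors = FALSE
  )
}

# With the default thresholds this pair is not flagged; with the looser
# thresholds used in the Examples section it is.
flag_pairs(iris, c("Sepal.Width", "Petal.Length"))
flag_pairs(iris, c("Sepal.Width", "Petal.Length"),
           corr_min = -0.3, corr_max = 0.3)
```

Widening the band (thresholds closer to -1 and 1) flags fewer pairs; narrowing it flags more.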
Examples
col_identify(iris, c("Sepal.Width", "Petal.Length"),
             "Petal.Width", vif_limit = 0, corr_max = 0.3, corr_min = -0.3)
#> Joining, by = "variable1"
#> # A tibble: 2 x 5
#> # Groups:   pair [1]
#>   variable     correlation rounded_corr pair      vif_score
#>   <chr>              <dbl>        <dbl> <list>        <dbl>
#> 1 Sepal.Width       -0.428        -0.43 <chr [2]>      1.22
#> 2 Petal.Length      -0.428        -0.43 <chr [2]>      1.22
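
The correlation and VIF values reported above can be cross-checked with base R. This is a minimal sketch rather than the package's own code. It assumes the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one explanatory variable on the remaining ones. The names r, fit and vif_sepal_width are illustrative only.

```r
# Pearson correlation between the two explanatory variables.
r <- cor(iris$Sepal.Width, iris$Petal.Length, method = "pearson")

# VIF for Sepal.Width: regress it on the other explanatory variable
# and apply VIF = 1 / (1 - R^2).
fit <- lm(Sepal.Width ~ Petal.Length, data = iris)
vif_sepal_width <- 1 / (1 - summary(fit)$r.squared)

round(r, 3)                # -0.428
round(vif_sepal_width, 2)  # 1.22
```

With only two explanatory variables, R^2 is the squared correlation between them, so both variables get the same VIF, which matches the identical vif_score values in the output.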