Returns a data frame listing the explanatory variables suggested for elimination, together with the Pearson correlation coefficient of each flagged pair and each variable's VIF score. An empty data frame means no multicollinearity was detected.

Usage

col_identify(df, X, y, corr_min = -0.8, corr_max = 0.8, vif_limit = 4)

Arguments

df

An input dataframe

X

Explanatory variables (character vector of column names in df)

y

Response variable (a single column name in df, given as a character string)

corr_min

(optional) A decimal number that serves as the lower (negative) Pearson correlation threshold for selecting a pair. Defaults to -0.8.

corr_max

(optional) A decimal number that serves as the upper (positive) Pearson correlation threshold for selecting a pair. Defaults to 0.8.

vif_limit

(optional) A decimal number that serves as the VIF (variance inflation factor) threshold for selection. Defaults to 4.
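
One way to read the three thresholds together is as a filter applied to each candidate pair. The sketch below is illustrative only: `is_flagged` is a hypothetical helper, not part of the package, and both the inclusive comparisons and the requirement that the correlation and VIF conditions hold at the same time are assumptions rather than confirmed behaviour of `col_identify`.

```r
# Hypothetical helper (not part of the package): shows how the thresholds
# could combine. Inclusive bounds and the AND between the correlation and
# VIF conditions are assumptions.
is_flagged <- function(r, vif, corr_min = -0.8, corr_max = 0.8, vif_limit = 4) {
  outside_corr_band <- r <= corr_min | r >= corr_max
  high_vif <- vif >= vif_limit
  outside_corr_band & high_vif
}

# With the default thresholds, a correlation of -0.428 is not extreme enough
is_flagged(r = -0.428, vif = 1.22)
#> [1] FALSE

# With loosened thresholds, the same values are flagged
is_flagged(r = -0.428, vif = 1.22, corr_min = -0.3, corr_max = 0.3, vif_limit = 0)
#> [1] TRUE
```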

Examples

col_identify(iris, c("Sepal.Width", "Petal.Length"),
             "Petal.Width", vif_limit = 0, corr_max = 0.3, corr_min = -0.3)
#> Joining, by = "variable1"
#> # A tibble: 2 x 5
#> # Groups:   pair [1]
#>   variable     correlation rounded_corr pair      vif_score
#>   <chr>              <dbl>        <dbl> <list>        <dbl>
#> 1 Sepal.Width       -0.428        -0.43 <chr [2]>      1.22
#> 2 Petal.Length      -0.428        -0.43 <chr [2]>      1.22
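
The `correlation` and `vif_score` columns above can be cross-checked by hand in base R. This is a minimal sketch using the textbook definitions (Pearson correlation via `cor()`, and VIF as 1 / (1 - R^2) from regressing one explanatory variable on the others); it is not necessarily how `col_identify` computes them internally, and the names `r`, `fit` and `vif_sepal_width` are introduced here for illustration.

```r
# Pearson correlation between the two explanatory variables
r <- cor(iris$Sepal.Width, iris$Petal.Length)
round(r, 3)
#> [1] -0.428

# VIF of Sepal.Width: regress it on the remaining explanatory variable(s)
# and apply 1 / (1 - R^2)
fit <- lm(Sepal.Width ~ Petal.Length, data = iris)
vif_sepal_width <- 1 / (1 - summary(fit)$r.squared)
round(vif_sepal_width, 2)
#> [1] 1.22
```

With only two explanatory variables, R^2 equals the squared correlation, so both variables share the same VIF, which is consistent with the identical `vif_score` values in the output above.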