make_recipe Documentation

The make_recipe() function quickly applies common data preprocessing techniques to a dataset and returns the processed data splits.

make_recipe(
  X,
  y,
  recipe,
  splits_to_return = "train_test",
  random_seed = NULL,
  train_valid_prop = 0.8
)

Arguments

X	A dataframe containing the full dataset to be split into training, validation, and testing data. It should contain both the feature columns and the response column named in y.
y	The name of the response column (as a string, e.g. "response_variable").
recipe	A string specifying which recipe to apply to the data. See the "The recipe parameter" section below for details.
splits_to_return	A string specifying which splits to return: "train_test" returns train and test splits, "train_test_valid" returns train, validation, and test splits, and "train" returns all data without splitting.
random_seed	An integer. The random seed to set when splitting the data, so that results are reproducible. Defaults to NULL.
train_valid_prop	A float between 0 and 1. The proportion to split the data by. Defaults to 0.8.
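
How random_seed and train_valid_prop interact can be illustrated with a small base-R sketch. The helper split_by_prop() below is hypothetical and not part of this package. It assumes the row count is rounded to the nearest integer, which is consistent with the 26/6 split of the 32-row mtcars data in the Examples section but may not be how make_recipe() rounds internally.

```r
# Hypothetical helper (not part of the documented API): a seeded,
# proportional split of a data frame into two parts.
split_by_prop <- function(df, prop = 0.8, seed = NULL) {
  stopifnot(prop >= 0, prop <= 1)
  # Setting the seed makes the random row selection reproducible.
  if (!is.null(seed)) set.seed(seed)
  n_first <- round(nrow(df) * prop)  # assumption: round to nearest integer
  idx <- sample(seq_len(nrow(df)), size = n_first)
  rest <- setdiff(seq_len(nrow(df)), idx)
  list(
    first = df[idx, , drop = FALSE],
    second = df[rest, , drop = FALSE]
  )
}

parts <- split_by_prop(mtcars, prop = 0.8, seed = 123)
nrow(parts$first)   # 26 rows (round(32 * 0.8))
nrow(parts$second)  # 6 rows
```

Calling the helper twice with the same seed selects the same rows, which is the behaviour a non-NULL random_seed is meant to provide.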

Value

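A named list of dataframes with the elements X_train, X_valid, X_test, y_train, y_valid, and y_test. Splits that were not requested are returned as empty dataframes (see X_valid and y_valid in the example below).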

The recipe parameter

The following recipes are currently available to pass to the recipe parameter:

"ohe_and_standard_scaler" - Applies one-hot encoding to categorical features and standard scaling to numeric features.

More recipes are under development and will be released in future updates.
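
The two transformations named by "ohe_and_standard_scaler" can be sketched in base R as follows. This is an illustration of the techniques only, on a small made-up data frame; the internals of make_recipe() may differ.

```r
# Illustrative data (made up for this sketch): one numeric and one
# categorical column.
df <- data.frame(
  hp = c(110, 93, 175, 245),
  am = factor(c(1, 1, 0, 0))
)

# Standard scaling: subtract the column mean, divide by the standard deviation.
hp_scaled <- (df$hp - mean(df$hp)) / sd(df$hp)

# One-hot encoding: one 0/1 indicator column per factor level.
am_ohe <- sapply(levels(df$am), function(lv) as.numeric(df$am == lv))
colnames(am_ohe) <- paste0("am_", levels(df$am))

processed <- data.frame(hp = hp_scaled, am_ohe)
names(processed)  # "hp" "am_0" "am_1"
```

The indicator columns follow the same name_level pattern (am_0, am_1) that appears in the example output in the Examples section.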

Examples

# load dplyr so that the %>% pipe is available
library(dplyr)

# apply "ohe_and_standard_scaler" on training and testing data
X_example <- dplyr::as_tibble(mtcars) %>%
  dplyr::mutate(
    carb = as.factor(carb),
    gear = as.factor(gear),
    vs = as.factor(vs),
    am = as.factor(am)
  )
y_example <- "gear"
make_recipe(
  X = X_example,
  y = y_example,
  recipe = "ohe_and_standard_scaler",
  splits_to_return = "train_test"
)
#> $X_train
#> # A tibble: 26 x 17
#>       mpg     cyl    disp     hp   drat      wt   qsec  vs_0  vs_1  am_0  am_1
#>     <dbl>   <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  0.161 -0.0863 -0.578  -0.555  0.730 -0.425  -0.523     1     0     0     1
#>  2  0.461 -1.21   -0.984  -0.799  0.624 -0.981   0.314     0     1     0     1
#>  3  0.228 -0.0863  0.186  -0.555 -0.999 -0.0848  0.751     0     1     1     0
#>  4 -0.221  1.04    0.982   0.377 -0.851  0.141  -0.523     1     0     1     0
#>  5 -0.321 -0.0863 -0.0712 -0.627 -1.67   0.161   1.16      0     1     1     0
#>  6  0.727 -1.21   -0.682  -1.24   0.287 -0.110   1.05      0     1     1     0
#>  7  0.461 -1.21   -0.728  -0.770  0.772 -0.150   2.57      0     1     1     0
#>  8 -0.138 -0.0863 -0.519  -0.369  0.772  0.141   0.151     0     1     1     0
#>  9 -0.371 -0.0863 -0.519  -0.369  0.772  0.141   0.467     0     1     1     0
#> 10 -0.604  1.04    0.325   0.449 -1.02   0.771  -0.323     1     0     1     0
#> # … with 16 more rows, and 6 more variables: carb_1 <dbl>, carb_2 <dbl>,
#> #   carb_3 <dbl>, carb_4 <dbl>, carb_6 <dbl>, carb_8 <dbl>
#> 
#> $X_valid
#> # A tibble: 0 x 0
#> 
#> $X_test
#> # A tibble: 6 x 17
#>      mpg     cyl   disp      hp   drat     wt     qsec  vs_0  vs_1  am_0  am_1
#>    <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  0.161 -0.0863 -0.578 -0.555   0.730 -0.681 -0.818       1     0     0     1
#> 2 -0.954  1.04    0.982  1.38   -0.725  0.271 -1.14        1     0     1     0
#> 3 -0.804  1.04    0.325  0.449  -1.02   0.481 -0.00688     1     0     1     0
#> 4  1.73  -1.21   -1.24  -1.39    2.90  -1.69   0.267       0     1     0     1
#> 5 -0.804  1.04    0.545  0.0188 -0.851  0.135 -0.375       1     0     1     0
#> 6  0.993 -1.21   -0.888 -0.828   1.85  -1.16  -0.691       1     0     0     1
#> # … with 6 more variables: carb_1 <dbl>, carb_2 <dbl>, carb_3 <dbl>,
#> #   carb_4 <dbl>, carb_6 <dbl>, carb_8 <dbl>
#> 
#> $y_train
#> # A tibble: 26 x 1
#>    gear 
#>    <fct>
#>  1 4    
#>  2 4    
#>  3 3    
#>  4 3    
#>  5 3    
#>  6 4    
#>  7 4    
#>  8 4    
#>  9 4    
#> 10 3    
#> # … with 16 more rows
#> 
#> $y_valid
#> # A tibble: 0 x 0
#> 
#> $y_test
#> # A tibble: 6 x 1
#>   gear 
#>   <fct>
#> 1 4    
#> 2 3    
#> 3 3    
#> 4 4    
#> 5 3    
#> 6 5    
#>
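
In the output above, the scaled cyl values in X_test (-0.0863, 1.04, -1.21) are the same as those in X_train, which is consistent with the scaling parameters being estimated on the training split and reused for the test split. A minimal base-R sketch of that pattern, assuming this is how the recipe behaves (the seed and the choice of the mpg column are arbitrary):

```r
set.seed(42)  # arbitrary seed for this illustration
idx <- sample(seq_len(nrow(mtcars)), size = 26)
train <- mtcars[idx, ]
test <- mtcars[-idx, ]

# Estimate the centering and scaling parameters from the training rows only.
train_mean <- mean(train$mpg)
train_sd <- sd(train$mpg)

# Apply the same parameters to both splits.
mpg_train_scaled <- (train$mpg - train_mean) / train_sd
mpg_test_scaled <- (test$mpg - train_mean) / train_sd
```

Scaled this way, the training column has mean 0 and standard deviation 1, while the test column generally does not, because it is transformed with the training statistics.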
