Data Splitting for Supervised Machine Learning — supervised

A function that uses `initial_split()` from the rsample package (part of the tidymodels ecosystem) to split a dataset into training and test sets, while providing convenient access to the X (feature) and y (target) portions of both splits.
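The description above suggests a simple structure: one call to `initial_split()`, then column subsets of the resulting training and test sets. The following is a minimal sketch of how such a function could be written. It is an assumption for illustration, not the package's actual source, and the name `supervised_data_sketch` is hypothetical. It relies only on `rsample::initial_split()`, `rsample::training()` and `rsample::testing()`.

```r
library(rsample)

# Hypothetical re-implementation for illustration only;
# not the package's actual supervised_data() source.
supervised_data_sketch <- function(data, xcols, ycols, ...) {
  # Extra arguments (e.g. prop, strata) are forwarded to initial_split()
  split <- initial_split(data, ...)
  train <- training(split)
  test <- testing(split)
  list(
    data = data,
    train = train,
    test = test,
    # drop = FALSE keeps single-column selections as data frames
    xtrain = train[, xcols, drop = FALSE],
    ytrain = train[, ycols, drop = FALSE],
    xtest = test[, xcols, drop = FALSE],
    ytest = test[, ycols, drop = FALSE]
  )
}

set.seed(1353)
res <- supervised_data_sketch(mtcars, xcols = c("mpg", "cyl", "disp"), ycols = "hp")
str(res, max.level = 1)
```

Whether the real function uses `drop = FALSE` for the X and y subsets is not stated on this page. The sketch uses it so that every component is a data frame.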

supervised_data(data, xcols, ycols, ...)

Arguments

data	The original dataset to be split.
xcols	A character vector of feature (X) column names, used as independent variables.
ycols	A character vector of target (y) column names, used as dependent variables or labels.
...	Additional arguments passed to `rsample::initial_split()`, for example `prop` (the proportion of rows assigned to training) or `strata`. See the rsample documentation for details.
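Because `...` is forwarded to `initial_split()`, its arguments control the split made by `supervised_data()`. The sketch below calls `rsample::initial_split()` directly to show the effect of `prop`, whose default is 3/4. With `supervised_data()` the equivalent would presumably be `supervised_data(mtcars, xcols, ycols, prop = 0.8)`. That call form is an assumption based on the forwarding described above.

```r
library(rsample)

set.seed(1353)
# Default split: roughly 3/4 of the rows go to training
default_split <- initial_split(mtcars)
# Passing prop = 0.8 assigns a larger share of rows to training
split_80 <- initial_split(mtcars, prop = 0.8)

# Compare training-set sizes under the two settings
c(default_train = nrow(training(default_split)),
  train_80 = nrow(training(split_80)))
```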

Value

A list with the following components:

data - The original dataset, unchanged.
train - The training portion of the dataset.
test - The test portion of the dataset.
xtrain - The training portion of the dataset, containing the X features only.
ytrain - The training portion of the dataset, containing the y targets only.
xtest - The test portion of the dataset, containing the X features only.
ytest - The test portion of the dataset, containing the y targets only.
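This page does not say whether `ytrain` and `ytest` are one-column data frames or plain vectors when `ycols` names a single column. In base R the answer depends on how the columns are selected. The sketch below illustrates the difference on a hypothetical stand-in training set (the first 24 rows of `mtcars`). It also defines a hypothetical helper, `as_target_vector()`, that returns a vector in either case. Which form `supervised_data()` actually returns is an assumption you should check.

```r
# Hypothetical stand-in for a training split; not produced by supervised_data()
train <- mtcars[1:24, ]

y_df <- train[, "hp", drop = FALSE]  # stays a one-column data frame
y_vec <- train[, "hp"]               # collapses to a numeric vector

class(y_df)   # "data.frame"
class(y_vec)  # "numeric"

# Hypothetical helper: always return a plain vector for a single target
as_target_vector <- function(y) if (is.data.frame(y)) y[[1]] else y
```

Some modelling functions expect a vector response rather than a data frame, so a helper like this can smooth over either return form.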

Examples

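set.seed(1353)  # make the random split reproducible
cars <- supervised_data(mtcars, xcols = c('mpg', 'cyl', 'disp'), ycols = c('hp'))
# Full training and test sets
train_data <- cars$train
test_data <- cars$test
# Feature (X) and target (y) portions of each split
x_train <- cars$xtrain
y_train <- cars$ytrain
x_test <- cars$xtest
y_test <- cars$ytest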
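A typical next step is to fit a model on the training X and y and then evaluate it on the test portion. The following sketch is self-contained and uses only base R. It uses `sample()` as a stand-in for the split that `supervised_data()` would produce, so the selected rows will not match the example above. The linear model of `hp` on `mpg`, `cyl` and `disp` is purely illustrative.

```r
# Stand-in for supervised_data()'s split, using base R only
set.seed(1353)
xcols <- c("mpg", "cyl", "disp")
ycol <- "hp"

train_idx <- sample(nrow(mtcars), size = floor(0.75 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

x_train <- train[, xcols, drop = FALSE]
y_train <- train[[ycol]]
x_test <- test[, xcols, drop = FALSE]
y_test <- test[[ycol]]

# Fit a linear model of the target on the X features, training rows only
fit <- lm(y ~ ., data = data.frame(x_train, y = y_train))

# Predict the held-out rows and compute root mean squared error
pred <- predict(fit, newdata = x_test)
rmse <- sqrt(mean((y_test - pred)^2))
rmse
```

Keeping X and y as separate objects mirrors the `xtrain`/`ytrain`/`xtest`/`ytest` components. The test rows are used only for prediction, never for fitting.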