Splits a dataset using tidymodels' initial_split function and provides convenient access to the X (feature) and y (target) portions of both the training split and the test split.

supervised_data(data, xcols, ycols, ...)

Arguments

data

the original dataset to be used for splitting

xcols

a vector containing feature names (X) to be used as independent variables

ycols

a vector containing target names (y) to be used as dependent variables or labels

...

Additional arguments passed on to the initial_split function in tidymodels. See the tidymodels documentation for details.
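
As a rough illustration of the kind of argument that can be forwarded through ..., the snippet below calls initial_split from the rsample package (part of tidymodels) directly with its prop argument, which sets the proportion of rows assigned to the training split. It does not call supervised_data itself; the assumption is only that such arguments reach initial_split unchanged. initial_split also accepts a strata argument for stratified sampling, which is not shown here.

```r
library(rsample)

set.seed(1353)

# prop = 0.8 requests roughly 80% of the rows for training
split <- initial_split(mtcars, prop = 0.8)

train <- training(split)  # training portion
test  <- testing(split)   # test portion

# every row of the original data ends up in exactly one of the two portions
nrow(train) + nrow(test) == nrow(mtcars)
```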

Value

A list with the following components:

  • data - The original dataset unchanged

  • train - The training portion of the dataset

  • test - The test portion of the dataset

  • xtrain - The training portion of the dataset containing X features only.

  • ytrain - The training portion of the dataset containing y targets only.

  • xtest - The test portion of the dataset containing X features only.

  • ytest - The test portion of the dataset containing y targets only.
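
To make the structure of the returned list concrete, here is a minimal sketch of how a function with this interface could be assembled from rsample's initial_split, training, and testing. The name supervised_data_sketch is hypothetical and used only for illustration; the package's actual implementation may differ.

```r
library(rsample)

# Hypothetical re-implementation for illustration only;
# not the package's actual source.
supervised_data_sketch <- function(data, xcols, ycols, ...) {
  # extra arguments in ... are forwarded to initial_split
  split <- initial_split(data, ...)
  train <- training(split)
  test  <- testing(split)

  list(
    data   = data,                           # original dataset, unchanged
    train  = train,                          # training portion
    test   = test,                           # test portion
    xtrain = train[, xcols, drop = FALSE],   # training features only
    ytrain = train[, ycols, drop = FALSE],   # training targets only
    xtest  = test[, xcols, drop = FALSE],    # test features only
    ytest  = test[, ycols, drop = FALSE]     # test targets only
  )
}

set.seed(1353)
cars <- supervised_data_sketch(
  mtcars,
  xcols = c("mpg", "cyl", "disp"),
  ycols = c("hp")
)
names(cars)
```

Using drop = FALSE keeps single-column selections such as ytrain as data frames rather than collapsing them to vectors; whether the real function does the same is an assumption of this sketch.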

Examples

set.seed(1353)
cars <- supervised_data(mtcars, xcols = c('mpg', 'cyl', 'disp'), ycols = c('hp'))
train_data <- cars$train
test_data <- cars$test
x_train <- cars$xtrain
y_train <- cars$ytrain
x_test <- cars$xtest
y_test <- cars$ytest