Selects features using the forward selection algorithm. It starts with an empty model and, at each step, adds the variable that gives the largest improvement in the model's accuracy. The process repeats until none of the remaining variables improves the accuracy of the model.
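
The procedure described above can be sketched in a few lines of R. This is only an illustration, not the package's implementation: `forward_select_sketch` is a hypothetical name, the scorer is assumed to take a single data frame containing the candidate columns plus the response (as `my_scorer` does in the Examples section), and the way `min_features` and `max_features` bound the search is an assumption.

```r
# Illustrative sketch only; `forward_select_sketch` is a hypothetical name
# and does not reproduce the package's actual implementation.
forward_select_sketch <- function(scorer, X, y,
                                  min_features = 1, max_features = 10) {
  selected <- integer(0)           # column indices chosen so far
  remaining <- seq_len(ncol(X))    # column indices still available
  best_error <- Inf                # error of the current model

  while (length(remaining) > 0 && length(selected) < max_features) {
    # Score the current selection extended by each remaining candidate.
    errors <- vapply(remaining, function(j) {
      scorer(cbind(X[, c(selected, j), drop = FALSE], y))
    }, numeric(1))

    best <- which.min(errors)
    improved <- errors[best] < best_error

    # Assumption: keep adding until min_features is reached, then stop
    # as soon as no candidate lowers the error.
    if (!improved && length(selected) >= min_features) break

    selected <- c(selected, remaining[best])
    best_error <- errors[best]
    remaining <- remaining[-best]
  }

  selected
}
```

Note that a scorer based on in-sample error alone (such as the residual mean square of `lm`) rarely gets worse when a variable is added, so in this sketch the stopping rule is only meaningful with a scorer that penalises complexity or evaluates on held-out data.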

forward_selection(scorer, X, y, min_features = 1, max_features = 10)

Arguments

scorer

A user-supplied function that accepts X and y (as defined below) as input and returns the error of a model fitted to that data.

X

tibble. Training dataset. In the example below, this holds the feature columns (every column except the last).

y

tibble. Test dataset. In the example below, this holds the single response column Y.

min_features

double. Minimum number of features to select.

max_features

double. Maximum number of features to select.

Value

vector. The indices of the selected features.
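
As a rough illustration of how the returned vector might be used, the snippet below subsets a data frame by a vector of indices. The data are made up, and it is an assumption that the indices refer to column positions in X.

```r
# Hypothetical data and indices, for illustration only.
X <- data.frame(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12, X5 = 13:15)
features <- c(4, 2, 1, 5)  # stand-in for a vector returned by forward_selection()

# Assumption: the indices are column positions in X.
# Keep only the selected columns, in the order they were selected.
X_selected <- X[, features, drop = FALSE]
names(X_selected)
```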

Examples

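# Scorer: mean squared residual of a linear model fitted to the data
my_scorer <- function(data) {
  model <- lm(Y ~ ., data)
  return(mean(model$residuals^2))
}

# Friedman data without the noise-free response column
data <- dplyr::select(tgp::friedman.1.data(), -Ytrue)

# All columns except the last are features; the last column (Y) is the response
train_data <- data[1:(length(data)-1)]
test_data <- data[length(data)]

features <- featureselection::forward_selection(my_scorer, train_data, test_data, 3, 7)
# [1] 4 2 1 5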