Select features using the forward selection algorithm. It starts with an empty model and adds the variable that gives the largest improvement in model accuracy. The process is repeated iteratively and stops when none of the remaining variables improves the accuracy of the model.
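For illustration, the greedy loop described above might look like the following minimal sketch. This is not the package's implementation; it assumes a scorer that takes a single data frame with the response in column `Y` (as in the example at the end of this page) and that a lower score means a better model.

```r
# Illustrative sketch only; the real function is featureselection::forward_selection
forward_selection_sketch <- function(scorer, X, y, min_features = 1, max_features = 10) {
  selected   <- integer(0)         # indexes of features chosen so far
  remaining  <- seq_len(ncol(X))   # indexes of features still available
  best_score <- Inf

  while (length(selected) < max_features && length(remaining) > 0) {
    # Score the model obtained by adding each remaining feature in turn
    scores <- vapply(remaining, function(j) {
      scorer(cbind(X[, c(selected, j), drop = FALSE], Y = y[[1]]))
    }, numeric(1))

    # Stop once the minimum is met and no candidate improves the best score
    if (length(selected) >= min_features && min(scores) >= best_score) break

    best_score <- min(scores)
    selected   <- c(selected, remaining[which.min(scores)])
    remaining  <- setdiff(remaining, selected)
  }
  selected
}
```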
forward_selection(scorer, X, y, min_features = 1, max_features = 10)
Argument | Description
---|---
scorer | A custom user-supplied function that accepts X and y (as defined below) as input and returns the error of the model fit on them.
X | tibble. The training dataset.
y | tibble. The test dataset.
min_features | double. The minimum number of features to select.
max_features | double. The maximum number of features to select.
Returns: vector. The indexes of the selected features.
# Scorer: mean squared error of a linear model with response column `Y`
my_scorer <- function(data) {
  model <- lm(Y ~ ., data)
  return(mean(model$residuals^2))
}

# Friedman benchmark data from the tgp package, without the Ytrue column
data <- dplyr::select(tgp::friedman.1.data(), -Ytrue)
train_data <- data[1:(length(data) - 1)]  # all columns except the last
test_data <- data[length(data)]           # the last column

features <- featureselection::forward_selection(my_scorer, train_data, test_data, 3, 7)
# [1] 4 2 1 5
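The returned indexes refer to columns of the data passed as `X` (here `train_data`), so they can be used to subset it. A possible continuation of the example above (not part of the package output):

```r
# Keep only the columns picked by forward_selection
selected_data <- train_data[, features]
```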