Select features using the forward selection algorithm. It starts with an empty model and adds the variable that gives the largest improvement in model accuracy. The process is repeated iteratively and stops when none of the remaining variables improves the accuracy of the model.
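For illustration, the greedy loop described above might look like the following minimal sketch. This is not the package's implementation; it assumes a scorer that takes a single data frame with the response in column `Y` (as in the example at the end of this page) and that a lower score means a better model.

```r
# Illustrative sketch only; the real function is featureselection::forward_selection
forward_selection_sketch <- function(scorer, X, y, min_features = 1, max_features = 10) {
  selected   <- integer(0)         # indexes of features chosen so far
  remaining  <- seq_len(ncol(X))   # indexes of features still available
  best_score <- Inf

  while (length(selected) < max_features && length(remaining) > 0) {
    # Score the model obtained by adding each remaining feature in turn
    scores <- vapply(remaining, function(j) {
      scorer(cbind(X[, c(selected, j), drop = FALSE], Y = y[[1]]))
    }, numeric(1))

    # Stop once the minimum is met and no candidate improves the best score
    if (length(selected) >= min_features && min(scores) >= best_score) break

    best_score <- min(scores)
    selected   <- c(selected, remaining[which.min(scores)])
    remaining  <- setdiff(remaining, selected)
  }
  selected
}
```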
forward_selection(scorer, X, y, min_features = 1, max_features = 10)
Argument | Description
---|---
scorer | A custom user-supplied function that accepts X and y (as defined below) as input and returns the error of the model fit on them.
X | tibble. The training dataset.
y | tibble. The test dataset.
min_features | double. The minimum number of features to select.
max_features | double. The maximum number of features to select.
Returns: vector. The indexes of the selected features.
# Scorer: mean squared error of a linear model with response column `Y`
my_scorer <- function(data) {
  model <- lm(Y ~ ., data)
  return(mean(model$residuals^2))
}

# Friedman benchmark data from the tgp package, without the Ytrue column
data <- dplyr::select(tgp::friedman.1.data(), -Ytrue)
train_data <- data[1:(length(data) - 1)]  # all columns except the last
test_data <- data[length(data)]           # the last column

features <- featureselection::forward_selection(my_scorer, train_data, test_data, 3, 7)
# [1] 4 2 1 5
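The returned indexes refer to columns of the data passed as `X` (here `train_data`), so they can be used to subset it. A possible continuation of the example above (not part of the package output):

```r
# Keep only the columns picked by forward_selection
selected_data <- train_data[, features]
```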