Introduction to rb4model

library(rb4model)

rb4model is an R package for fast, straightforward data pre-processing. Its four functions give users flexibility in handling many kinds of datasets, whether found in the wild or collected themselves. With rb4model, users can pre-process their data and have it ready for the machine learning model of their choice.

Handle Missing Values

This function takes a dataframe and handles missing values in one of three ways: deleting the affected rows, filling each missing value with the column average, or filling it with the last observation. The user selects the method through a function argument.
The function returns a dataframe without missing values.

Here, we will replace missing values in the airquality dataset with the mean.

head(airquality)
#>   Ozone Solar.R Wind Temp Month Day
#> 1    41     190  7.4   67     5   1
#> 2    36     118  8.0   72     5   2
#> 3    12     149 12.6   74     5   3
#> 4    18     313 11.5   62     5   4
#> 5    NA      NA 14.3   56     5   5
#> 6    28      NA 14.9   66     5   6
head(missing_val(airquality, 'mean'))
#>      Ozone  Solar.R Wind Temp Month Day
#> 1 41.00000 190.0000  7.4   67     5   1
#> 2 36.00000 118.0000  8.0   72     5   2
#> 3 12.00000 149.0000 12.6   74     5   3
#> 4 18.00000 313.0000 11.5   62     5   4
#> 5 42.12931 185.9315 14.3   56     5   5
#> 6 28.00000 185.9315 14.9   66     5   6
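
To make the three strategies concrete, here is a minimal base R sketch of the same idea. The helper `impute_sketch()` and its method names `"delete"` and `"last"` are hypothetical and are not part of rb4model; in particular, how `missing_val()` treats a missing value at the very start of a column is an assumption here (the sketch leaves it as `NA`).

```r
# Hypothetical helper, not the rb4model implementation.
impute_sketch <- function(df, method = c("delete", "mean", "last")) {
  method <- match.arg(method)
  if (method == "delete") {
    # Keep only rows with no missing values.
    return(df[stats::complete.cases(df), , drop = FALSE])
  }
  for (col in names(df)) {
    values <- df[[col]]
    missing <- is.na(values)
    if (!any(missing)) next
    if (method == "mean" && is.numeric(values)) {
      # Replace each NA with the mean of the observed values in the column.
      values[missing] <- mean(values, na.rm = TRUE)
    } else if (method == "last") {
      # Carry the last observed value forward; leading NAs stay NA (assumption).
      idx <- cumsum(!missing)          # position of the latest observed value
      observed <- values[!missing]
      values[idx > 0] <- observed[idx[idx > 0]]
    }
    df[[col]] <- values
  }
  df
}

head(impute_sketch(airquality, "mean"))
```

With the `"mean"` method, the sketch fills each numeric column with its own mean, which is the behaviour visible in rows 5 and 6 of the output above.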

Feature Splitter

This function will take in a dataframe and split the data into numerical and categorical features.
This function will return two lists, one list containing the names of the numerical features and one list containing the names of the categorical features.

Here, we will split the mtcars dataset into numerical and categorical features. Every column of mtcars is numeric, so one of the two returned lists is empty.

feature_splitter(mtcars)
#> [[1]]
#> character(0)
#> 
#> [[2]]
#>  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
#> [11] "carb"
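
The split can be illustrated with a few lines of base R. This is only a sketch under the assumption that "numerical" means `is.numeric()` is true for the column and that everything else counts as categorical; the helper `split_features_sketch()` is hypothetical, and its named elements differ from the unnamed list that `feature_splitter()` prints above.

```r
# Hypothetical helper, not the rb4model implementation.
split_features_sketch <- function(df) {
  # TRUE for integer and double columns, FALSE for factors, characters, etc.
  is_num <- vapply(df, is.numeric, logical(1))
  list(
    numerical = names(df)[is_num],
    categorical = names(df)[!is_num]
  )
}

# iris has four numeric columns and one factor column (Species).
split_features_sketch(iris)
```

Using a dataset with a factor column, such as iris, shows both lists populated, which the all-numeric mtcars example cannot.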

Fit and Report

This function takes training and validation data, fits a model, and calculates the model's training and validation scores.
It returns those two scores.

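Here, we will fit a generalized linear model ('glm') to the iris dataset and return its root mean squared error (RMSE). The first two columns are the features and Petal.Length is the target; rows 1 to 100 are used for training and rows 100 to 150 for validation.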

x1 <- iris[1:2][1:100, ]
x2 <- iris[1:2][100:150, ]
y1 <- iris$Petal.Length[1:100]
y2 <- iris$Petal.Length[100:150]
fit_and_report(x1, y1, x2, y2, 'glm', 'regression')
#> Loading required package: lattice
#> Loading required package: ggplot2
#>                  RMSE 
#> 0.43755234 0.08478573
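
For readers who want to see what a training and validation RMSE involves, the sketch below computes both with base R only. It is not how `fit_and_report()` is implemented: it calls `stats::glm()` with a Gaussian family directly, the `rmse()` helper is hypothetical, and it uses non-overlapping rows (1 to 100 and 101 to 150), so its numbers are not expected to match the output above.

```r
# Hypothetical helper: root mean squared error between two numeric vectors.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
train <- iris[1:100, cols]
valid <- iris[101:150, cols]

# Gaussian GLM with identity link, i.e. ordinary linear regression.
model <- glm(Petal.Length ~ Sepal.Length + Sepal.Width,
             data = train, family = gaussian())

scores <- c(
  train = rmse(train$Petal.Length, predict(model, newdata = train)),
  validation = rmse(valid$Petal.Length, predict(model, newdata = valid))
)
scores
```

Comparing the two values gives a rough sense of how well the model generalises beyond the rows it was fitted on.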

Forward Feature Selection

This function takes features and labels, fits a model, and performs forward feature selection.
Its result is used to keep only the selected features of the dataframe.

Here, we will perform forward feature selection on the iris dataset.

y <- iris$Species
x <- iris[c(1,2,3,4)]
ffs <- ForwardSelection(feature=x, label=y, my_mod="rf")
#> note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
#> 
#> note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
#> 
#> note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
head(x[ffs])
#>   Sepal.Width
#> 1         3.5
#> 2         3.0
#> 3         3.2
#> 4         3.1
#> 5         3.6
#> 6         3.9
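
The greedy idea behind forward selection can be sketched in base R as follows. This is an illustration under stated assumptions rather than the method `ForwardSelection()` uses: the helper `forward_select_sketch()` is hypothetical, it fits `stats::lm()` instead of a model such as "rf", and it scores candidates by AIC, stopping when no remaining feature lowers the AIC.

```r
# Hypothetical helper, not the rb4model implementation.
forward_select_sketch <- function(x, y) {
  selected <- character(0)
  remaining <- names(x)
  best_score <- Inf
  while (length(remaining) > 0) {
    # Score each candidate by the AIC of a linear model that adds it
    # to the features selected so far.
    scores <- vapply(remaining, function(candidate) {
      data <- cbind(x[, c(selected, candidate), drop = FALSE], response = y)
      stats::AIC(stats::lm(response ~ ., data = data))
    }, numeric(1))
    # Stop when no candidate improves on the current best score.
    if (min(scores) >= best_score) break
    best_score <- min(scores)
    winner <- names(scores)[which.min(scores)]
    selected <- c(selected, winner)
    remaining <- setdiff(remaining, winner)
  }
  selected
}

# Select predictors of mpg from the other mtcars columns.
chosen <- forward_select_sketch(mtcars[, -1], mtcars$mpg)
head(mtcars[chosen])
```

As in the example above, the returned names can be used to subset the original dataframe to the selected features.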
