DRY out your regression analysis!
As data scientists, we rely on Exploratory Data Analysis (EDA) and regression analysis to analyze trends in data. Moreover, following the DRY (Don't Repeat Yourself) principle is regarded as a major priority for maximizing code quality. Yet data scientists facing these tasks often start the entire process from scratch, wasting both time and effort while compromising code quality. The aRidanalysis package strives to remedy this problem by giving users an easy-to-implement EDA function alongside 3 functions that generate statistical model classes; together these simplify the analytical process and produce an easy-to-read interpretation of the input data. Users will no longer have to write many lines of code to explore their data effectively!
arid_eda
This function takes in the data frame of interest and generates summary statistics as well as basic exploratory data analysis plots to help users understand the overall behaviour of the explanatory and response variables.
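As a rough illustration of the kind of work arid_eda is meant to save, the sketch below runs a manual EDA pass in base R. It does not call arid_eda or reproduce its implementation; the built-in mtcars data and the choice of mpg as the response are assumptions made only for this example.

```r
# Base-R sketch of a quick EDA pass; this is NOT the arid_eda implementation.
# Assumption: the built-in mtcars data, with mpg treated as the response.
df <- mtcars[, c("mpg", "wt", "hp")]

# Summary statistics for every column
stats <- summary(df)
print(stats)

# Pairwise correlations between the response and the explanatory variables
cors <- cor(df)
print(round(cors, 2))

# Scatterplot matrix, drawn to a null device so the sketch also runs
# in a non-interactive session without writing a file
pdf(NULL)
pairs(df, main = "mpg vs. wt and hp")
invisible(dev.off())
```

Each of these steps has to be repeated for every new data set, which is the repetition the package aims to remove.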
arid_linreg
This function takes in user-specified linear regression model hyperparameters of interest and returns an arid_linreg class linear regression model with a scikit-learn style interface. This model class has appropriate fit, predict, and score methods to provide linear regression analysis with the model specified.
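To make the fit / predict / score workflow concrete, here is a minimal base-R sketch of the three steps. It uses lm from the stats package rather than the arid_linreg interface, whose exact arguments are not shown here; the train/test split of mtcars and the use of R-squared as the score are assumptions for illustration.

```r
# Base-R sketch of a fit / predict / score workflow; not the arid_linreg API.
# Assumption: mtcars split by row position into training and held-out sets.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# fit: ordinary least squares on the training rows
model <- lm(mpg ~ wt + hp, data = train)

# predict: fitted values for the held-out rows
preds <- predict(model, newdata = test)

# score: coefficient of determination (R^2) on the held-out rows
ss_res <- sum((test$mpg - preds)^2)
ss_tot <- sum((test$mpg - mean(test$mpg))^2)
r_squared <- 1 - ss_res / ss_tot
print(r_squared)
```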
arid_logreg
This function takes in a data frame of input features, a response vector, and regression model parameters to perform either a binomial or multinomial classification and returns an arid_logreg class logistic regression model with a scikit-learn style interface. This model class has appropriate fit, predict, and score methods to provide logistic regression analysis with the model specified.
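The binomial case can be sketched in base R with glm, as below. This is not the arid_logreg interface: the mtcars data, the binary response am, the 0.5 decision threshold, and accuracy as the score are all assumptions chosen for the example. The multinomial case is not covered by this sketch.

```r
# Base-R sketch of binomial logistic regression; not the arid_logreg API.
# Assumption: mtcars, with the 0/1 column `am` as the response.

# fit: logistic regression via a binomial GLM
model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# predict: class probabilities, then hard labels at a 0.5 threshold
probs <- predict(model, newdata = mtcars, type = "response")
pred_class <- as.integer(probs > 0.5)

# score: classification accuracy on the same data
accuracy <- mean(pred_class == mtcars$am)
print(accuracy)
```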
arid_countreg
This function takes a data frame, its categorical and continuous variables, and other user input parameters to return a fitted arid_countreg class Poisson count regression model with a scikit-learn style interface along with important inferential statistics. This model class has appropriate fit, predict, and score methods to provide Poisson count regression analysis with the model specified.
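For comparison, a Poisson count regression with inferential statistics can be written by hand in base R as follows. The sketch uses glm rather than arid_countreg; the built-in warpbreaks data, the two categorical predictors, and the deviance-based pseudo R-squared used as the score are assumptions for illustration only.

```r
# Base-R sketch of Poisson count regression; not the arid_countreg API.
# Assumption: the built-in warpbreaks data, with `breaks` as the count response
# and the factors `wool` and `tension` as categorical predictors.

# fit: Poisson GLM with the default log link
model <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# inferential statistics: estimates, standard errors, z values, p-values
coef_table <- summary(model)$coefficients
print(coef_table)

# predict: expected counts on the response scale
preds <- predict(model, newdata = warpbreaks, type = "response")

# score: share of the null deviance explained by the model
pseudo_r2 <- 1 - model$deviance / model$null.deviance
print(pseudo_r2)
```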
This package builds on the EDA and statistical analysis functionality provided by ggplot2 (included in the Tidyverse) and by base R to streamline data visualization and model analysis. Some existing packages already help with this; however, the aridanalysis package aims to make it easier to obtain interpretations for different regression analyses.
The aRidanalysis package is not currently available on CRAN, but can be installed from GitHub using the following commands:
# install.packages("devtools")
devtools::install_github("UBC-MDS/aridanalysis")
Dependencies: dplyr, tidyr, glmnet, palmerpenguins, ggplot2, GGally, tidyverse, MASS, broom, AER, tibble, rlang, stringr, magrittr
For usage examples please refer to the aRidanalysis vignette page.
Documentation files are located on GitHub here.
These instructions are available during development after package installation through help(