Create a pipeline for the chosen model (logistic regression, SVC or random forest), with standardisation for numerical features and one-hot encoding for categorical features. Any remaining features are passed through without preprocessing.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | pd.DataFrame | Data without target column. Should not contain missing values. | required |
| numerical_feat | list | Names of numerical columns to be standardised. | [] |
| categorical_feat | list | Names of categorical columns to be one-hot encoded. | [] |
| model | (lr, svc, rf) | Model to include in pipeline. lr: LogisticRegression; svc: SVC; rf: RandomForestClassifier. | 'lr' |
Returns
| Type | Description |
| --- | --- |
| sklearn.pipeline.Pipeline | Unfitted pipeline with standardisation, one-hot encoding and specified model. |
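To make the returned object concrete, here is a minimal sketch of how a pipeline of this shape could be assembled with scikit-learn. It is not the documented function's source: the name `build_pipeline_sketch`, the step names `"preprocessor"` and `"model"`, the `MODELS` mapping and the `handle_unknown="ignore"` setting are all assumptions made for illustration, and `X` is left out of the signature because the sketch performs no validation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Assumed mapping from the documented model keys to estimator classes.
MODELS = {"lr": LogisticRegression, "svc": SVC, "rf": RandomForestClassifier}


def build_pipeline_sketch(numerical_feat=None, categorical_feat=None, model="lr"):
    """Illustrative only; the real function may be implemented differently."""
    numerical_feat = list(numerical_feat or [])
    categorical_feat = list(categorical_feat or [])

    preprocessor = ColumnTransformer(
        transformers=[
            # Standardise the numerical columns.
            ("num", StandardScaler(), numerical_feat),
            # One-hot encode the categorical columns (handle_unknown is an assumption).
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_feat),
        ],
        # Columns not listed above are passed through untouched.
        remainder="passthrough",
    )
    # Default estimator settings are used here; the real function may set others.
    return Pipeline([("preprocessor", preprocessor), ("model", MODELS[model]())])


# Usage on a small made-up dataset.
X = pd.DataFrame(
    {
        "age": [25, 32, 47, 51],
        "colour": ["red", "blue", "red", "green"],
        "flag": [0, 1, 0, 1],  # neither numerical_feat nor categorical_feat
    }
)
y = [0, 1, 0, 1]

pipe = build_pipeline_sketch(["age"], ["colour"], model="rf")  # unfitted
pipe.fit(X, y)
predictions = pipe.predict(X)
```

The pipeline is returned unfitted, as the table above states, so calling `fit` is left to the caller.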
Raises
| Type | Description |
| --- | --- |
| TypeError | If input types are wrong. |
| ValueError | If model is not one of the allowed values, or if the listed columns are not found in the dataframe. |
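The error conditions above could be checked along the following lines. This is a hedged sketch rather than the library's actual validation: the helper name `validate_inputs_sketch`, the exact checks and the error messages are assumptions.

```python
import pandas as pd

# Allowed keys, taken from the parameter description of `model`.
ALLOWED_MODELS = ("lr", "svc", "rf")


def validate_inputs_sketch(X, numerical_feat, categorical_feat, model):
    """Hypothetical helper; the real checks and messages may differ."""
    # TypeError: wrong input types.
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame")
    for name, cols in (
        ("numerical_feat", numerical_feat),
        ("categorical_feat", categorical_feat),
    ):
        if not isinstance(cols, list):
            raise TypeError(f"{name} must be a list")

    # ValueError: unknown model key.
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")

    # ValueError: requested columns that are absent from the dataframe.
    missing = [c for c in numerical_feat + categorical_feat if c not in X.columns]
    if missing:
        raise ValueError(f"columns not found in dataframe: {missing}")


# Usage: valid input passes silently, an unknown model key raises ValueError.
X = pd.DataFrame({"age": [1, 2], "colour": ["a", "b"]})
validate_inputs_sketch(X, ["age"], ["colour"], "lr")

try:
    validate_inputs_sketch(X, ["age"], ["colour"], "xgb")
except ValueError as err:
    print(err)
```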