model_pipeline.create_model_pipeline

model_pipeline.create_model_pipeline(
    X,
    numerical_feat=[],
    categorical_feat=[],
    model='lr',
)

Create a pipeline for the chosen model (logistic regression, SVC or random forest) that standardises numerical features and one-hot encodes categorical features. Features not listed in either group are passed through without preprocessing.

Parameters

Name	Type	Description	Default
X	pd.DataFrame	Data without target column. Should not contain missing values.	required
numerical_feat	list	Names of numerical columns to be standardised.	`[]`
categorical_feat	list	Names of categorical columns to be one-hot encoded.	`[]`
model	(lr, svc, rf)	Model to include in pipeline. `lr`: `LogisticRegression`; `svc`: `SVC`; `rf`: `RandomForestClassifier`.	`'lr'`

Returns

Name	Type	Description
	sklearn.pipeline.Pipeline	Unfitted pipeline with standardisation, one-hot encoding and specified model.

Raises

Name	Type	Description
	TypeError	If input types are wrong.
	ValueError	If `model` is not one of the supported options, or if any listed column is not found in `X`.
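
The checks behind these exceptions are not shown on this page. The following is a minimal sketch of what such validation might look like, assuming `X` must be a `pd.DataFrame` and both feature arguments must be lists. `check_pipeline_inputs` and `SUPPORTED_MODELS` are hypothetical names used for illustration, not part of the library, and the real checks and messages may differ.

```python
import pandas as pd

# Hypothetical constant: the model keys listed in the Parameters table.
SUPPORTED_MODELS = ("lr", "svc", "rf")


def check_pipeline_inputs(X, numerical_feat, categorical_feat, model):
    """Illustrative validation only; the real function's checks may differ."""
    # Type checks raise TypeError.
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame.")
    for name, feats in (
        ("numerical_feat", numerical_feat),
        ("categorical_feat", categorical_feat),
    ):
        if not isinstance(feats, list):
            raise TypeError(f"{name} must be a list of column names.")

    # Value checks raise ValueError.
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"model must be one of {SUPPORTED_MODELS}, got {model!r}.")
    missing = [c for c in numerical_feat + categorical_feat if c not in X.columns]
    if missing:
        raise ValueError(f"Columns not found in X: {missing}")
```

Running the type checks first means a caller who passes, say, a NumPy array gets a `TypeError` about `X` rather than a less helpful error from the column lookup.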

Examples

>>> pipeline = create_model_pipeline(X, ['age'], ['sex'], 'lr')
>>> pipeline.fit(X, y)
>>> predictions = pipeline.predict(X)
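
To make the shape of the returned object more concrete, the following is a rough equivalent built directly from scikit-learn, following the description above: standardise numerical columns, one-hot encode categorical ones, and pass the rest through. `sketch_model_pipeline`, the step names, the `handle_unknown="ignore"` setting and the toy data are all assumptions made for illustration. The sketch also omits the `X` argument, and the library's actual implementation and hyperparameters may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC


def sketch_model_pipeline(numerical_feat, categorical_feat, model="lr"):
    """Hypothetical stand-in for create_model_pipeline (not the library code)."""
    estimators = {
        "lr": LogisticRegression,
        "svc": SVC,
        "rf": RandomForestClassifier,
    }
    if model not in estimators:
        raise ValueError(f"model must be one of {sorted(estimators)}")

    # Scale numerical columns, one-hot encode categorical columns,
    # and leave every other column untouched.
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numerical_feat),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_feat),
        ],
        remainder="passthrough",
    )
    return Pipeline([("preprocess", preprocessor), ("model", estimators[model]())])


# Toy data, invented for this example. "smoker" is in neither feature list,
# so it reaches the model unchanged.
X = pd.DataFrame({
    "age": [23, 45, 31, 52, 38, 27],
    "sex": ["f", "m", "f", "m", "m", "f"],
    "smoker": [0, 1, 0, 1, 1, 0],
})
y = [0, 1, 0, 1, 1, 0]

pipeline = sketch_model_pipeline(["age"], ["sex"], "lr")
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Because preprocessing lives inside the pipeline, the scaler and encoder are fitted only on the data passed to `fit`, which keeps test data from leaking into the preprocessing when the pipeline is used with cross-validation.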