model_comparison.model_comparison

model_comparison.model_comparison(
    models,
    X,
    y,
    metric='accuracy',
    greater_is_better=False,
)

Compare multiple fitted scikit-learn models and return the best-performing one.

Each model is evaluated on the same dataset (X, y) using the specified evaluation metric. If greater_is_better is True, the model with the highest score is returned; if it is False, the model with the lowest score is returned.
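The evaluation described above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the helper name compare_models_sketch and the _METRICS name-to-function mapping are hypothetical, only the four metrics named under Parameters are wired in, and f1, precision and recall keep scikit-learn's default average="binary", so the sketch assumes a binary target.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical mapping from metric names to scikit-learn metric functions.
_METRICS = {
    "accuracy": accuracy_score,
    "f1": f1_score,
    "precision": precision_score,
    "recall": recall_score,
}


def compare_models_sketch(models, X, y, metric="accuracy", greater_is_better=False):
    """Score every fitted model on the same (X, y) and return the best one."""
    metric_fn = _METRICS[metric]
    # One score per model, all computed on the same evaluation data.
    scores = [metric_fn(y, model.predict(X)) for model in models]
    # Highest score wins for "score" metrics, lowest for error metrics.
    pick = max if greater_is_better else min
    best_index = pick(range(len(models)), key=scores.__getitem__)
    return models[best_index]
```

Choosing max or min once keeps the direction of the comparison in a single place; on ties, both return the earliest model in the list.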

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| models | list of sklearn.base.BaseEstimator | A list of fitted scikit-learn models that implement the predict method. | required |
| X | pandas.DataFrame or array-like | Feature matrix used for evaluation. | required |
| y | pandas.Series or array-like | True target values. | required |
| metric | str | Evaluation metric used for comparison. Must be a valid scikit-learn classification metric (e.g. "accuracy", "f1", "precision", "recall"). | "accuracy" |
| greater_is_better | bool | Whether a higher metric value indicates a better model. Use True for scores such as accuracy (higher is better) and False for error metrics (lower is better). Because the default is False, set this to True when comparing on any of the metrics listed under metric. | False |
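To see why the flag matters, the sketch below compares two hypothetical sets of predictions using scikit-learn's accuracy_score (a score, where higher is better) and zero_one_loss (an error, where lower is better). The labels and predictions are made up for illustration, and max/min stand in for greater_is_better=True/False; this does not claim to be how model_comparison is implemented.

```python
from sklearn.metrics import accuracy_score, zero_one_loss

y_true = [0, 1, 1, 0, 1, 0]
# Hypothetical predictions from two models.
predictions = {
    "model_a": [0, 1, 1, 0, 0, 0],  # 1 mistake
    "model_b": [1, 0, 1, 0, 0, 0],  # 3 mistakes
}

# accuracy is a score: higher is better, i.e. greater_is_better=True.
accuracy = {name: accuracy_score(y_true, p) for name, p in predictions.items()}
# zero_one_loss is an error: lower is better, i.e. greater_is_better=False.
loss = {name: zero_one_loss(y_true, p) for name, p in predictions.items()}

best_by_accuracy = max(accuracy, key=accuracy.get)
best_by_loss = min(loss, key=loss.get)
print(best_by_accuracy, best_by_loss)  # prints: model_a model_a
```

Applying min to the accuracy scores, which is what the default greater_is_better=False implies, would instead select model_b, the model with more mistakes.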

Returns

| Type | Description |
|------|-------------|
| sklearn.base.BaseEstimator | The model with the best performance according to the selected evaluation metric. |

Raises

| Type | Description |
|------|-------------|
| ValueError | If metric is not supported, if models is empty, or if any element of models is not a valid scikit-learn estimator. |
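The error conditions above could be checked roughly as follows. This is a minimal sketch under stated assumptions: validate_inputs_sketch and SUPPORTED_METRICS are hypothetical names, the supported set is assumed to be the four metrics listed under Parameters, and the fitted-model check via scikit-learn's check_is_fitted is an extra assumption based on the requirement that models be fitted. Its NotFittedError subclasses ValueError, so an unfitted model still surfaces as a ValueError.

```python
from sklearn.base import BaseEstimator
from sklearn.utils.validation import check_is_fitted

# Assumed set of supported metric names (hypothetical).
SUPPORTED_METRICS = {"accuracy", "f1", "precision", "recall"}


def validate_inputs_sketch(models, metric):
    """Raise ValueError for the conditions listed under Raises."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"Unsupported metric: {metric!r}")
    if not models:
        raise ValueError("models must be a non-empty list")
    for model in models:
        # Reject anything that is not a scikit-learn estimator with predict().
        if not isinstance(model, BaseEstimator) or not hasattr(model, "predict"):
            raise ValueError(f"{model!r} is not a scikit-learn estimator with predict()")
        # Raises NotFittedError (a ValueError subclass) for unfitted models.
        check_is_fitted(model)
```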

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.tree import DecisionTreeClassifier
>>> X, y = make_classification(
...     n_samples=200,
...     n_features=5,
...     n_informative=3,
...     n_redundant=0,
...     n_classes=2,
...     random_state=42,
... )
>>> models = [LogisticRegression().fit(X, y),
...           DecisionTreeClassifier().fit(X, y)]
>>> best_model = model_comparison(
...     models, X, y, metric="accuracy", greater_is_better=True
... )
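In the example above, both models are scored on the same data they were fitted on, which favours models that memorise the training set: an unconstrained DecisionTreeClassifier reaches perfect training accuracy on data like this. The sketch below, assuming a held-out split can be spared, fits on one part and measures accuracy on the other; the split size and variable names are illustrative choices, and the held-out X_test, y_test are what you would pass as X and y.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3,
    n_redundant=0, n_classes=2, random_state=42,
)
# Hold out 25% of the rows for evaluation only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
models = [
    LogisticRegression().fit(X_train, y_train),
    DecisionTreeClassifier(random_state=42).fit(X_train, y_train),
]

# Training accuracy is optimistic; held-out accuracy is a fairer basis for comparison.
for model in models:
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{type(model).__name__}: train={train_acc:.2f}, test={test_acc:.2f}")

# Then, for example:
# model_comparison(models, X_test, y_test, metric="accuracy", greater_is_better=True)
```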