suggest_imputation.suggest_imputation

suggest_imputation.suggest_imputation(df, missingness_type=None)

Suggest a single imputation strategy for handling missing data in a DataFrame.

Parameters

Name Type Description Default
df pd.DataFrame The input DataFrame containing missing values to analyze. required
missingness_type str, optional The type of missingness mechanism present in the data. Valid values: ‘MCAR’ (Missing Completely At Random), ‘MAR’ (Missing At Random), ‘MNAR’ (Missing Not At Random). If None, the recommendation is based only on data characteristics. None

Returns

Name Type Description
dict A dictionary containing the imputation recommendation, with the following keys:
  • ‘method’ (str): Recommended imputation method (e.g., ‘SimpleImputer’, ‘KNNImputer’, ‘IterativeImputer (MICE)’, ‘interpolation’, or ‘none’)
  • ‘reasoning’ (list of str): Explanation of factors that influenced the recommendation
  • ‘warnings’ (list of str): Important caveats or concerns about the data

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'age': [25, np.nan, 35], 'income': [50000, 60000, np.nan]})
>>> result = suggest_imputation(df, missingness_type='MAR')
>>> print(result['method'])
KNNImputer (k=5)
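
The returned dictionary can be turned into a short report. The sketch below is illustrative only: `summarize_recommendation` is a hypothetical helper, not part of this module, and `example_result` is a hand-written stand-in that merely follows the three keys listed under Returns. It is not actual output of `suggest_imputation`.

```python
def summarize_recommendation(result):
    """Format a recommendation dict as readable text.

    Assumes the keys documented under Returns ('method', 'reasoning',
    'warnings'). Missing keys fall back to defaults instead of raising.
    """
    method = result.get("method", "none")
    reasoning = result.get("reasoning", [])
    warnings = result.get("warnings", [])

    lines = [f"Recommended method: {method}"]
    lines += [f"  reason: {r}" for r in reasoning]
    lines += [f"  warning: {w}" for w in warnings]
    return "\n".join(lines)


# Hand-written stand-in for a real return value (hypothetical content).
example_result = {
    "method": "KNNImputer",
    "reasoning": ["hypothetical reason text"],
    "warnings": [],
}
print(summarize_recommendation(example_result))
```

In real use, `example_result` would be replaced by the value returned from `suggest_imputation(df, ...)`.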

Notes

  • For empty DataFrames or invalid inputs, the returned dict has ‘method’ set to ‘none’ or ‘error’ instead of an imputer name
  • High missingness (>30%) may result in warnings about reliability
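
To see whether a DataFrame is likely to trigger the high-missingness warning, the missing fraction can be checked directly with pandas. This is a minimal sketch, not the function's own implementation: the helper names are hypothetical, and it assumes the 30% threshold is applied per column, which the note above does not specify.

```python
import numpy as np
import pandas as pd

# Threshold taken from the note above; per-column use is an assumption.
HIGH_MISSINGNESS = 0.30


def missing_fractions(df):
    """Return the fraction of missing values in each column."""
    return df.isna().mean()


def high_missingness_columns(df, threshold=HIGH_MISSINGNESS):
    """List the columns whose missing fraction exceeds the threshold."""
    fractions = missing_fractions(df)
    return fractions[fractions > threshold].index.tolist()


df = pd.DataFrame({
    "age": [25, np.nan, 35, 40],               # 1 of 4 values missing
    "income": [50000, np.nan, np.nan, 52000],  # 2 of 4 values missing
})
print(missing_fractions(df))
print(high_missingness_columns(df))
```

Here only `income` exceeds the threshold, since half of its values are missing while `age` sits at 25%.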