missing_values

Functions

Name	Description
missing_values	This function fills missing values (NaN) in a pandas DataFrame using column-appropriate imputation strategies.

missing_values

missing_values.missing_values(df, method='median')

This function fills missing values (NaN) in a pandas DataFrame using column-appropriate imputation strategies.

Numeric columns are filled using the method selected by the method argument ('mean', 'median', or 'mode'), while categorical (non-numeric) columns are always filled with their mode.

Missing values can distort statistical analyses and machine learning models. The caller chooses the numeric strategy that suits the data distribution: the mean for roughly symmetric data, the median when outliers are present, or the mode.

The function identifies numeric and non-numeric columns and applies imputation independently to each column.
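The column-wise strategy described above can be sketched roughly as follows. This is a hypothetical re-creation for illustration, not the package's actual source: the name impute_sketch and the use of pandas.api.types.is_numeric_dtype to separate numeric from non-numeric columns are assumptions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def impute_sketch(df, method="median"):
    """Hypothetical sketch of the column-wise strategy; not the library's code."""
    if method not in {"mean", "median", "mode"}:
        raise ValueError(f"unsupported method: {method!r}")
    out = df.copy()  # work on a copy so the input stays untouched
    for col in out.columns:
        series = out[col]
        if series.isna().all():
            continue  # nothing to estimate from, so leave the column as-is
        if is_numeric_dtype(series) and method == "mean":
            fill = series.mean()
        elif is_numeric_dtype(series) and method == "median":
            fill = series.median()
        else:
            # numeric columns with method="mode", and every non-numeric column
            fill = series.mode().iloc[0]
        out[col] = series.fillna(fill)
    return out


df = pd.DataFrame({
    "age": [25, 30, np.nan, 28],
    "city": ["A", "B", np.nan, "B"],
})
print(impute_sketch(df, method="median"))
```

Each column is handled independently, so the fill value for one column never depends on the values in another.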

Parameters

Name	Type	Description	Default
df	pd.DataFrame	The DataFrame containing missing values to be imputed.	required
method	str	The imputation method to use for numeric columns. Valid options are: 'mean' (replace NaN with the column mean; suitable for symmetric data), 'median' (replace NaN with the column median; robust to outliers), or 'mode' (replace NaN with the column mode). Categorical (non-numeric) columns always use mode imputation regardless of the selected method.	`"median"`
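To see why the median is described as robust to outliers, consider a small made-up income column with one extreme value. The numbers below are illustrative only and do not come from the package.

```python
import pandas as pd

# Four typical incomes and one extreme outlier (illustrative data)
income = pd.Series([30_000, 32_000, 35_000, 31_000, 1_000_000])

print(income.mean())    # 225600.0, pulled far above the typical values
print(income.median())  # 32000.0, still representative of most rows
```

Filling gaps with the mean here would insert a value larger than four of the five observed incomes, which is why 'median' is the default.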

Returns

Name	Type	Description
	(pd.DataFrame, float)	A tuple (result_df, filled_percentage). result_df (pd.DataFrame): a DataFrame with missing values filled in both numeric and categorical (non-numeric) columns. filled_percentage (float): the percentage of total DataFrame values that were originally missing and have been filled, calculated as (number of filled values / number of total values) * 100. Columns containing only NaN values are left unchanged and do not contribute any filled values to this percentage.
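The percentage formula can be reproduced by hand with plain pandas. The sketch below assumes that "number of total values" means every cell in the DataFrame (df.size); the documentation does not state this explicitly, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, np.nan, 28],
    "income": [50000, np.nan, 52000, np.nan],
    "city": ["A", "B", np.nan, "B"],
})

missing_per_column = df.isna().sum()  # NaN count in each column
all_nan = df.isna().all()             # True for columns that are entirely NaN

# Only columns with at least one observed value can be filled
filled = int(missing_per_column[~all_nan].sum())

# Assumption: the denominator is the total number of cells
filled_percentage = filled / df.size * 100
print(round(filled_percentage, 1))  # 33.3
```

Here 4 of the 12 cells are missing and fillable, which gives 33.3%.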

Raises

Name	Type	Description
	TypeError	If df is not a pandas DataFrame.
	ValueError	If method is not one of 'mean', 'median', or 'mode'.

Notes

Numeric columns are imputed using the specified method.
Categorical (non-numeric) columns are imputed using mode.
Imputation is applied column-wise.
Columns containing only NaN values are left unchanged and contribute no filled values to the filled percentage.
If multiple modes exist (for both numeric and categorical columns), the first mode returned by pandas is used.
The original DataFrame is not modified; a copy is returned.
The filled percentage includes values filled in both numeric and categorical (non-numeric) columns.
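The notes on tied modes and all-NaN columns follow from how pandas Series.mode behaves: it ignores NaN by default, returns the modes in sorted order, and returns an empty Series when there are no observed values. A short illustration with made-up data:

```python
import numpy as np
import pandas as pd

# 'A' and 'B' both appear twice, so there are two modes
tied = pd.Series(["B", "A", "B", "A", np.nan])
print(tied.mode().tolist())  # ['A', 'B']
print(tied.mode().iloc[0])   # A

# An all-NaN column has no mode, so there is nothing to fill with
empty = pd.Series([np.nan, np.nan], dtype="float64")
print(empty.mode().empty)    # True
```

Because the modes come back sorted, "the first mode" is the smallest tied value, not the one that appears first in the column.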

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'age': [25, 30, np.nan, 28],
...     'income': [50000, np.nan, 52000, np.nan],
...     'city': ['A', 'B', np.nan, 'B']
... })
>>> result_df, filled_percentage = missing_values(df, method='median')
>>> print(result_df)
    age   income city
0  25.0  50000.0    A
1  30.0  51000.0    B
2  28.0  52000.0    B
3  28.0  51000.0    B
>>> print(f"{filled_percentage:.1f}% of values were filled.")
33.3% of values were filled.