handle_missing

Functions

Name            Description
handle_missing  Handles missing data in a pandas DataFrame.

handle_missing

handle_missing.handle_missing(df, strategy='drop', columns=None)

Handles missing data in a pandas DataFrame.

Returns a pandas DataFrame in which missing values have been handled according to the chosen strategy.

Parameters

Name      Type              Description                                           Default
df        pandas.DataFrame  Input DataFrame.                                      required
strategy  str               Strategy for handling missing values.                 'drop'
                            Numeric columns: mean, median, max, min, mode, drop.
                            Other columns: mode, drop.
columns   list              Columns in which to handle missing values.            None
                            None (the default) handles all columns.
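
The interaction between strategy and column dtype can be illustrated with a minimal sketch. This is not the library's implementation: `handle_missing_sketch` and `fill_value_for` are hypothetical names, and the sketch assumes that 'drop' removes every row with a missing value in the selected columns, that columns without missing values are left untouched, and that each filled column has at least one non-missing value.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Fill strategies per column kind, taken from the parameter table ('drop' is handled separately).
NUMERIC_FILLS = {"mean", "median", "max", "min", "mode"}
OTHER_FILLS = {"mode"}


def fill_value_for(series, strategy):
    """Hypothetical helper: compute the replacement value for one column."""
    allowed = NUMERIC_FILLS if is_numeric_dtype(series) else OTHER_FILLS
    if strategy not in allowed:
        raise TypeError(f"strategy {strategy!r} cannot be used for dtype {series.dtype}")
    if strategy == "mode":
        # Series.mode() returns a Series because ties are possible; take the first entry.
        return series.mode().iloc[0]
    # mean, median, max and min are Series methods that skip NaN by default.
    return getattr(series, strategy)()


def handle_missing_sketch(df, strategy="drop", columns=None):
    """Hypothetical stand-in for handle_missing, for illustration only."""
    cols = list(df.columns) if columns is None else columns
    if strategy == "drop":
        # Assumption: drop rows that have a missing value in any selected column.
        return df.dropna(subset=cols)
    out = df.copy()
    for col in cols:
        # Assumption: columns with nothing missing are left as they are.
        if out[col].isna().any():
            out[col] = out[col].fillna(fill_value_for(out[col], strategy))
    return out
```

Under these assumptions, calling `handle_missing_sketch(df, strategy='mean', columns=['B'])` fills only column B and leaves every other column unchanged.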

Returns

Type              Description
pandas.DataFrame  DataFrame in which missing values have been handled.

Raises

Type        Description
TypeError   If df is not a pandas DataFrame.
            If strategy is not a string.
            If columns is not a list or None.
            If strategy cannot be used for the dtype of a column.
            If the dtype of a column is not one the function is designed to handle.
ValueError  If strategy is not a permitted value.
            If a column is not in df.columns.
            If a column contains only NaN.
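
As a rough illustration of the argument checks listed above, the following sketch raises the same exception types for the same conditions. `validate_args` is a hypothetical helper, not part of the documented API, and the order of the checks is an assumption. The dtype-related TypeError cases are omitted because they depend on per-column logic.

```python
import pandas as pd

# Permitted strategy names, taken from the parameter table.
PERMITTED = {"mean", "median", "max", "min", "mode", "drop"}


def validate_args(df, strategy, columns):
    """Hypothetical validator mirroring the documented error conditions."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if not isinstance(strategy, str):
        raise TypeError("strategy must be a string")
    if columns is not None and not isinstance(columns, list):
        raise TypeError("columns must be a list or None")
    if strategy not in PERMITTED:
        raise ValueError(f"strategy {strategy!r} is not permitted")
    # None means "all columns", as described for the columns parameter.
    for col in (df.columns if columns is None else columns):
        if col not in df.columns:
            raise ValueError(f"column {col!r} is not in df.columns")
        if df[col].isna().all():
            raise ValueError(f"column {col!r} contains only NaN")
```

Checking types before values means a caller who passes, say, a tuple for `columns` sees a TypeError rather than a less direct lookup failure later on.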

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "A": [1, 1, 2],
...     "B": [np.nan, 3, 4]
... })
>>> handle_missing(df)
   A    B
1  1  3.0
2  2  4.0
>>> handle_missing(df, strategy='mean')
   A    B
0  1  3.5
1  1  3.0
2  2  4.0