missing_correlation_matrix.missing_correlation_matrix

missing_correlation_matrix.missing_correlation_matrix(df)

Calculate correlations between variables’ missingness patterns.

For each column in df, this function constructs a binary indicator (1 = value is missing, 0 = value is observed) and computes the pairwise correlation matrix of these indicators. High positive correlations flag variables that tend to be missing at the same time, which can reveal shared data collection issues or common missingness mechanisms.
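The computation described above can be sketched in a few lines of pandas. This is a minimal illustration, not necessarily how the function is implemented: the helper name `missing_correlation_sketch` is hypothetical, and it assumes the default Pearson correlation from `DataFrame.corr`.

```python
import numpy as np
import pandas as pd


def missing_correlation_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for illustration; not the library's own code."""
    # 1 where a value is missing, 0 where it is observed
    indicators = df.isna().astype(int)
    # Pairwise Pearson correlation between the indicator columns
    return indicators.corr()


# 'a' and 'c' are always missing together; 'b' is missing exactly when 'a' is observed
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, np.nan, 4.0],
    "c": [1.0, np.nan, 3.0, np.nan],
})
result = missing_correlation_sketch(df)
```

Here `result.loc["a", "c"]` is 1.0 and `result.loc["a", "b"]` is -1.0. In this sketch, a column that is fully observed (or fully missing) has a constant indicator, so its correlations come out as NaN; whether the documented function treats that case differently is not stated here.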

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| df | pd.DataFrame | A pandas DataFrame for which the pairwise correlations of missing values across columns should be computed. | required |

Returns

| Name | Type | Description |
|------|------|-------------|
| | pd.DataFrame | A square DataFrame whose rows and columns correspond to the original variables, and whose entries give the correlation between their missingness indicators. |

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'age': [25, np.nan, 35], 'income': [50000, 60000, np.nan]})
>>> result = missing_correlation_matrix(df)
>>> result
        age  income
age     1.0    -0.5
income -0.5     1.0