eda.eda

eda.eda(X, y)

Perform exploratory data analysis on a single numeric column of a DataFrame.

This function computes descriptive statistics for a specified column and generates a histogram to visualize its distribution. It validates its inputs before doing any work and raises a TypeError, KeyError, or ValueError (see Raises) when the arguments or the column's data type are not supported.

Parameters

Name Type Description Default
X pandas.DataFrame Input DataFrame containing the column to be analyzed, typically the target column. required
y str Name of the column in X for which summary statistics and a histogram will be generated. The column must exist in X and contain numeric values. required

Returns

Name Type Description
summary_stats pandas.Series Descriptive statistics for column y, as returned by pandas.Series.describe.
histogram matplotlib.axes.Axes Matplotlib Axes object containing the histogram of column y.

Raises

Type Description
TypeError If X is not a pandas DataFrame, if y is not a string, or if column y is not numeric.
KeyError If column y does not exist in X.
ValueError If column y is empty, or if it contains only missing values (NaNs).
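
The checks listed above can be sketched as follows. This is only an illustration of the documented contract, not the library's actual implementation: the name `eda_sketch`, the error messages, and the order in which the checks run are all assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
from pandas.api.types import is_numeric_dtype


def eda_sketch(X, y):
    """Hypothetical stand-in that mirrors the documented behavior of eda."""
    # Argument type checks (documented as TypeError)
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame")
    if not isinstance(y, str):
        raise TypeError("y must be a string")

    # Column existence check (documented as KeyError)
    if y not in X.columns:
        raise KeyError(f"column {y!r} not found in X")

    column = X[y]

    # Assumed order: emptiness is checked before dtype, because an empty
    # column may not have a numeric dtype
    if column.empty:
        raise ValueError(f"column {y!r} is empty")
    if not is_numeric_dtype(column):
        raise TypeError(f"column {y!r} is not numeric")
    if column.isna().all():
        raise ValueError(f"column {y!r} contains only missing values")

    # Summary statistics and a histogram drawn on a fresh Axes
    summary_stats = column.describe()
    _, histogram = plt.subplots()
    column.plot.hist(ax=histogram)
    return summary_stats, histogram
```

Checking the argument types first means a caller who passes the wrong kind of object gets a TypeError before any column lookup is attempted, which keeps the KeyError reserved for a genuinely missing column.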

Notes

This function creates a matplotlib plot but does not display it. To render the histogram, call matplotlib.pyplot.show() after invoking this function.
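
Because the function returns the Axes without displaying it, the figure can also be written to an image instead of being shown. The sketch below illustrates this under stated assumptions: the Axes is built directly with pandas as a stand-in for the one returned by eda, and the Agg backend is selected only so the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit this for interactive use

import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"val": [1, 2, 2, 3, 3, 3, 4, 4, 5]})

# Stand-in for the Axes that eda(data, 'val') would return
fig, ax = plt.subplots()
data["val"].plot.hist(ax=ax)
ax.set_title("Target Distribution")

# The parent Figure is reachable from the Axes, so it can be saved
# to a file path or, as here, to an in-memory buffer
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)  # release the figure once it has been saved
```

In an interactive session, calling matplotlib.pyplot.show() in place of the save step displays the same figure.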

Examples

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> # Create sample data
>>> data = pd.DataFrame({'val': [1, 2, 2, 3, 3, 3, 4, 4, 5]})
>>> # Run EDA
>>> stats, ax = eda(data, 'val')
>>> # The returned Axes (ax) can be used to customize the plot
>>> ax.set_title("Target Distribution")
>>> # Uncomment the next line to display the plot
>>> # plt.show()