eda.eda
eda.eda(X, y)
Perform exploratory data analysis on a single numeric column of a DataFrame.
This function computes descriptive statistics for a specified column and generates a histogram to visualize its distribution. It validates its inputs and raises informative errors for invalid arguments or unsupported data types (see Raises).
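As an illustration of the two outputs described above, here is a minimal sketch of how the statistics and histogram could be produced. It is not the library's implementation: the name `eda_core_sketch`, the use of a fresh figure via `plt.subplots()`, the axis labels, and dropping NaNs before plotting are all assumptions, and input validation is deliberately left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def eda_core_sketch(X: pd.DataFrame, y: str):
    """Hypothetical sketch: summary statistics plus a histogram for column y.

    Input validation is omitted; it is an assumption that the real function
    validates before reaching this point.
    """
    column = X[y]
    # For a numeric Series, describe() returns count, mean, std, min,
    # 25%, 50%, 75% and max.
    summary_stats = column.describe()

    # Assumption: draw on a fresh figure so repeated calls do not overlay.
    _, ax = plt.subplots()
    # Assumption: drop missing values so they do not reach the histogram bins.
    ax.hist(column.dropna())
    ax.set_xlabel(y)
    ax.set_ylabel("Frequency")
    return summary_stats, ax
```

Returning the Axes rather than calling `plt.show()` inside the function leaves the caller free to restyle, save, or display the plot.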
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | Input DataFrame containing the column to analyze (typically the target column). | required |
| y | str | Name of the column in X for which summary statistics and a histogram will be generated. The column must exist in X and contain numeric values. | required |
Returns
| Name | Type | Description |
|---|---|---|
| summary_stats | pandas.Series | Descriptive statistics for column y, as returned by pandas.Series.describe. |
| histogram | matplotlib.axes.Axes | Matplotlib Axes object containing the histogram of column y. |
Raises
| Type | Description |
|---|---|
| TypeError | If X is not a pandas DataFrame, if y is not a string, or if column y is not numeric. |
| KeyError | If column y does not exist in X. |
| ValueError | If column y is empty or contains only missing values (NaNs). |
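The checks in the table above could be expressed roughly as follows. This is a hypothetical helper (`validate_eda_inputs` is not part of the documented API), and the order in which the checks run is an assumption: here the empty and all-NaN checks come before the numeric-dtype check, so an empty object-dtype column raises `ValueError` rather than `TypeError`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def validate_eda_inputs(X, y):
    """Hypothetical helper mirroring the documented Raises table."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError(f"X must be a pandas DataFrame, got {type(X).__name__}.")
    if not isinstance(y, str):
        raise TypeError(f"y must be a string, got {type(y).__name__}.")
    if y not in X.columns:
        raise KeyError(f"Column {y!r} not found in X.")
    column = X[y]
    # Assumption: emptiness and all-NaN checks run before the dtype check.
    if column.empty:
        raise ValueError(f"Column {y!r} is empty.")
    if column.isna().all():
        raise ValueError(f"Column {y!r} contains only missing values.")
    if not is_numeric_dtype(column):
        raise TypeError(f"Column {y!r} must be numeric, got dtype {column.dtype}.")
```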
Notes
This function creates a matplotlib plot but does not display it. To render the histogram, call matplotlib.pyplot.show() after invoking this function.
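In a script or CI job where no display is available, the returned Axes can instead be saved through its parent figure. The sketch below builds a stand-in Axes with hypothetical sample data rather than calling `eda`, and the output filename is an assumption.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; plt.show() cannot open a window here
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the Axes returned by eda(); hypothetical sample data.
data = pd.DataFrame({"val": [1, 2, 2, 3, 3, 3, 4, 4, 5]})
_, ax = plt.subplots()
ax.hist(data["val"])

# Interactive session: render the figure on screen instead.
# plt.show()

# Headless session: write the figure to disk via the Axes' parent figure.
out_path = os.path.join(tempfile.gettempdir(), "val_histogram.png")
ax.figure.savefig(out_path)
plt.close(ax.figure)  # release the figure once it has been saved
```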
Examples
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> from eda import eda
>>> # Create sample data
>>> data = pd.DataFrame({'val': [1, 2, 2, 3, 3, 3, 4, 4, 5]})
>>> # Run EDA on the 'val' column
>>> stats, ax = eda(data, 'val')
>>> # The returned Axes object can be used to customize the plot
>>> _ = ax.set_title("Target Distribution")
>>> # Display the histogram
>>> plt.show()  # doctest: +SKIP