simplify.dataset_overview

simplify.dataset_overview(df)

Generates a consolidated exploratory summary of the dataset.

This function provides a single, high-level overview of the dataset by combining commonly used exploratory data analysis (EDA) outputs: dataset dimensions, column data types, missing value counts, and descriptive statistics. It is intended to simplify the initial EDA process by replacing multiple pandas calls (e.g., .info(), .describe(), .shape) with one function.
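
For comparison, the separate pandas calls that this function is described as consolidating look roughly like the following. This is only a sketch using standard pandas methods; the sample DataFrame is illustrative and the variable names are not part of the library.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "artist": ["A", "B", "C"],
    "popularity": [80, 75, None],
    "danceability": [0.8, 0.6, 0.9],
})

shape = df.shape                 # (rows, columns)
dtypes = df.dtypes.astype(str)   # dtype name per column
missing = df.isna().sum()        # count of missing values per column
stats = df.describe()            # numeric columns only for a mixed-type frame
```

Each of these results has a different type (tuple, Series, DataFrame), which is the inconvenience a single dictionary-returning function is meant to smooth over.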

Parameters

Name	Type	Description	Default
df	pandas.DataFrame	A pandas DataFrame containing the dataset to be summarized.	required

Returns

Name	Type	Description
	dict	A dictionary with the following fixed structure:

- "shape" : tuple[int, int]. Number of rows and columns in the DataFrame.
- "columns" : list[str]. List of column names, in the order they appear in the DataFrame.
- "dtypes" : dict[str, str]. Mapping of column names to their pandas data types (as strings).
- "missing_values" : dict[str, int]. Count of missing (NaN) values per column.
- "summary_statistics" : dict[str, pandas.Series]. Descriptive statistics for numeric columns only, as returned by `pandas.DataFrame.describe()`.

Raises

Name	Type	Description
	TypeError	If the input provided is not a pandas DataFrame.

Notes

- This function does not modify the input DataFrame.
- If the DataFrame is empty, all returned values will be empty but valid.
- If the DataFrame contains no numeric columns, "summary_statistics" will be an empty dictionary.
- The returned dictionary follows a fixed structure to support deterministic unit testing.
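
The behavior described in these notes can be illustrated with a minimal sketch. The function `dataset_overview_sketch` below is hypothetical: it is an assumed re-implementation of the documented contract, not the library's actual code, and the use of `select_dtypes` and `Series.describe` is one possible way to obtain the numeric-only statistics.

```python
import pandas as pd


def dataset_overview_sketch(df):
    """Hypothetical stand-in for the documented function (assumption, not library code)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")

    # Restrict to numeric columns; yields no columns if none are numeric.
    numeric = df.select_dtypes(include="number")

    # Build a new dictionary; the input DataFrame is never modified.
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing_values": {col: int(n) for col, n in df.isna().sum().items()},
        "summary_statistics": {col: numeric[col].describe() for col in numeric.columns},
    }


# Edge cases from the notes: an empty frame and a frame without numeric columns.
empty = dataset_overview_sketch(pd.DataFrame())
text_only = dataset_overview_sketch(pd.DataFrame({"artist": ["A", "B"]}))
```

Because every key is always present, a unit test can compare the result for an empty DataFrame against a literal dictionary without special-casing missing keys.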

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "artist": ["A", "B", "C"],
...     "popularity": [80, 75, None],
...     "danceability": [0.8, 0.6, 0.9]
... })
>>> dataset_overview(df)
{
    "shape": (3, 3),
    "columns": ["artist", "popularity", "danceability"],
    "dtypes": {
        "artist": "object",
        "popularity": "float64",
        "danceability": "float64"
    },
    "missing_values": {
        "artist": 0,
        "popularity": 1,
        "danceability": 0
    },
    "summary_statistics": {
        "popularity": {...},
        "danceability": {...}
    }
}