simplify.dataset_overview

simplify.dataset_overview(df)

Generates a consolidated exploratory summary of the dataset.

This function provides a single, high-level overview of a dataset by combining common exploratory data analysis (EDA) outputs: dataset dimensions, column data types, missing-value counts, and descriptive statistics. It is intended to simplify initial EDA by replacing several separate pandas calls (e.g., .info(), .describe(), .shape) with one function call.
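For comparison, the separate pandas calls that this function is meant to replace look roughly like the following. This is a plain-pandas sketch, not part of simplify; the sample values are made up for illustration.

```python
import pandas as pd

# Small illustrative frame (values are invented for this sketch).
df = pd.DataFrame({
    "artist": ["A", "B", "C"],
    "popularity": [80, 75, None],
    "danceability": [0.8, 0.6, 0.9],
})

shape = df.shape                  # (rows, columns)
dtypes = df.dtypes.astype(str)    # dtype name per column, as strings
missing = df.isna().sum()         # count of missing values per column
stats = df.describe()             # descriptive statistics for numeric columns

print(shape)
print(dtypes)
print(missing)
print(stats)
```

Each call returns a different object type (tuple, Series, DataFrame), which is the bookkeeping a single consolidated dictionary avoids.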

Parameters

Name Type Description Default
df pandas.DataFrame A pandas DataFrame containing the dataset to be summarized. required

Returns

Name Type Description
dict A dictionary with the following fixed structure:
  • "shape" : tuple[int, int]. Number of rows and columns in the DataFrame.
  • "columns" : list[str]. List of column names, in the order they appear in the DataFrame.
  • "dtypes" : dict[str, str]. Mapping of column names to their pandas data types (as strings).
  • "missing_values" : dict[str, int]. Count of missing (NaN) values per column.
  • "summary_statistics" : dict[str, pandas.Series]. Descriptive statistics for numeric columns only, as returned by pandas.DataFrame.describe().

Raises

Name Type Description
TypeError If the input is not a pandas DataFrame.
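The Returns and Raises contracts above can be sketched as follows. This is a hypothetical re-implementation written only from the documented behavior; the name dataset_overview_sketch and the choice of select_dtypes(include="number") are assumptions, not the library's actual code.

```python
import pandas as pd


def dataset_overview_sketch(df):
    """Hypothetical sketch of the documented contract (not simplify's own code)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")

    # Assumption: numeric columns are picked with select_dtypes, and the
    # "no numeric columns" case yields an empty dictionary as documented.
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        summary = {}
    else:
        described = numeric.describe()
        # One pandas.Series of statistics per numeric column.
        summary = {col: described[col] for col in described.columns}

    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing_values": {col: int(n) for col, n in df.isna().sum().items()},
        "summary_statistics": summary,
    }


overview = dataset_overview_sketch(
    pd.DataFrame({"x": [1.0, None], "label": ["a", "b"]})
)
print(overview["missing_values"])  # {'x': 1, 'label': 0}
```

Converting dtypes and counts to plain str and int keeps the result free of NumPy scalar types, which makes equality checks in tests straightforward.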

Notes

  • This function does not modify the input DataFrame.
  • If the DataFrame is empty, the dictionary is still returned with all five keys; their values are empty but valid.
  • If the DataFrame contains no numeric columns, "summary_statistics" will be an empty dictionary.
  • The returned dictionary follows a fixed structure to support deterministic unit testing.
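The empty and non-numeric cases in the notes above matter because plain pandas behaves differently in them. The following sketch uses only pandas to show that behavior; how simplify handles these cases internally is not stated here, so the select_dtypes guard is an assumption about one reasonable approach.

```python
import pandas as pd

empty = pd.DataFrame()
text_only = pd.DataFrame({"artist": ["A", "B"]})

# describe() raises ValueError on a frame that has no columns at all.
try:
    empty.describe()
    describe_failed = False
except ValueError:
    describe_failed = True

# With no numeric columns, describe() falls back to describing the
# non-numeric columns instead of returning nothing.
fallback_columns = list(text_only.describe().columns)

# Selecting numeric columns first makes the "no numeric columns" case explicit.
numeric_columns = list(text_only.select_dtypes(include="number").columns)

print(describe_failed)
print(fallback_columns)
print(numeric_columns)
```

A unit test for the documented behavior would therefore check that "summary_statistics" is an empty dictionary for text_only, rather than relying on describe() alone.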

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "artist": ["A", "B", "C"],
...     "popularity": [80, 75, None],
...     "danceability": [0.8, 0.6, 0.9]
... })
>>> dataset_overview(df)
{
    "shape": (3, 3),
    "columns": ["artist", "popularity", "danceability"],
    "dtypes": {
        "artist": "object",
        "popularity": "float64",
        "danceability": "float64"
    },
    "missing_values": {
        "artist": 0,
        "popularity": 1,
        "danceability": 0
    },
    "summary_statistics": {
        "popularity": {...},
        "danceability": {...}
    }
}