Generates a consolidated exploratory summary of the dataset.
This function provides a single, high-level overview of a dataset by combining commonly used exploratory data analysis (EDA) outputs: dataset dimensions, column data types, missing-value counts, and descriptive statistics. It is intended to simplify initial EDA by replacing several separate pandas calls (e.g., .info(), .describe(), .shape) with one function call.
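For context, the separate pandas calls that this function is meant to consolidate might look like the following. This is only an illustration: the sample data is made up, and the exact calls the function uses internally are not stated in this page.

```python
import pandas as pd

# Small illustrative dataset (invented for this example).
df = pd.DataFrame(
    {
        "age": [25, 32, None, 41],
        "city": ["Lima", "Oslo", "Pune", None],
    }
)

shape = df.shape                          # (rows, columns)
dtypes = df.dtypes.astype(str).to_dict()  # column name -> dtype as a string
missing = df.isna().sum().to_dict()       # column name -> count of missing values
stats = df.describe()                     # numeric columns only, by default

print(shape)
print(dtypes)
print(missing)
print(stats)
```

Each call returns a different object type (tuple, dict, DataFrame), which is the bookkeeping a single summary dictionary is meant to remove.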
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas.DataFrame | The pandas DataFrame containing the dataset to be summarized. | required |
Returns
| Name | Type | Description |
| --- | --- | --- |
| | dict | A dictionary with a fixed structure; its keys are listed after this table. |

Keys of the returned dictionary:
- “shape” (tuple[int, int]): Number of rows and columns in the DataFrame.
- “columns” (list[str]): List of column names, in the order they appear in the DataFrame.
- “dtypes” (dict[str, str]): Mapping of column names to their pandas data types (as strings).
- “missing_values” (dict[str, int]): Count of missing (NaN) values per column.
- “summary_statistics” (dict[str, pandas.Series]): Descriptive statistics for numeric columns only, as returned by pandas.DataFrame.describe().
Raises
| Name | Type | Description |
| --- | --- | --- |
| | TypeError | If the input is not a pandas DataFrame. |
Notes
This function does not modify the input DataFrame.
If the DataFrame is empty, the full dictionary is still returned; its values are empty or zero-valued but valid.
If the DataFrame contains no numeric columns, “summary_statistics” will be an empty dictionary.
The returned dictionary follows a fixed structure to support deterministic unit testing.
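
To make the documented contract concrete, here is a minimal sketch of a function that satisfies it. It is not the actual implementation: the name `summarize_dataset` is hypothetical (this page does not name the function), and splitting the `describe()` output into one Series per column is an assumption based on the documented `dict[str, pandas.Series]` type.

```python
import pandas as pd


def summarize_dataset(df):  # hypothetical name; the source does not give one
    """Sketch of a summary function matching the documented return structure."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")

    numeric = df.select_dtypes(include="number")
    # describe() raises ValueError on a DataFrame with no columns,
    # so return an empty dict when there is nothing numeric to describe.
    if numeric.shape[1] == 0:
        summary_statistics = {}
    else:
        described = numeric.describe()
        # Assumption: one Series of statistics per numeric column.
        summary_statistics = {str(col): described[col] for col in described.columns}

    return {
        "shape": df.shape,
        "columns": [str(c) for c in df.columns],
        "dtypes": {str(c): str(t) for c, t in df.dtypes.items()},
        "missing_values": {str(c): int(n) for c, n in df.isna().sum().items()},
        "summary_statistics": summary_statistics,
    }


# Usage with made-up data.
df = pd.DataFrame({"age": [25, 32, None, 41], "city": ["Lima", "Oslo", "Pune", None]})
summary = summarize_dataset(df)
print(summary["shape"])
print(summary["missing_values"])
print(summary["summary_statistics"]["age"]["mean"])

# Edge cases described in the notes.
empty_summary = summarize_dataset(pd.DataFrame())
text_only_summary = summarize_dataset(pd.DataFrame({"city": ["Lima", "Oslo"]}))
print(empty_summary)
print(text_only_summary["summary_statistics"])
```

The explicit guard before `describe()` matters because pandas refuses to describe a DataFrame that has no columns; without it, the empty and text-only cases would raise instead of returning an empty dictionary.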