Function reference
Overview
csvplus is a lightweight Python package that provides practical utilities for loading, comparing, cleaning, and summarizing tabular data. While some functions operate directly on CSV files, others work with pandas DataFrames, making csvplus easy to integrate into existing data analysis workflows.
It is designed for data scientists, analysts, and students who work with evolving datasets and want clear, interpretable insights into data structure, quality, and change over time.
Data Loading
Function for loading a CSV file and returning a memory-optimized DataFrame.
| load_optimized_csv | Load a CSV file and return a memory-optimized DataFrame. |
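The idea behind memory-optimized loading can be illustrated with plain pandas. The sketch below is not the `load_optimized_csv` implementation; the helper name `sketch_load_downcast` and its `category_threshold` parameter are hypothetical and chosen for illustration only. It assumes that optimization means downcasting numeric columns and storing low-cardinality text columns as categoricals.

```python
import io

import pandas as pd
from pandas.api import types as pdt


def sketch_load_downcast(source, category_threshold=0.5):
    """Hypothetical helper (not the csvplus API): read a CSV, then shrink dtypes."""
    df = pd.read_csv(source)
    for col in df.columns:
        if pdt.is_integer_dtype(df[col]):
            # Ask pandas for the smallest integer dtype it considers safe.
            df[col] = pd.to_numeric(df[col], downcast="integer")
        elif pdt.is_float_dtype(df[col]):
            df[col] = pd.to_numeric(df[col], downcast="float")
        elif pdt.is_object_dtype(df[col]) or pdt.is_string_dtype(df[col]):
            # Text columns with few distinct values are cheaper as categoricals.
            if df[col].nunique() / max(len(df), 1) < category_threshold:
                df[col] = df[col].astype("category")
    return df


# Small in-memory CSV so the example runs without a file on disk.
csv_text = (
    "id,price,city\n"
    "1,9.5,Oslo\n"
    "2,12.25,Oslo\n"
    "3,7.0,Bergen\n"
    "4,3.75,Oslo\n"
    "5,8.5,Bergen\n"
)
demo = sketch_load_downcast(io.StringIO(csv_text))
print(demo.dtypes)
```

Comparing `demo.memory_usage(deep=True)` with the same call on an unoptimized `pd.read_csv` result is a simple way to check how much a given dataset benefits.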
Data Comparison
Function for summarizing structural and statistical differences between two DataFrame versions.
| data_version_diff | Summarize structural and statistical differences between two DataFrames. |
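To make "structural and statistical differences" concrete, here is a minimal pandas sketch. It is not the `data_version_diff` implementation, and the function name `sketch_version_diff` and the keys of the returned dictionary are assumptions made for this example. It reports added and removed columns, the change in row count, dtype changes, and the shift in mean for numeric columns present in both versions.

```python
import pandas as pd
from pandas.api import types as pdt


def sketch_version_diff(old, new):
    """Hypothetical helper (not the csvplus API): compare two DataFrame versions."""
    old_cols, new_cols = set(old.columns), set(new.columns)
    shared = sorted(old_cols & new_cols)

    # Structural differences: columns whose dtype changed between versions.
    dtype_changes = {
        c: (str(old[c].dtype), str(new[c].dtype))
        for c in shared
        if old[c].dtype != new[c].dtype
    }

    # Statistical differences: mean shift for columns numeric in both versions.
    mean_shift = {
        c: float(new[c].mean() - old[c].mean())
        for c in shared
        if pdt.is_numeric_dtype(old[c]) and pdt.is_numeric_dtype(new[c])
    }

    return {
        "added_columns": sorted(new_cols - old_cols),
        "removed_columns": sorted(old_cols - new_cols),
        "row_count_change": len(new) - len(old),
        "dtype_changes": dtype_changes,
        "mean_shift": mean_shift,
    }


v1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
v2 = pd.DataFrame({"a": [2, 3, 4, 5], "c": [1.0, 2.0, 3.0, 4.0]})
diff = sketch_version_diff(v1, v2)
print(diff)
```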
Data Cleaning
Function for resolving inconsistent string values to standardized names using fuzzy matching.
| data_correction | A module that replaces data values with their resolved names within a column. |
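Fuzzy resolution of inconsistent strings can be sketched with the standard library. The example below is not the `data_correction` implementation and makes no claim about which matching library csvplus uses; it swaps in `difflib.get_close_matches`, and the names `sketch_resolve_names`, `canonical`, and `cutoff` are hypothetical.

```python
import difflib

import pandas as pd


def sketch_resolve_names(series, canonical, cutoff=0.8):
    """Hypothetical helper (not the csvplus API): map values to canonical names."""
    # Compare in lowercase, but return the canonical spelling.
    lookup = {name.lower(): name for name in canonical}

    def resolve(value):
        if not isinstance(value, str):
            return value  # leave missing or non-text values untouched
        match = difflib.get_close_matches(
            value.strip().lower(), list(lookup), n=1, cutoff=cutoff
        )
        # Keep the original value when nothing is similar enough.
        return lookup[match[0]] if match else value

    return series.map(resolve)


cities = pd.Series(["Vancouver", "vancouvr", "Toronto ", "Torono", "Paris"])
resolved = sketch_resolve_names(cities, canonical=["Vancouver", "Toronto"])
print(resolved.tolist())
```

A higher `cutoff` makes matching stricter, which lowers the risk of merging values that are genuinely different at the cost of leaving more typos unresolved.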
Reporting
Function for generating detailed numeric and categorical summary reports, including missingness and confidence intervals.
| generate_report | A module that generates a summary report for an input DataFrame. |
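The kind of output such a report might contain can be sketched as follows. This is not the `generate_report` implementation: the function name `sketch_summary_report`, the column names of the result, and the choice of a normal-approximation 95% confidence interval for the mean (mean ± 1.96 × standard error) are all assumptions for illustration.

```python
import math

import pandas as pd
from pandas.api import types as pdt


def sketch_summary_report(df, z=1.96):
    """Hypothetical helper (not the csvplus API): numeric and categorical summaries."""
    numeric_rows, categorical_rows = [], []
    for col in df.columns:
        s = df[col]
        missing = float(s.isna().mean())  # fraction of missing values
        if pdt.is_numeric_dtype(s) and not pdt.is_bool_dtype(s):
            n = int(s.count())  # non-missing observations
            mean = float(s.mean()) if n else float("nan")
            # Standard error needs at least two observations.
            sem = float(s.std(ddof=1)) / math.sqrt(n) if n > 1 else float("nan")
            numeric_rows.append({
                "column": col, "n": n, "missing_frac": missing, "mean": mean,
                "ci_low": mean - z * sem, "ci_high": mean + z * sem,
            })
        else:
            counts = s.value_counts()  # excludes missing values, most frequent first
            categorical_rows.append({
                "column": col, "missing_frac": missing,
                "n_unique": int(s.nunique()),
                "top": counts.index[0] if len(counts) else None,
            })
    return pd.DataFrame(numeric_rows), pd.DataFrame(categorical_rows)


data = pd.DataFrame({"x": [1.0, 2.0, 3.0, None], "g": ["a", "a", "b", None]})
numeric_summary, categorical_summary = sketch_summary_report(data)
print(numeric_summary)
print(categorical_summary)
```

For small samples a t-based interval would be more appropriate than the fixed `z` value used here; the normal approximation is used only to keep the sketch dependency-free.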