Function reference

Overview

csvplus is a lightweight Python package that provides practical utilities for loading, comparing, cleaning, and summarizing tabular data. While some functions operate directly on CSV files, others work with pandas DataFrames, making csvplus easy to integrate into existing data analysis workflows.

It is designed for data scientists, analysts, and students who work with evolving datasets and want clear, interpretable insights into data structure, quality, and change over time.

Data Loading

Function for loading a CSV file and returning a memory-optimized DataFrame.

load_optimized_csv Load a CSV file and return a memory-optimized DataFrame.
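The idea of a memory-optimized load can be sketched with plain pandas. The signature and internals of load_optimized_csv are not documented on this page, so the helper below (downcast_frame, a hypothetical name) and its max_category_ratio parameter are assumptions for illustration only. It shows one common approach: downcast numeric columns and store low-cardinality text columns as categories.

```python
import io

import pandas as pd


def downcast_frame(df: pd.DataFrame, max_category_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df using smaller dtypes where pandas allows it.

    Hypothetical helper, not part of csvplus.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # e.g. int64 -> int8 when every value fits
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # float64 -> float32 when pandas judges the values compatible
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Text columns with few distinct values are cheaper as categories
            if len(s) > 0 and s.nunique() / len(s) <= max_category_ratio:
                out[col] = s.astype("category")
    return out


# Small in-memory CSV so the example needs no file on disk
csv_text = "id,price,city\n1,9.5,Paris\n2,3.25,Paris\n3,7.0,Lima\n4,1.5,Lima\n"
raw = pd.read_csv(io.StringIO(csv_text))
small = downcast_frame(raw)

print(raw.dtypes)
print(small.dtypes)
```

Comparing raw.memory_usage(deep=True) with small.memory_usage(deep=True) on a realistic file shows how much the conversion saves. On very small frames the category overhead can outweigh the gain.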

Data Comparison

Function for summarizing structural and statistical differences between two DataFrame versions.

data_version_diff Summarize structural and statistical differences between two DataFrames.
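As a rough illustration of what a structural and statistical comparison might contain, the sketch below compares two versions of a small DataFrame. The function name sketch_version_diff and the keys of the returned dictionary are assumptions made for this example. They do not describe the actual output of data_version_diff.

```python
import pandas as pd


def sketch_version_diff(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Summarize column, row, dtype, and mean changes between two frames.

    Hypothetical helper, not part of csvplus.
    """
    old_cols, new_cols = set(old.columns), set(new.columns)
    shared = [c for c in old.columns if c in new_cols]

    # Structural differences: dtype changes on columns present in both versions
    dtype_changes = {
        c: (str(old[c].dtype), str(new[c].dtype))
        for c in shared
        if old[c].dtype != new[c].dtype
    }

    # Statistical differences: shift in the mean of shared numeric columns
    numeric_shared = [
        c
        for c in shared
        if pd.api.types.is_numeric_dtype(old[c]) and pd.api.types.is_numeric_dtype(new[c])
    ]
    mean_shift = {c: float(new[c].mean() - old[c].mean()) for c in numeric_shared}

    return {
        "added_columns": sorted(new_cols - old_cols),
        "removed_columns": sorted(old_cols - new_cols),
        "row_count_change": len(new) - len(old),
        "dtype_changes": dtype_changes,
        "mean_shift": mean_shift,
    }


v1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0], "note": ["a", "b", "c"]})
v2 = pd.DataFrame(
    {"id": [1, 2, 3, 4], "score": [10.0, 20.0, 30.0, 60.0], "grade": ["A", "B", "C", "D"]}
)

diff = sketch_version_diff(v1, v2)
print(diff)
```

Here the second version gains a grade column, drops note, adds one row, and raises the mean of score by 10.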

Data Cleaning

Function for resolving inconsistent string values to standardized names using fuzzy matching.

data_correction Replace the values in a column with their resolved, standardized names.
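Fuzzy resolution of inconsistent strings can be sketched with the standard library. This page does not say which matching algorithm data_correction uses, so difflib.get_close_matches stands in here as an assumption. The helper name resolve_names, the canonical list, and the cutoff value are likewise hypothetical.

```python
import difflib

import pandas as pd


def resolve_names(series: pd.Series, canonical: list, cutoff: float = 0.8) -> pd.Series:
    """Map each string to its closest canonical name, if one is close enough.

    Hypothetical helper, not part of csvplus.
    """
    # Compare case-insensitively, but return the canonical spelling
    lookup = {name.lower(): name for name in canonical}

    def resolve(value):
        if not isinstance(value, str):
            return value  # leave missing or non-string values alone
        match = difflib.get_close_matches(
            value.strip().lower(), list(lookup), n=1, cutoff=cutoff
        )
        # Keep the original value when nothing clears the similarity cutoff
        return lookup[match[0]] if match else value

    return series.map(resolve)


cities = pd.Series(["Vancouver", "vancouvr", "Toronto ", "Torontoo", "Paris"])
resolved = resolve_names(cities, canonical=["Vancouver", "Toronto"])
print(resolved.tolist())
```

The cutoff controls how aggressive the correction is. A lower value resolves more typos but risks merging names that are genuinely different, which is why "Paris" is left unchanged in this example.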

Reporting

Function for generating detailed numeric and categorical summary reports, including missingness and confidence intervals.

generate_report Generate a summary report for an input DataFrame.
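To make "missingness and confidence intervals" concrete, the sketch below builds a small numeric and categorical summary with pandas. It is not the implementation of generate_report. The function name sketch_report, the output column names, and the normal-approximation interval (z = 1.96 for roughly 95% coverage) are assumptions chosen for this example.

```python
import math

import pandas as pd


def sketch_report(df: pd.DataFrame, z: float = 1.96):
    """Return (numeric_summary, categorical_summary) as two DataFrames.

    Hypothetical helper, not part of csvplus.
    """
    numeric_rows, categorical_rows = [], []
    for col in df.columns:
        s = df[col]
        missing_pct = float(s.isna().mean() * 100)  # share of missing values
        values = s.dropna()
        if pd.api.types.is_numeric_dtype(s):
            n = len(values)
            mean = float(values.mean()) if n else math.nan
            # Half-width of a normal-approximation interval for the mean
            half = z * float(values.std(ddof=1)) / math.sqrt(n) if n > 1 else math.nan
            numeric_rows.append(
                {
                    "column": col,
                    "missing_pct": missing_pct,
                    "mean": mean,
                    "ci_low": mean - half,
                    "ci_high": mean + half,
                }
            )
        else:
            top = values.mode()
            categorical_rows.append(
                {
                    "column": col,
                    "missing_pct": missing_pct,
                    "n_unique": int(values.nunique()),
                    "top": top.iloc[0] if len(top) else None,
                }
            )
    return pd.DataFrame(numeric_rows), pd.DataFrame(categorical_rows)


data = pd.DataFrame(
    {"age": [20.0, 30.0, 40.0, None], "city": ["Lima", "Lima", "Paris", None]}
)
numeric_summary, categorical_summary = sketch_report(data)
print(numeric_summary)
print(categorical_summary)
```

For small samples a t-based interval is usually more appropriate than the fixed z value used here.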