Load a CSV from a path/URL or validate and clean a provided DataFrame.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataframe | Optional[pd.DataFrame] | An already-loaded DataFrame to validate and clean. | None |
| source | str | Path or URL to a CSV file. HTTP/HTTPS URLs and local filesystem paths are supported. | None |
| expected_min_cols | int | Minimum number of columns expected after loading. Used to detect a probable delimiter mismatch or corruption. | 1 |
| sample_size | int | Number of characters to sample from the source when sniffing the delimiter and detecting basic corruption. | 2048 |
Returns
tuple[pandas.DataFrame, ChangeReport]

| Name | Type | Description |
| --- | --- | --- |
| df | pandas.DataFrame | Cleaned and validated DataFrame. Cleaning includes normalizing column headers (strip, whitespace -> underscore, replace illegal chars with underscores) and trimming string cells. |
| report | ChangeReport | Report of changes and metadata (detected delimiter, renamed columns mapping, counts of trimmed cells and illegal-char fixes, shape before/after). |
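The header normalization described above could look roughly like the following sketch. The helper name `normalize_headers` is hypothetical, and the set of "illegal" characters is not specified here, so the sketch assumes anything other than ASCII letters, digits, and underscores counts as illegal.

```python
import re


def normalize_headers(columns: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each changed header to its normalized form."""
    renamed: dict[str, str] = {}
    for original in columns:
        name = original.strip()                    # strip surrounding whitespace
        name = re.sub(r"\s+", "_", name)           # whitespace runs -> underscore
        name = re.sub(r"[^0-9A-Za-z_]", "_", name)  # assumed "illegal" characters
        if name != original:
            renamed[original] = name
    return renamed


mapping = normalize_headers([" First Name ", "e-mail", "age"])
print(mapping)  # {' First Name ': 'First_Name', 'e-mail': 'e_mail'}
```

A mapping of this shape can be passed to `DataFrame.rename(columns=mapping)`, and it is also the kind of information the report's renamed-columns entry would carry. The sketch does not handle two headers that collapse to the same normalized name.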
Raises
| Type | Description |
| --- | --- |
| TypeError | If source is neither a string nor a pandas.DataFrame. |
| DataLoadError | On I/O or parsing failures and validation errors, including: unable to read or download the source; inconsistent column counts in the sample (possible corruption); first row looks like data instead of a header; pandas failed to parse the CSV; resulting DataFrame is empty or has fewer than expected_min_cols columns. |
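As an illustration of the "inconsistent column counts in sample" check, here is a minimal sketch using only the standard library. The names `column_counts` and `check_sample` are hypothetical, the `DataLoadError` class below is a local stand-in for the library's exception, and dropping a trailing partial line is an assumption about how a fixed-size character sample might be handled.

```python
import csv


class DataLoadError(Exception):
    """Local stand-in for the library's exception (assumption)."""


def column_counts(sample: str, delimiter: str) -> list[int]:
    """Return the number of fields in each complete, non-empty sample line."""
    lines = sample.splitlines()
    # A fixed-size sample may cut the last line short, so ignore it
    # unless the sample ends on a line boundary.
    if lines and not sample.endswith(("\n", "\r")):
        lines = lines[:-1]
    return [len(row) for row in csv.reader(lines, delimiter=delimiter) if row]


def check_sample(sample: str, delimiter: str, expected_min_cols: int = 1) -> int:
    """Raise DataLoadError if the sample looks corrupted; else return the width."""
    counts = column_counts(sample, delimiter)
    if not counts:
        raise DataLoadError("sample contains no complete rows")
    if len(set(counts)) > 1:
        raise DataLoadError(f"inconsistent column counts in sample: {sorted(set(counts))}")
    if counts[0] < expected_min_cols:
        raise DataLoadError(f"expected at least {expected_min_cols} columns, found {counts[0]}")
    return counts[0]
```

A wrong delimiter usually shows up here as a single column per row, which is why a minimum column count is a useful signal alongside the consistency check.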
Notes
Delimiter detection uses csv.Sniffer on a sample of the source and falls back to ',' if sniffing fails.
When source is a DataFrame, it is copied and validated; no I/O is performed.
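The delimiter detection mentioned in the notes can be sketched with the standard library as follows. The function name `sniff_delimiter` and the candidate delimiter set are assumptions made for illustration; only the use of `csv.Sniffer`, the character sample, and the ',' fallback come from the description above.

```python
import csv


def sniff_delimiter(text: str, sample_size: int = 2048, fallback: str = ",") -> str:
    """Hypothetical helper: guess the delimiter from the first sample_size characters."""
    sample = text[:sample_size]
    try:
        # Restricting the candidates is an assumption; it keeps the sniffer
        # from picking a letter or digit as the delimiter.
        return csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
    except csv.Error:
        # csv.Sniffer raises csv.Error when it cannot determine a delimiter.
        return fallback


print(sniff_delimiter("id;name;score\n1;Ada;9.5\n2;Bob;7.0\n"))
print(sniff_delimiter(""))
```

The detected delimiter can then be passed to `pandas.read_csv` through its `sep` argument.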