load_optimized_csv
Load a CSV file and return a memory-optimized DataFrame.
Functions
| Name | Description |
|---|---|
| load_optimized_csv | Load a CSV as a memory-optimized DataFrame with type downcasting |
load_optimized_csv
load_optimized_csv.load_optimized_csv(
filepath,
nrows=None,
usecols=None,
no_sparse_cols=None,
no_downcast_cols=None,
no_category_cols=None,
sparse_threshold=0.3,
category_threshold=0.7,
**kwargs,
)
Load a CSV as a memory-optimized DataFrame with type downcasting and categorical/sparse conversions.
Automatically determines an optimal chunk size from the file size and the available system memory, then reads the file in chunks of that size. Each chunk is optimized by downcasting numeric dtypes, converting low-cardinality string columns to categorical, and converting columns with a high proportion of zeros to sparse. The optimized chunks are concatenated into a single memory-optimized DataFrame with a RangeIndex.
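The description above does not specify how the chunk size is derived, only that it depends on file size and available memory. The following is a minimal sketch under stated assumptions: the budget heuristic (memory_fraction, min_rows, max_rows) and both helper names are hypothetical, and it treats a row as taking the same number of bytes on disk and in memory, which is a simplification. In practice available_bytes might come from something like psutil.virtual_memory().available, but which source the library uses is not documented.

```python
from typing import Callable

import pandas as pd


def choose_chunksize(
    file_bytes: int,
    available_bytes: int,
    bytes_per_row: int,
    memory_fraction: float = 0.1,
    min_rows: int = 1_000,
    max_rows: int = 1_000_000,
) -> int:
    """Hypothetical heuristic: let one chunk use about memory_fraction of free RAM.

    Simplification: bytes_per_row is treated as equal on disk and in memory.
    """
    if bytes_per_row <= 0:
        raise ValueError("bytes_per_row must be positive")
    budget_rows = int(available_bytes * memory_fraction) // bytes_per_row
    # Never ask for more rows than the file is estimated to contain.
    estimated_rows = max(1, file_bytes // bytes_per_row)
    rows = min(budget_rows, estimated_rows)
    # Clamp to a fixed range so tiny or huge inputs still get a usable size.
    return max(min_rows, min(rows, max_rows))


def read_optimized_chunks(
    source,
    chunksize: int,
    optimize_chunk: Callable[[pd.DataFrame], pd.DataFrame],
    **read_csv_kwargs,
) -> pd.DataFrame:
    """Read source chunk by chunk, optimize each chunk, then concatenate."""
    # read_csv with chunksize returns an iterator that also works as a context manager.
    with pd.read_csv(source, chunksize=chunksize, **read_csv_kwargs) as reader:
        chunks = [optimize_chunk(chunk) for chunk in reader]
    # ignore_index=True drops the per-chunk indexes and builds a fresh RangeIndex.
    return pd.concat(chunks, ignore_index=True)
```

Passing ignore_index=True to pd.concat is one straightforward way to end up with the RangeIndex mentioned above, since each chunk otherwise carries its own slice of row labels.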
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to the CSV file to load. | required |
| nrows | int | Maximum number of rows to read. If None, reads all rows. | None |
| usecols | list of str | Columns to read. If None, reads all columns. | None |
| no_sparse_cols | list of str | Columns to exclude from sparse conversion. | None |
| no_downcast_cols | list of str | Columns to exclude from dtype downcasting. | None |
| no_category_cols | list of str | Columns to exclude from categorical conversion. | None |
| sparse_threshold | float | Minimum proportion of zeros required to convert a column to sparse. Must be between 0 and 1. | 0.3 |
| category_threshold | float | Maximum ratio of unique values to total values for a string column to be converted to categorical. Must be between 0 and 1. | 0.7 |
| **kwargs |  | Additional keyword arguments passed to pandas.read_csv (e.g., sep, encoding, parse_dates). | {} |
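
The two thresholds define opposite-facing rules: sparse_threshold is a minimum share of zeros, while category_threshold is a maximum ratio of unique values to total values. A minimal sketch of those rules, assuming they are applied column by column to a single DataFrame; convert_columns is a hypothetical helper, not the library's internal function, and the handling of empty and boolean columns is an assumption.

```python
import pandas as pd


def convert_columns(
    df: pd.DataFrame,
    sparse_threshold: float = 0.3,
    category_threshold: float = 0.7,
    no_sparse_cols=None,
    no_category_cols=None,
) -> pd.DataFrame:
    """Hypothetical sketch of the categorical and sparse conversion rules."""
    skip_sparse = set(no_sparse_cols or [])
    skip_category = set(no_category_cols or [])
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if len(s) == 0:
            continue  # ratios are undefined for empty columns
        is_text = pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)
        if is_text:
            # Categorical when unique values / total values <= category_threshold.
            if col not in skip_category and s.nunique() / len(s) <= category_threshold:
                out[col] = s.astype("category")
        elif pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            # Sparse when the share of zeros >= sparse_threshold; zeros become the fill value.
            if col not in skip_sparse and (s == 0).mean() >= sparse_threshold:
                out[col] = s.astype(pd.SparseDtype(s.dtype, fill_value=0))
    return out
```

If rules like these run on each chunk separately, different chunks can end up with different categories for the same column, and pd.concat then falls back to object dtype for that column. Unifying the categories afterwards (for example with pandas.api.types.union_categoricals) is one option; how this library handles it is not documented here.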
Returns
| Name | Type | Description |
|---|---|---|
|  | pd.DataFrame | A memory-optimized DataFrame with numeric columns downcast to the smallest sufficient dtype; low-cardinality string columns converted to categorical; high-zero columns converted to SparseDtype; and a RangeIndex as the index. |
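
One plausible way to reach "the smallest sufficient dtype" for numeric columns is pandas.to_numeric with its downcast argument. This is an assumption about the approach, and downcast_numeric is a hypothetical helper that honors a no_downcast_cols-style skip list.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame, no_downcast_cols=None) -> pd.DataFrame:
    """Hypothetical sketch: shrink integer and float columns to smaller dtypes."""
    skip = set(no_downcast_cols or [])
    out = df.copy()
    for col in out.columns:
        if col in skip:
            continue
        s = out[col]
        if pd.api.types.is_bool_dtype(s):
            continue  # bool is already 1 byte per value
        if pd.api.types.is_integer_dtype(s):
            # Smallest signed integer type that holds every value (int8 at minimum).
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # pandas stops at float32 here; this can trade a little precision.
            out[col] = pd.to_numeric(s, downcast="float")
    return out
```

Using downcast="unsigned" instead would allow uint8 and friends for non-negative columns; the sketch sticks to signed types for simplicity.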
Raises
| Name | Type | Description |
|---|---|---|
|  | FileNotFoundError | If filepath does not exist. |
|  | ValueError | If the file is not a valid CSV, if sparse_threshold or category_threshold is outside [0, 1], or if usecols contains columns not present in the CSV. |
|  | TypeError | If an argument has an incorrect type. |
|  | pd.errors.EmptyDataError | If the CSV file is empty or contains only headers. |
|  | MemoryError | If the final DataFrame exceeds available memory. |
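
The Raises table lists the failure modes but not where they are detected. A minimal sketch of up-front argument checks, assuming validation happens before any chunk is read; validate_args is a hypothetical name, and the MemoryError case is left out because it depends on the load itself. Note that pandas.read_csv already raises ValueError when usecols names a missing column, so an explicit check mainly produces a clearer message.

```python
import os

import pandas as pd


def validate_args(filepath, usecols=None, sparse_threshold=0.3,
                  category_threshold=0.7, **read_csv_kwargs):
    """Hypothetical pre-flight checks mirroring the documented exceptions."""
    if not isinstance(filepath, (str, os.PathLike)):
        raise TypeError(f"filepath must be str or path-like, got {type(filepath).__name__}")
    if not os.path.exists(filepath):
        raise FileNotFoundError(filepath)
    for name, value in (("sparse_threshold", sparse_threshold),
                        ("category_threshold", category_threshold)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if usecols is not None:
        # nrows=0 parses only the header; a completely empty file raises EmptyDataError here.
        header = pd.read_csv(filepath, nrows=0, **read_csv_kwargs).columns
        missing = [c for c in usecols if c not in header]
        if missing:
            raise ValueError(f"usecols not found in CSV: {missing}")
```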
Examples
>>> from csvplus.load_optimized_csv import load_optimized_csv
>>> df = load_optimized_csv(
... "large_dataset.csv",
... nrows=100000,
... usecols=["id", "value", "category", "status"],
... no_sparse_cols=["id"],
... no_downcast_cols=["value"],
... no_category_cols=["id"],
... sparse_threshold=0.6,
... category_threshold=0.3,
... )
>>> df.info(memory_usage="deep")