csvsdataset
Memory-frugal torch dataset from a CSV collection

csvsdataset is a Python library designed to simplify working with multiple CSV files as a single dataset. The primary functionality is provided by the CsvsDataset class in the csvsdataset.py module.
This library was written by ChatGPT-4. Issues are cut and pasted into a ChatGPT session; the project is an experiment in semi-autonomous code maintenance.
Installation
To install the csvsdataset library, simply run:

    pip install csvsdataset
Usage
    from csvsdataset.csvsdataset import CsvsDataset

    # Initialize the CsvsDataset instance
    dataset = CsvsDataset(folder_path="path/to/your/csv/folder",
                          file_pattern="*.csv",
                          x_columns=["column1", "column2"],
                          y_column="target_column")

    # Iterate over the dataset
    for x_data, y_data in dataset:
        # Your processing code here
        pass

    # Access a specific item in the dataset
    x_data, y_data = dataset[42]
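The usage example above assumes a folder of CSV files that all contain the expected columns. A small stdlib-only sketch that generates such a folder for experimentation (make_sample_csvs is a hypothetical helper, not part of csvsdataset):

```python
import csv
import os
import tempfile

# Hypothetical setup helper: write a folder of CSV files with the columns
# the usage example expects (column1, column2, target_column).
def make_sample_csvs(folder, n_files=3, rows_per_file=5):
    os.makedirs(folder, exist_ok=True)
    for i in range(n_files):
        path = os.path.join(folder, f"data_{i}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["column1", "column2", "target_column"])
            for r in range(rows_per_file):
                writer.writerow([i, r, i * rows_per_file + r])
    return folder

folder = make_sample_csvs(os.path.join(tempfile.mkdtemp(), "csvs"))
```

The resulting folder path can then be passed as folder_path when constructing CsvsDataset.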
Memory frugality
Only data from a small number of CSV files is kept in memory at any time; the rest is discarded on an LRU (least-recently-used) basis. This class is intended for situations where the data is spread over so many files that it cannot conveniently be loaded into memory all at once.
Hashes for csvsdataset-0.0.7-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 151e992427bc6969f52a5f93966b59d32c70fd71166f7b4e48f5b8c39704bcba
MD5 | 6e6c1b815810df06ec270efa43f25cfb
BLAKE2b-256 | 5e34610d9451ec9ad9100ea571d6d201e12b8a6594705507c688df053b7d0634