Memory-frugal torch dataset from a CSV collection
csvsdataset
csvsdataset is a Python library designed to simplify the process of working with multiple CSV files as a single dataset. The primary functionality is provided by the CsvsDataset class in the csvsdataset.py module.
This library was written by ChatGPT4, as mentioned here. Issues will be cut and pasted into a session. It is an experiment in semi-autonomous code maintenance.
Installation
To install the csvsdataset library, run:
pip install csvsdataset
Usage
from csvsdataset.csvsdataset import CsvsDataset

# Initialize the CsvsDataset instance
dataset = CsvsDataset(folder_path="path/to/your/csv/folder",
                      file_pattern="*.csv",
                      x_columns=["column1", "column2"],
                      y_column="target_column")

# Iterate over the dataset
for x_data, y_data in dataset:
    # Your processing code here
    pass

# Access a specific item in the dataset
x_data, y_data = dataset[42]
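Indexing with dataset[42] implies that one global index is resolved to a single row in one of the CSV files. The sketch below shows one common way to do that, using cumulative row counts and a binary search. It is an illustration only and not necessarily how csvsdataset does it; the helper name locate_row and the row counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_row(global_index, rows_per_file):
    """Map a global index to (file_number, row_within_file).

    Hypothetical helper for illustration; rows_per_file lists the number
    of data rows in each CSV file, in the order the files are read.
    """
    # Running totals: cumulative[i] is the number of rows in files 0..i.
    cumulative = list(accumulate(rows_per_file))
    if not cumulative or not 0 <= global_index < cumulative[-1]:
        raise IndexError(f"index {global_index} out of range")
    # First file whose running total exceeds the index.
    file_number = bisect_right(cumulative, global_index)
    rows_before = cumulative[file_number - 1] if file_number else 0
    return file_number, global_index - rows_before


# Three files with 30, 10 and 25 rows: index 42 falls in the third file.
print(locate_row(42, [30, 10, 25]))  # (2, 2)
```

With this kind of mapping, only the row counts need to be known up front; the file contents themselves can be loaded on demand.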
Memory frugality
Only data from a small number of CSV files is kept in memory at any time; the rest is discarded on a least-recently-used (LRU) basis. This class is intended for cases where the data files are too numerous to load into memory all at once.
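To make the LRU idea concrete, here is a minimal sketch of a cache that keeps the rows of only a few CSV files in memory and evicts the least recently used file when the limit is exceeded. It uses only the standard library and is an assumption about the general technique, not the code inside CsvsDataset; the class name LruCsvCache and the max_files parameter are hypothetical.

```python
import csv
from collections import OrderedDict


class LruCsvCache:
    """Keep the rows of at most max_files CSV files in memory.

    Hypothetical class for illustration only.
    """

    def __init__(self, max_files=2):
        self.max_files = max_files
        self._files = OrderedDict()  # path -> list of rows, oldest first

    def rows(self, path):
        if path in self._files:
            # Cache hit: mark this file as most recently used.
            self._files.move_to_end(path)
            return self._files[path]
        # Cache miss: read the file, skipping the header row.
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)
            loaded = list(reader)
        self._files[path] = loaded
        if len(self._files) > self.max_files:
            # Evict the least recently used file.
            self._files.popitem(last=False)
        return loaded

    def cached_paths(self):
        """Return cached file paths, least recently used first."""
        return list(self._files)
```

The trade-off is the usual one for LRU caching: sequential access within a file is cheap, while access patterns that jump between many files (for example, fully shuffled sampling) cause frequent reloads.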