Memory-frugal torch dataset from a CSV collection
csvsdataset
csvsdataset is a Python library designed to simplify working with multiple CSV files as a single dataset. The primary functionality is provided by the CsvsDataset class in the csvsdataset.py module.
Installation
To install the csvsdataset library, run:
pip install csvsdataset
Usage
from csvsdataset.csvsdataset import CsvsDataset

# Initialize the CsvsDataset instance
dataset = CsvsDataset(folder_path="path/to/your/csv/folder",
                      file_pattern="*.csv",
                      x_columns=["column1", "column2"],
                      y_column="target_column")

# Iterate over the dataset
for x_data, y_data in dataset:
    # Your processing code here
    pass

# Access a specific item in the dataset
x_data, y_data = dataset[42]
Memory frugality
Only the data from a small number of CSV files is held in memory at any time; the rest is evicted on a least-recently-used (LRU) basis. This class is intended for cases where there are too many data files to load into memory at once.
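To make the LRU idea concrete, here is a minimal standalone sketch using only the Python standard library. It is not the library's actual implementation: the LruCsvCache class, its max_files parameter, and the make_demo_files helper are hypothetical names introduced for illustration, and the demo files are generated on the fly.

```python
import csv
import tempfile
from collections import OrderedDict
from pathlib import Path


class LruCsvCache:
    """Hypothetical helper: keeps the rows of at most `max_files` CSV files in memory."""

    def __init__(self, max_files=2):
        self.max_files = max_files
        self._cache = OrderedDict()  # path -> list of row dicts, least recently used first
        self.loads = 0  # how many times a file was read from disk

    def rows(self, path):
        path = Path(path)
        if path in self._cache:
            # Cache hit: mark this file as most recently used.
            self._cache.move_to_end(path)
            return self._cache[path]
        # Cache miss: read the whole file from disk.
        with path.open(newline="") as handle:
            data = list(csv.DictReader(handle))
        self.loads += 1
        self._cache[path] = data
        if len(self._cache) > self.max_files:
            # Over capacity: evict the least recently used file.
            self._cache.popitem(last=False)
        return data


def make_demo_files(folder, n_files=3, rows_per_file=4):
    """Write small demo CSV files (assumed layout) and return their paths."""
    paths = []
    for i in range(n_files):
        path = Path(folder) / f"part{i}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["column1", "column2", "target_column"])
            for r in range(rows_per_file):
                writer.writerow([i, r, i * 10 + r])
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    files = make_demo_files(tmp)
    cache = LruCsvCache(max_files=2)
    cache.rows(files[0])  # miss: read from disk
    cache.rows(files[1])  # miss: read from disk
    cache.rows(files[0])  # hit: part0.csv becomes most recently used
    cache.rows(files[2])  # miss: part1.csv is evicted
    cached_names = [p.name for p in cache._cache]
    first_row = cache.rows(files[0])[0]  # hit: no extra disk read
    loads = cache.loads
```

With a capacity of two files, the third distinct file pushes out whichever file was touched longest ago, so memory use stays bounded by the capacity rather than by the size of the whole collection. The trade-off is that access patterns jumping between many files cause repeated disk reads.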