JAX DataLoader
A lightweight DataLoader for JAX that loads data from several file formats, including CSV and JSON. The goal of this project is to port TensorFlow Datasets (TFDS) functionality to JAX while supporting multiple data sources and preprocessing.
Features:
- Load data from multiple sources (CSV, JSON, and more).
- Parallel data loading using Python's multiprocessing.
- JAX integration for optimized data preprocessing using vmap.
- Easy-to-use interface for batch loading.
- JAX-based preprocessing using jit and vmap.
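To illustrate the jit and vmap features, here is a minimal sketch of JAX-based preprocessing, assuming a simple per-example standardization step. The names normalize_example and preprocess_batch and the sample values are hypothetical and are not part of this package's API; only jax.jit and jax.vmap are actual JAX functions.

```python
import jax
import jax.numpy as jnp


def normalize_example(x, mean, std):
    # Standardize a single example (a 1-D feature vector). Hypothetical helper.
    return (x - mean) / (std + 1e-8)


# vmap maps the per-example function over the leading (batch) axis of x only;
# mean and std are shared by every example (in_axes=None).
# jit compiles the batched function.
preprocess_batch = jax.jit(jax.vmap(normalize_example, in_axes=(0, None, None)))

batch = jnp.array([[1.0, 10.0], [3.0, 30.0]])  # 2 examples, 2 features
mean = batch.mean(axis=0)
std = batch.std(axis=0)

out = preprocess_batch(batch, mean, std)
print(out.shape)
```

Writing the transformation for one example and letting vmap add the batch dimension keeps the preprocessing function independent of the batch size.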
Installation
You can install the required dependencies with the following command:
pip install jax jaxlib pandas numpy
Multiprocessed data loading needs no extra installation: multiprocessing is part of the Python standard library.
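The multiprocessing module ships with Python itself, so it can be imported directly. As a rough sketch of what parallel loading could look like (not this package's implementation), the following splits text lines into chunks and parses each chunk in a worker process. The helpers parse_chunk, split_chunks, and parse_parallel are hypothetical names introduced for illustration.

```python
import multiprocessing as mp


def parse_chunk(lines):
    # Convert a chunk of comma-separated text lines into lists of floats.
    return [[float(v) for v in line.split(",")] for line in lines]


def split_chunks(items, n_chunks):
    # Split a list into at most n_chunks contiguous, nearly equal pieces.
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parse_parallel(lines, workers=2):
    # Parse the chunks in separate processes; pool.map preserves chunk order,
    # so flattening the results restores the original row order.
    with mp.Pool(processes=workers) as pool:
        parts = pool.map(parse_chunk, split_chunks(lines, workers))
    return [row for part in parts for row in part]
```

When calling parse_parallel from a script, place the call under an `if __name__ == "__main__":` guard, which the spawn start method (the default on Windows and macOS) requires.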
Usage
1. Basic Data Loading from CSV
This example shows how to load data from a CSV file, specify the target column (label), and use batching with JAXDataLoader.
import numpy as np
from jax_dataloader import JAXDataLoader, load_custom_data
# Example 1: Loading CSV data
dataset_path = 'path_to_your_dataset.csv'
batch_size = 32
dataloader = load_custom_data(dataset_path, file_type='csv', batch_size=batch_size, target_column='median_house_value')
for batch_x, batch_y in dataloader:
    print(batch_x.shape, batch_y.shape)
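The loop above relies on the loader yielding (features, target) pairs. A minimal sketch of that behavior, assuming the target column is separated from the features and the rows are sliced into consecutive batches, might look like this. The function iterate_batches is hypothetical and the data is synthetic; it is not the package's own implementation.

```python
import numpy as np


def iterate_batches(features, targets, batch_size):
    # Yield consecutive (batch_x, batch_y) slices; the last batch may be smaller.
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], targets[start:end]


# Synthetic stand-in for a table with 3 feature columns and 1 target column.
table = np.arange(40, dtype=np.float32).reshape(10, 4)
features, targets = table[:, :3], table[:, 3]

shapes = [
    (bx.shape, by.shape)
    for bx, by in iterate_batches(features, targets, batch_size=4)
]
print(shapes)
```

With 10 rows and a batch size of 4, the final batch holds the 2 remaining rows, so downstream code should not assume every batch has exactly batch_size rows.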
2. Data Loading from JSON
This example shows how to load data from a JSON file.
# Example 2: Loading JSON data
dataset_path = 'path_to_your_dataset.json'
batch_size = 32
dataloader = load_custom_data(dataset_path, file_type='json', batch_size=batch_size, target_column='median_house_value')
for batch_x, batch_y in dataloader:
    print(batch_x.shape, batch_y.shape)
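This README does not state which JSON layout is expected. The sketch below assumes a records-style file (a list of objects that share the same keys) and shows how such records could be split into a feature matrix and a target vector. The helper records_to_arrays and the feature names rooms and age are hypothetical.

```python
import json

import numpy as np


def records_to_arrays(records, target_column):
    # Split a list of dict records into a feature matrix and a target vector.
    feature_names = [k for k in records[0] if k != target_column]
    x = np.array([[r[k] for k in feature_names] for r in records], dtype=np.float32)
    y = np.array([r[target_column] for r in records], dtype=np.float32)
    return x, y, feature_names


# Hypothetical records, inlined so the example runs without a file on disk.
text = (
    '[{"rooms": 3, "age": 20, "median_house_value": 150000.0},'
    ' {"rooms": 5, "age": 8, "median_house_value": 320000.0}]'
)
x, y, names = records_to_arrays(json.loads(text), target_column="median_house_value")
print(names, x.shape, y.shape)
```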
3. Load Data from Custom Sources
You can extend the load_custom_data function to support additional file formats by adding a custom data loading function and selecting it through the file_type argument.
# Example 3: Load from a custom source
dataset_path = 'path_to_your_custom_data_file'
file_type = 'your_file_type' # Can be 'csv', 'json', etc.
batch_size = 64
dataloader = load_custom_data(dataset_path, file_type=file_type, batch_size=batch_size, target_column='your_target_column')
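One way such an extension could be organized is a registry that maps each file_type string to a reader function. This is only a sketch under that assumption: LOADERS, register_loader, and read_table are hypothetical names and do not describe how load_custom_data is actually implemented.

```python
import csv
import io
import json

LOADERS = {}  # hypothetical registry: file_type -> reader function


def register_loader(file_type):
    # Decorator that records a reader function under the given file_type.
    def decorator(fn):
        LOADERS[file_type] = fn
        return fn
    return decorator


@register_loader("csv")
def read_csv(stream):
    return list(csv.DictReader(stream))


@register_loader("json")
def read_json(stream):
    return json.load(stream)


@register_loader("tsv")  # a new format, added without touching the dispatcher
def read_tsv(stream):
    return list(csv.DictReader(stream, delimiter="\t"))


def read_table(stream, file_type):
    # Dispatch on file_type and fail loudly for unknown formats.
    if file_type not in LOADERS:
        raise ValueError(f"Unsupported file_type: {file_type!r}")
    return LOADERS[file_type](stream)


rows = read_table(io.StringIO("a\tb\n1\t2\n"), "tsv")
print(rows)
```

A registry keeps each format's parsing logic in its own function, so supporting a new format means adding one decorated function rather than another branch in a growing if/elif chain.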
Contributing
Feel free to contribute by submitting issues and pull requests. If you want to add new features or improve the performance, your contributions are welcome!
License
MIT License. See LICENSE for more details.
Project Structure:
jax-dataloader/
│
├── jax_dataloader.py     # Contains the JAXDataLoader class and data loading logic
├── dataset/              # Example dataset folder
│   ├── housing.csv       # Example CSV data
│   └── housing.json      # Example JSON data
├── README.md             # This README file
└── requirements.txt      # Python dependencies
Pushing to GitHub:
- Initialize a Git repository:
git init
- Add your files:
git add .
- Commit your changes:
git commit -m "Initial commit: JAX DataLoader"
- Push to GitHub:
git remote add origin https://github.com/your-username/jax-dataloader.git
git push -u origin master