# DataLFlash ⚡

High-Performance Data Loading for Deep Learning: fast data loaders for PyTorch, inspired by flashtensors.
DataLFlash is an optimized library for training Deep Learning models. It is designed to handle both massive datasets that don't fit in RAM and standard datasets with maximum efficiency.
## 📦 Installation

```bash
# Install from PyPI
pip install datalflash

# View package information
pip show datalflash

# Install from GitHub
pip install git+https://github.com/dylan-irzi/DataLFlash.git

# Or install locally (for development)
git clone https://github.com/dylan-irzi/DataLFlash.git
cd DataLFlash
pip install -e .
```
## ⚡ Why DataLFlash?
- Up to 60% Faster Training: Optimizes the entire data loading pipeline, reducing overhead and CPU bottlenecks.
- 7.6x Speedup for In-Memory Data: Uses zero-copy vectorization to bypass standard PyTorch DataLoader limitations.
- Seamless Integration: Works as a drop-in replacement for standard DataLoaders.
## 🔍 How It Works
DataLFlash rethinks data loading by:
- Chunking (Disk): Storing data in contiguous memory-mapped chunks to minimize disk seeks.
- Vectorization (Memory): Using direct tensor slicing instead of item-by-item iteration, removing Python loop overhead.
- Background Prefetching: Aggressively loading future batches while the GPU is busy.
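To make the vectorization idea concrete, here is a minimal, framework-free sketch of the two access patterns. It uses NumPy rather than DataLFlash internals, and the names are purely illustrative:

```python
import numpy as np

# Toy in-memory "dataset": 1,000 samples shaped like small images.
features = np.random.rand(1_000, 32, 32, 3).astype(np.float32)
batch_size = 64

# Item-by-item collation (what a standard loader's Python loop does):
# index every sample individually, then stack the results into a batch.
batch_slow = np.stack([features[i] for i in range(batch_size)])

# Vectorized slicing (the in-memory strategy described above):
# a single slice yields the whole batch as a zero-copy view.
batch_fast = features[:batch_size]

assert np.array_equal(batch_slow, batch_fast)
assert batch_fast.base is features  # view into the original array: no copy
```

The two batches are identical, but the slice skips the per-sample Python loop entirely, which is where the speedup comes from.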
## Usage Options

### Option A: Massive Datasets (Chunking) 📦

Best for: datasets larger than RAM (e.g., 100 GB+) where disk I/O is the bottleneck.

**Step 1: Convert the dataset.** Convert your standard PyTorch dataset into optimized chunks on disk.
```python
from datalflash.utils import DatasetConverter

# Your original PyTorch dataset
my_dataset = ...

DatasetConverter.create_chunked_dataset(
    pytorch_dataset=my_dataset,
    output_dir="./data_chunks",
    chunk_size=10000,
    split_ratios={'train': 0.8, 'val': 0.1, 'test': 0.1},
    shuffle=True
)
```
**Step 2: Load the dataloaders.** Automatically load the train/val/test loaders; DataLFlash manages efficient background chunk loading.
```python
from datalflash.core import get_dataloaders

loaders = get_dataloaders(
    chunks_dir="./data_chunks",
    batch_size=64,
    augmentations={'train': my_train_transform}
)
train_loader = loaders['train']
```
**Step 3: Train.** Use the loader just like a standard PyTorch DataLoader.
```python
for features, targets in train_loader:
    # Training loop...
    pass
```
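The loader behaves like any iterable of `(features, targets)` pairs. The sketch below is a hypothetical stand-in (plain NumPy, not the DataLFlash implementation) showing the iteration contract the training loop above relies on:

```python
import numpy as np

def iter_batches(features, targets, batch_size=64, shuffle=True, seed=0):
    """Hypothetical loader sketch: yields (features, targets) batches
    using vectorized index slicing, mimicking the loop contract above."""
    order = np.arange(len(features))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], targets[idx]

# 256 samples of 8 features each, with integer class labels
X = np.random.rand(256, 8).astype(np.float32)
y = np.random.randint(0, 10, size=256)

batches = list(iter_batches(X, y, batch_size=64))
assert len(batches) == 4                 # 256 / 64 full batches
assert batches[0][0].shape == (64, 8)    # features batch
assert batches[0][1].shape == (64,)      # targets batch
```

Any loader exposing this protocol, including DataLFlash's, plugs into the same training loop unchanged.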
### Option B: Standard Datasets (In-Memory Vectorization) ⚡

Best for: datasets that fit in RAM (e.g., CIFAR-10, MNIST). Replaces the PyTorch DataLoader for maximum speed.

New feature: DataLFlash now detects an in-memory FlashDataset and uses vectorized slicing instead of item-by-item iteration, bypassing the slow Python loop and `collate_fn`.

Benchmark: ~7.6x faster than the standard PyTorch DataLoader on CIFAR-10 (single thread).
```python
from datalflash.core import FlashDataLoader, FlashDataset
from datalflash.utils import DatasetConverter

# 1. Convert a standard dataset to a FlashDataset (in-memory).
#    This creates a highly optimized memory layout (features/targets tensors).
flash_dataset = DatasetConverter.from_pytorch_dataset(
    pytorch_dataset=my_standard_dataset,
    memory_optimized=True
)

# 2. Optimized DataLoader: automatically uses vectorized slicing (C++ speed).
train_loader = FlashDataLoader(
    flash_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=0,  # 0 is faster: no workers are needed to read from disk
    pin_memory=True
)

for batch in train_loader:
    # Instant batch delivery!
    ...
```
## 🚀 Benchmarks
| Dataset | Method | Time (Epoch) | Speedup |
|---|---|---|---|
| CIFAR-10 | PyTorch DataLoader (Standard) | 12.68s | 1x |
| CIFAR-10 | DataLFlash (Vectorized) | 1.68s | 7.6x |
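Absolute numbers depend on hardware. A rough way to reproduce this style of comparison yourself is sketched below, using NumPy as a stand-in for the two access patterns; the `time_epoch` helper is illustrative and not part of DataLFlash:

```python
import time
import numpy as np

def time_epoch(make_loader, repeats=3):
    """Best wall-clock time over a few full passes through the data."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _batch in make_loader():
            pass
        best = min(best, time.perf_counter() - start)
    return best

data = np.random.rand(2_000, 3, 32, 32).astype(np.float32)  # CIFAR-10-shaped
bs = 64

# Item-by-item stacking vs. one vectorized slice per batch
item_by_item = lambda: (
    np.stack([data[i] for i in range(s, min(s + bs, len(data)))])
    for s in range(0, len(data), bs)
)
vectorized = lambda: (data[s:s + bs] for s in range(0, len(data), bs))

t_slow = time_epoch(item_by_item)
t_fast = time_epoch(vectorized)
print(f"item-by-item: {t_slow:.4f}s  vectorized: {t_fast:.4f}s")
```

The exact ratio will vary with sample size and machine, but the vectorized pass avoids per-sample copies entirely.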
## 📚 API Reference

### `DatasetConverter.create_chunked_dataset`
| Parameter | Default | Description |
|---|---|---|
| `pytorch_dataset` | Required | Source dataset |
| `output_dir` | Required | Output directory |
| `chunk_size` | 10000 | Samples per chunk |
| `split_ratios` | {'train': 0.8, ...} | Train/val/test split ratios |
| `shuffle` | True | Shuffle before chunking |
### `FlashDataLoader`
| Parameter | Default | Description |
|---|---|---|
| `dataset` | Required | Dataset to load |
| `batch_size` | 1 | Batch size |
| `shuffle` | False | Shuffle the data |
| `sampler` | None | Custom sampler |
| `collate_fn` | default | Function that merges samples into a batch |
| `drop_last` | False | Drop the last incomplete batch |
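As a quick illustration of the `drop_last` flag, assuming it mirrors the standard PyTorch DataLoader semantics:

```python
# How drop_last changes the number of batches per epoch
# (assumed to mirror standard PyTorch DataLoader semantics).
n_samples, batch_size = 50_000, 64

batches_keep_last = -(-n_samples // batch_size)  # ceil: partial batch is kept
batches_drop_last = n_samples // batch_size      # floor: partial batch dropped

print(batches_keep_last, batches_drop_last)  # 782 781
```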
## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.