
High-performance data loaders for PyTorch inspired by flashtensors


DataLFlash ⚡

High-Performance Data Loading for Deep Learning


DataLFlash is an optimized data-loading library for training deep learning models with PyTorch. It is designed to handle both massive datasets that don't fit in RAM and standard in-memory datasets efficiently.

📦 Installation

# Install from GitHub
pip install git+https://github.com/dylan-irzi/DataLFlash.git

# Or install locally (for development)
git clone https://github.com/dylan-irzi/DataLFlash.git
cd DataLFlash
pip install -e .

⚡ Why DataLFlash?

  • Up to 60% Faster Training: Optimizes the entire data loading pipeline, reducing overhead and CPU bottlenecks.
  • 7.6x Speedup for In-Memory Data: Uses vectorized tensor slicing instead of per-sample loading, bypassing the overhead of the standard PyTorch DataLoader (CIFAR-10, single thread).
  • Seamless Integration: Works as a drop-in replacement for standard DataLoaders.

🔍 How it Works

DataLFlash rethinks data loading by:

  1. Chunking (Disk): Storing data in contiguous memory-mapped chunks to minimize disk seeks.
  2. Vectorization (Memory): Using direct tensor slicing instead of item-by-item iteration, removing Python loop overhead.
  3. Background Prefetching: Aggressively loading future batches while the GPU is busy.
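Item 3 is the classic producer/consumer pattern. The sketch below is a generic illustration, not DataLFlash's implementation: it assumes batches come from any Python iterable and uses a background thread plus a bounded `queue.Queue`, so up to `depth` batches are prepared while the consumer (e.g. the GPU step) is busy. The helper name `prefetch` is hypothetical.

```python
import queue
import threading
from typing import Iterable, Iterator

_DONE = object()  # sentinel marking the end of the source


def prefetch(source: Iterable, depth: int = 2) -> Iterator:
    """Yield items from `source` while a background thread prepares up to `depth` items ahead.

    Hypothetical helper for illustration only; not part of DataLFlash's API.
    """
    buffer: queue.Queue = queue.Queue(maxsize=depth)
    errors: list = []

    def producer() -> None:
        try:
            for item in source:
                buffer.put(item)  # blocks while `depth` items are already waiting
        except Exception as exc:  # hand producer errors over to the consumer
            errors.append(exc)
        finally:
            buffer.put(_DONE)

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = buffer.get()
        if item is _DONE:
            break
        yield item
    worker.join()
    if errors:
        raise errors[0]


if __name__ == "__main__":
    import time

    def slow_batches(n):
        for i in range(n):
            time.sleep(0.01)  # simulate disk reads or decoding
            yield i

    for batch in prefetch(slow_batches(5), depth=2):
        time.sleep(0.01)  # simulate a training step; loading later batches overlaps with it
        print(batch)
```

The bounded queue is the key design choice: it caps memory use at `depth` batches while still keeping the consumer fed.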

[Figure: Benchmark graph]

Usage Options

Option A: Massive Datasets (Chunking) 📦

Best for: Datasets larger than RAM (e.g., 100 GB+), where disk I/O is the bottleneck.

Step 1: Convert Dataset

Convert your standard PyTorch dataset into optimized chunks on disk.

from datalflash.utils import DatasetConverter

# Your original PyTorch dataset
my_dataset = ... 

DatasetConverter.create_chunked_dataset(
    pytorch_dataset=my_dataset,
    output_dir="./data_chunks",
    chunk_size=10000,
    split_ratios={'train': 0.8, 'val': 0.1, 'test': 0.1},
    shuffle=True
)
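The idea behind the chunked layout can be pictured with NumPy. This is a minimal sketch under stated assumptions: the data is a single NumPy array, each chunk is a separate `.npy` file, and `split_ratios` is ignored. DataLFlash's actual on-disk format is not documented here, and `write_chunks`/`open_chunks` are hypothetical helpers. A real converter would also apply the same permutation to the targets. Opening a chunk with `np.load(..., mmap_mode="r")` maps it into memory, so only the rows actually sliced are read from disk.

```python
import os
import tempfile
from typing import List

import numpy as np


def write_chunks(data: np.ndarray, out_dir: str, chunk_size: int,
                 shuffle: bool = True, seed: int = 0) -> List[str]:
    """Hypothetical helper: save `data` as consecutive .npy files of at most `chunk_size` rows."""
    os.makedirs(out_dir, exist_ok=True)
    if shuffle:
        # Shuffle once at conversion time, so every chunk is a contiguous, pre-shuffled block.
        data = data[np.random.default_rng(seed).permutation(len(data))]
    paths = []
    for i, start in enumerate(range(0, len(data), chunk_size)):
        path = os.path.join(out_dir, f"chunk_{i:05d}.npy")
        np.save(path, np.ascontiguousarray(data[start:start + chunk_size]))
        paths.append(path)
    return paths


def open_chunks(paths: List[str]) -> List[np.ndarray]:
    """Hypothetical helper: memory-map each chunk read-only; rows are paged in only when accessed."""
    return [np.load(path, mmap_mode="r") for path in paths]


if __name__ == "__main__":
    features = np.arange(25 * 3, dtype=np.float32).reshape(25, 3)
    out_dir = tempfile.mkdtemp()
    chunks = open_chunks(write_chunks(features, out_dir, chunk_size=10))
    print([chunk.shape for chunk in chunks])  # [(10, 3), (10, 3), (5, 3)]
```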

Step 2: Load Dataloaders

Automatically load the train/val/test loaders. DataLFlash loads the chunks efficiently in the background.

from datalflash.core import get_dataloaders

loaders = get_dataloaders(
    chunks_dir="./data_chunks",
    batch_size=64,
    augmentations={'train': my_train_transform}
)

train_loader = loaders['train']
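The `augmentations` argument maps split names to transforms, and the example only registers one for `'train'`. The sketch below assumes, without confirmation from the source, that splits without an entry are returned unchanged; whether DataLFlash applies the transform per sample or per batch is also not stated. The helper `transform_for` is hypothetical.

```python
from typing import Callable, Dict, Optional

Transform = Callable[[object], object]


def transform_for(split: str, augmentations: Optional[Dict[str, Transform]]) -> Transform:
    """Hypothetical helper: the transform registered for `split`, or identity if none is given."""
    if augmentations and split in augmentations:
        return augmentations[split]
    return lambda batch: batch  # assumed default: val/test data is left unaugmented


if __name__ == "__main__":
    augmentations = {"train": lambda xs: [x * 2 for x in xs]}  # stand-in for my_train_transform
    for split in ("train", "val", "test"):
        print(split, transform_for(split, augmentations)([1, 2, 3]))
```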

Step 3: Train

Use the loader just like a standard PyTorch DataLoader.

for features, targets in train_loader:
    # Training loop...
    pass

Option B: Standard Datasets (In-Memory Vectorization) ⚡

Best for: Datasets that fit in RAM (e.g., CIFAR-10, MNIST). Replaces the PyTorch DataLoader to remove per-sample loading overhead.

New Feature: DataLFlash now detects an in-memory FlashDataset and uses vectorized slicing instead of item-by-item iteration. This bypasses the slow per-sample Python loop and the collate_fn call.

Benchmark: ~7.6x faster than the standard PyTorch DataLoader on CIFAR-10 (single thread).
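The difference between the two strategies can be shown with NumPy; the same indexing works on PyTorch tensors. This illustrates the general technique, not DataLFlash's code, and the function names are hypothetical. The per-sample path fetches each item and stacks them, as a default collate function would; the vectorized path gathers the whole batch with one indexing call per array. Both yield the same batch.

```python
import numpy as np


def batch_per_sample(features: np.ndarray, targets: np.ndarray, indices: np.ndarray):
    """Hypothetical: fetch samples one at a time in a Python loop, then stack them (collate-style)."""
    samples = [(features[i], targets[i]) for i in indices]
    xs, ys = zip(*samples)
    return np.stack(xs), np.array(ys)


def batch_vectorized(features: np.ndarray, targets: np.ndarray, indices: np.ndarray):
    """Hypothetical: gather the whole batch with a single indexing operation per array."""
    return features[indices], targets[indices]


rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 3, 32, 32), dtype=np.float32)  # CIFAR-10-shaped toy data
targets = rng.integers(0, 10, size=1000)
idx = rng.permutation(1000)[:64]  # one shuffled batch of 64 indices

xa, ya = batch_per_sample(features, targets, idx)
xb, yb = batch_vectorized(features, targets, idx)
assert np.array_equal(xa, xb) and np.array_equal(ya, yb)

# A contiguous slice is a view (zero-copy); a gather with shuffled indices makes a copy.
assert np.shares_memory(features[:64], features)
assert not np.shares_memory(xb, features)
```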

from datalflash.core import FlashDataLoader, FlashDataset
from datalflash.utils import DatasetConverter

# 1. Convert standard dataset to FlashDataset (In-Memory)
# This creates a highly optimized memory layout (features/targets tensors)
flash_dataset = DatasetConverter.from_pytorch_dataset(
    pytorch_dataset=my_standard_dataset,
    memory_optimized=True
)

# 2. Optimized DataLoader
# Vectorized slicing is used automatically: each batch is one native
# tensor-indexing operation instead of a per-sample Python loop
train_loader = FlashDataLoader(
    flash_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=0,  # 0 is faster here: the data is already in RAM, so no workers are needed to read from disk
    pin_memory=True
)

for batch in train_loader:
    # Each batch arrives ready-made; no per-sample collate_fn step
    ...
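One epoch of a vectorized loader reduces to shuffling an index array once and slicing it in steps of `batch_size`. The generator below sketches that loop, assuming the dataset is held as two in-memory arrays; `iterate_batches` is a hypothetical name, and its `shuffle`/`drop_last` arguments are assumed to behave like the PyTorch DataLoader parameters of the same names.

```python
from typing import Iterator, Optional, Tuple

import numpy as np


def iterate_batches(
    features: np.ndarray,
    targets: np.ndarray,
    batch_size: int,
    shuffle: bool = True,
    drop_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Hypothetical: yield (features, targets) batches with one gather per batch."""
    n = len(features)
    order = np.random.default_rng(seed).permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        if drop_last and len(idx) < batch_size:
            break  # skip the final incomplete batch
        yield features[idx], targets[idx]


if __name__ == "__main__":
    x = np.arange(10).reshape(10, 1)
    y = np.arange(10)
    print([len(b[1]) for b in iterate_batches(x, y, batch_size=4)])                  # [4, 4, 2]
    print([len(b[1]) for b in iterate_batches(x, y, batch_size=4, drop_last=True)])  # [4, 4]
```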

🚀 Benchmarks

| Dataset | Method | Time per epoch | Speedup |
|---|---|---|---|
| CIFAR-10 | PyTorch DataLoader (standard) | 12.68 s | 1x |
| CIFAR-10 | DataLFlash (vectorized) | 1.68 s | 7.6x |
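To check a speedup of this kind on your own hardware, time one pass over the data with each strategy. The harness below is a hypothetical sketch, not the script behind the table, and it will not reproduce those numbers: it only times batch assembly (a per-sample loop with collate-style stacking versus one gather per batch) on a random CIFAR-10-shaped array, leaving out dataset `__getitem__` logic, transforms, and model work.

```python
import time

import numpy as np


def time_epoch(n: int = 10_000, batch_size: int = 64, seed: int = 0) -> dict:
    """Hypothetical harness: wall-clock seconds for one epoch with each batching strategy."""
    rng = np.random.default_rng(seed)
    features = rng.standard_normal((n, 3, 32, 32), dtype=np.float32)
    targets = rng.integers(0, 10, size=n)
    order = rng.permutation(n)

    def per_sample() -> None:
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            _ = np.stack([features[i] for i in idx])  # one Python-level fetch per sample
            _ = np.array([targets[i] for i in idx])

    def vectorized() -> None:
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            _ = (features[idx], targets[idx])  # one gather per array per batch

    timings = {}
    for name, fn in (("per_sample", per_sample), ("vectorized", vectorized)):
        t0 = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - t0
    return timings


if __name__ == "__main__":
    t = time_epoch()
    print(f"speedup: {t['per_sample'] / t['vectorized']:.1f}x")
```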


📚 API Reference

DatasetConverter.create_chunked_dataset

| Parameter | Default | Description |
|---|---|---|
| pytorch_dataset | Required | Source dataset |
| output_dir | Required | Output directory |
| chunk_size | 10000 | Samples per chunk |
| split_ratios | {'train': 0.8...} | Train/val/test split ratios |
| shuffle | True | Shuffle before chunking |
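How `split_ratios` and `chunk_size` interact can be worked out with a little arithmetic. The sketch assumes each split except the last gets `int(n * ratio)` samples, the last split takes the remainder, and each split is then cut into chunks of at most `chunk_size` samples. DataLFlash's exact rounding rule is not stated here, and `plan_chunks` is a hypothetical helper.

```python
import math
from typing import Dict, Tuple


def plan_chunks(n: int, split_ratios: Dict[str, float],
                chunk_size: int = 10000) -> Dict[str, Tuple[int, int]]:
    """Hypothetical: return {split: (num_samples, num_chunks)} for a dataset of `n` samples."""
    if not math.isclose(sum(split_ratios.values()), 1.0):
        raise ValueError("split_ratios must sum to 1")
    names = list(split_ratios)
    sizes = {name: int(n * split_ratios[name]) for name in names[:-1]}
    sizes[names[-1]] = n - sum(sizes.values())  # assumed: the remainder goes to the last split
    return {name: (size, math.ceil(size / chunk_size)) for name, size in sizes.items()}


if __name__ == "__main__":
    print(plan_chunks(25_000, {"train": 0.8, "val": 0.1, "test": 0.1}))
    # {'train': (20000, 2), 'val': (2500, 1), 'test': (2500, 1)}
```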

FlashDataLoader

| Parameter | Default | Description |
|---|---|---|
| dataset | Required | Dataset to load |
| batch_size | 1 | Samples per batch |
| shuffle | False | Shuffle the data |
| sampler | None | Custom sampler |
| collate_fn | default | Function that merges samples into a batch |
| drop_last | False | Drop the last incomplete batch |

Download files

Source Distribution

datalflash-0.1.0.tar.gz (19.9 kB)

Built Distribution

datalflash-0.1.0-py3-none-any.whl (19.3 kB)

File details

Details for the file datalflash-0.1.0.tar.gz.

File metadata

  • Download URL: datalflash-0.1.0.tar.gz
  • Upload date:
  • Size: 19.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datalflash-0.1.0.tar.gz
Algorithm Hash digest
SHA256 061d7c9a61b40185408792d2ac395f90dcc8d80b60b7b459a17d47b1a4fb61c7
MD5 95076331d8eb58cde0d4712cc4794ded
BLAKE2b-256 a8a3562e9499fb87ddaf0cfa12707f9ed67ab32e1f90cfb0ea29eff2f754cbbb


File details

Details for the file datalflash-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: datalflash-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datalflash-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9b1d9ba89b945d0f1c62abeae43e6c4636d7f36f3d1ab48f92d8df6f797c6de9
MD5 f33d9b7db799eaf45d50ee41b6fae026
BLAKE2b-256 77f54f37abe303d3acb9280ce4b8cde9e3ca54d0abb61d853296b8cdae54755c

