Neural Ready Archive — stream, deduplicate, and train on ML datasets without downloading. Built in Rust.

🧬 NRA (Neural Ready Archive)

The 21st Century Data Format for the AI Era. Forget about tar.gz and zip.


Traditional archive formats (ZIP, tar.gz) were designed decades ago for local disks and tape, long before cloud object storage. Today they are often the main bottleneck of ML infrastructure. They force you to download an entire 500 GB dataset before training can start, make it hard to stream individual files from the cloud, and leave expensive GPUs sitting idle while they wait for data.

NRA (Neural Ready Archive) is a next-generation binary format. It combines enterprise-grade deduplication, ultra-fast Zstd compression, and B+ Tree indexing so you can train neural networks directly from the public cloud.


⚡ Why Legacy Formats Are Dead (Our Benchmarks)

We ran a stress test on 60,000 small files (CIFAR-10):

Format     | Packing Time                             | "Cold Start" Speed (Streaming)
🛑 Tar.gz  | 38.0 seconds                             | ~30 minutes (Requires full download)
🛑 ZIP     | 13.4 seconds                             | Impossible (Requires full download)
🏆 NRA     | 3.3 seconds (11.5x faster than Tar.gz)   | 150 milliseconds (Zero-Download)
[Figure: Archiver Benchmark]
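
As a quick sanity check, the speedup quoted in the table is simply the ratio of packing times. The sketch below recomputes it from the numbers reported above; the dictionary and function names are illustrative and not part of the NRA tooling.

```python
# Packing times in seconds, copied from the benchmark table above.
packing_seconds = {"Tar.gz": 38.0, "ZIP": 13.4, "NRA": 3.3}


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster `candidate` packs than `baseline`."""
    return round(packing_seconds[baseline] / packing_seconds[candidate], 1)


print(speedup("Tar.gz", "NRA"))  # 11.5
print(speedup("ZIP", "NRA"))     # 4.1
```

So the "11.5x" figure is measured against Tar.gz; against ZIP the same numbers give roughly a 4.1x speedup.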

🏆 Competitive Radar: NRA vs Everyone

NRA v4.5 is the only format that scores maximum across all technical parameters:

[Figure: NRA Competitive Radar]

🚀 Try It Now: Train Online Without Downloading

Use our ready-made dataset on Hugging Face. First install the packages:

pip install nra torch

Then run the following Python script:

import nra
import torch
from torch.utils.data import Dataset, DataLoader

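class NraStreamDataset(Dataset):
    """Map-style dataset that reads files from a remote .nra archive on demand."""

    def __init__(self, url):
        self.url = url
        # The manifest downloads in 150ms. The archive itself stays in the cloud!
        self.file_ids = nra.CloudArchive(url).file_ids()
        # Opened lazily in __getitem__, so each DataLoader worker creates its own handle.
        self._archive = None

    def __len__(self):
        return len(self.file_ids)

    def __getitem__(self, idx):
        if self._archive is None:
            self._archive = nra.CloudArchive(self.url)
        raw_bytes = self._archive.read_file(self.file_ids[idx])
        # Placeholder: returns only the payload size. For real training,
        # decode raw_bytes into an image tensor here.
        return torch.tensor([len(raw_bytes)], dtype=torch.float32)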

# 🤗 Our ready-made dataset on Hugging Face (NRA format)
dataset = NraStreamDataset(
    "https://huggingface.co/datasets/zevatov/nra-cifar10/resolve/main/cifar10.nra"
)
loader = DataLoader(dataset, batch_size=256, num_workers=4)

for batch in loader:
    # Training starts at second 0. Zero bytes on your SSD!
    pass

🤗 Open the dataset on Hugging Face →
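
The dataset class above opens its archive handle lazily inside `__getitem__`. A minimal, standard-library-only sketch of why that pattern is useful with multiple workers is shown below. `FakeArchive` and `LazyHandleDataset` are hypothetical stand-ins, not part of the `nra` package, and the `__getstate__` hook is an assumption about how one might keep a live handle out of pickled copies; the real `nra.CloudArchive` may or may not need it.

```python
import os


class FakeArchive:
    """Hypothetical stand-in for a remote archive handle (not the nra API)."""

    def __init__(self, url: str):
        self.url = url
        self.owner_pid = os.getpid()  # records which process opened the handle

    def read_file(self, file_id: str) -> bytes:
        return f"{self.url}#{file_id}".encode()


class LazyHandleDataset:
    """Keeps only cheap, copyable state until the first item is requested."""

    def __init__(self, url: str, file_ids):
        self.url = url
        self.file_ids = list(file_ids)
        self._archive = None  # no handle yet

    def __len__(self) -> int:
        return len(self.file_ids)

    def __getitem__(self, idx: int) -> bytes:
        if self._archive is None:
            # First access in this process: open a handle owned by this process.
            self._archive = FakeArchive(self.url)
        return self._archive.read_file(self.file_ids[idx])

    def __getstate__(self):
        # Drop the live handle whenever the dataset is pickled or copied,
        # so the receiving process reopens its own.
        state = self.__dict__.copy()
        state["_archive"] = None
        return state


ds = LazyHandleDataset("https://example.invalid/data.nra", ["a", "b"])
print(ds[1])
```

With this layout, whichever process first calls `__getitem__` opens the handle it then uses, rather than inheriting one that was created in the parent process.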


🏗️ How Cloud Architecture Works (Zero-Disk I/O)

PyTorch DataLoader
       │
       ▼
NRA Core (Rust)  ──── B+ Tree manifest lookup O(log n)
       │
       ▼
HTTP GET Range: bytes=X-Y  →  HuggingFace / S3
       │
       ▼
Zstd decompress in RAM  →  raw bytes  →  GPU tensor

Zero Disk I/O. Your SSD is never touched. Data flows: Cloud → RAM → GPU.
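
The flow in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NRA implementation: the `Entry` layout, `Manifest`, `build_archive`, and `read_file` are hypothetical names, a sorted list with binary search stands in for the B+ Tree, and `zlib` from the standard library stands in for Zstd. The only real protocol detail is the `Range: bytes=X-Y` header, whose end offset is inclusive.

```python
import bisect
import urllib.request
import zlib
from typing import Callable, NamedTuple


class Entry(NamedTuple):
    file_id: str  # lookup key
    offset: int   # first byte of the compressed block inside the archive
    length: int   # compressed size in bytes


class Manifest:
    """Sorted index with binary search (a simple stand-in for a B+ Tree)."""

    def __init__(self, entries):
        self._entries = sorted(entries)
        self._keys = [e.file_id for e in self._entries]

    def lookup(self, file_id: str) -> Entry:
        pos = bisect.bisect_left(self._keys, file_id)
        if pos == len(self._keys) or self._keys[pos] != file_id:
            raise KeyError(file_id)
        return self._entries[pos]


def http_range_fetch(url: str) -> Callable[[int, int], bytes]:
    """Return a fetcher that reads `length` bytes at `offset` via HTTP Range."""
    def fetch(offset: int, length: int) -> bytes:
        end = offset + length - 1  # HTTP byte ranges are inclusive
        req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return fetch


def read_file(manifest: Manifest, fetch: Callable[[int, int], bytes], file_id: str) -> bytes:
    """Index lookup, ranged read, then in-memory decompression."""
    entry = manifest.lookup(file_id)
    return zlib.decompress(fetch(entry.offset, entry.length))


def build_archive(files: dict) -> tuple:
    """Pack files into one blob plus a manifest (toy layout for the demo)."""
    blob, entries = bytearray(), []
    for file_id, data in files.items():
        packed = zlib.compress(data)
        entries.append(Entry(file_id, len(blob), len(packed)))
        blob += packed
    return bytes(blob), Manifest(entries)


# Demo against an in-memory "remote" so the sketch runs without a network.
blob, manifest = build_archive({"b.png": b"bbb" * 10, "a.png": b"hello"})
print(read_file(manifest, lambda off, n: blob[off:off + n], "a.png"))
```

To read from a real server, pass `http_range_fetch(url)` as the `fetch` argument instead of the in-memory lambda; the server must support range requests for this to work.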


🛠️ The NRA Ecosystem

  1. Python SDK (pip install nra): Integration into PyTorch and TensorFlow.
  2. NRA CLI (cargo install nra-cli): Console utility for servers.
  3. NRA GUI: Desktop application for visual archive management. (In development: zevatov/nra-manager-pro)
  4. FUSE Mount: Mount .nra archives as virtual drives (nra-cli mount).
  5. 🤗 HuggingFace: zevatov/nra-cifar10 — ready-to-use NRA dataset.

🗺️ Roadmap

Milestone                                            | Status
1.0 Core Engine (NRA Format Spec v4.5)               | ✅ Released
1.0 Python SDK + PyTorch DataLoader                  | ✅ Released
1.0 CLI (pack, extract, convert, stream, mount)      | ✅ Released
1.1 NRA Manager Pro (GUI)                            | 🔧 In Progress
1.2 Delta Updates (append without rebuild)           | 📋 Planned
1.3 NRA CDN (edge-caching proxy)                     | 📋 Planned
1.4 NRA Registry (private dataset hub)               | 📋 Planned
1.5 Streaming Converter (remote tar.gz → NRA live)   | 📋 Planned
2.0 Multi-platform PyPI Wheels                       | 📋 Planned

📚 Deep Documentation

For full architectural documentation, whitepapers, and archiving benchmarks, visit our Official GitHub Repository.

License

The nra-core, nra-cli, and nra-python components are distributed under the MIT license.
