Neural Ready Archive — stream, deduplicate, and train on ML datasets without downloading. Built in Rust.
🧬 NRA (Neural Ready Archive)
The 21st Century Data Format for the AI Era. Forget tar.gz and ZIP.
Traditional archive formats (ZIP, tar.gz) date from the era of tapes and floppy disks, and today they are a major bottleneck in ML data pipelines. They force you to download an entire 500 GB dataset before you can use it, offer no practical way to stream individual files from the cloud, and leave expensive GPUs sitting idle while they wait for data.
NRA (Neural Ready Archive) is a next-generation binary format. It combines enterprise-grade deduplication, ultra-fast Zstd compression, and B+ Tree indexing so you can train neural networks directly from the public cloud.
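The description above does not spell out how NRA deduplicates data, so the following is only a minimal sketch of generic content-addressed deduplication, assuming each file is hashed whole with BLAKE2b. The function name `deduplicate` and the sample file contents are hypothetical and are not part of the `nra` SDK.

```python
import hashlib


def deduplicate(files):
    """Content-addressed deduplication sketch (hypothetical, not NRA's actual code).

    files: dict mapping file name -> bytes.
    Returns (blobs, manifest): blobs maps digest -> bytes stored once,
    manifest maps each file name -> digest of its content.
    """
    blobs = {}
    manifest = {}
    for name, data in files.items():
        # Identical content yields an identical digest, so it is stored only once.
        digest = hashlib.blake2b(data, digest_size=32).hexdigest()
        blobs.setdefault(digest, data)
        manifest[name] = digest
    return blobs, manifest


files = {
    "cat_001.png": b"\x89PNG...cat",
    "cat_001_copy.png": b"\x89PNG...cat",  # exact duplicate of cat_001.png
    "dog_001.png": b"\x89PNG...dog",
}
blobs, manifest = deduplicate(files)
print(len(files), "files ->", len(blobs), "unique blobs")  # prints: 3 files -> 2 unique blobs

# Any file can be reconstructed through the manifest.
original = blobs[manifest["cat_001_copy.png"]]
```

The manifest is what makes duplicates cheap: two names point at one stored blob, so the archive grows only with unique content.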
⚡ Why Legacy Formats Are Dead (Our Benchmarks)
We ran a stress test on 60,000 small files (CIFAR-10):
| Format | Packing Time | Cold-Start Latency (Streaming) |
|---|---|---|
| 🛑 Tar.gz | 38.0 seconds | ~30 minutes (Requires full download) |
| 🛑 ZIP | 13.4 seconds | Impossible (Requires full download) |
| 🏆 NRA | 3.3 seconds (11.5x faster) | 150 milliseconds (Zero-Download) |
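The "11.5x faster" figure is the ratio of the Tar.gz packing time to the NRA packing time. The same arithmetic, applied to the numbers in the table, also gives the speedup over ZIP, which the table does not state directly:

```python
# Packing times from the benchmark table (seconds, 60,000 CIFAR-10 files).
packing_seconds = {"Tar.gz": 38.0, "ZIP": 13.4, "NRA": 3.3}

nra_seconds = packing_seconds["NRA"]
# Speedup = legacy packing time / NRA packing time.
speedups = {
    fmt: seconds / nra_seconds
    for fmt, seconds in packing_seconds.items()
    if fmt != "NRA"
}

for fmt, ratio in speedups.items():
    print(f"NRA packs {ratio:.1f}x faster than {fmt}")
```

This prints a ratio of about 11.5 for Tar.gz and about 4.1 for ZIP.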
🏆 Competitive Radar: NRA vs Everyone
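In our comparison, NRA (format spec v4.5) is the only format that reaches the top score on every technical parameter.
[Figure: competitive radar chart comparing NRA with other archive formats]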
🚀 Try It Now: Train Online Without Downloading
Use our ready-made dataset on Hugging Face
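```bash
pip install nra torch
```

```python
import nra
import torch
from torch.utils.data import Dataset, DataLoader


class NraStreamDataset(Dataset):
    def __init__(self, url):
        self.url = url
        # The manifest downloads in 150ms. The archive itself stays in the cloud!
        self.file_ids = nra.CloudArchive(url).file_ids()
        # Opened lazily in __getitem__ so that each DataLoader worker process
        # opens its own archive handle instead of sharing one across processes.
        self._archive = None

    def __len__(self):
        return len(self.file_ids)

    def __getitem__(self, idx):
        if self._archive is None:
            self._archive = nra.CloudArchive(self.url)
        raw_bytes = self._archive.read_file(self.file_ids[idx])
        # Placeholder: returns the file size as a tensor. Replace this with
        # real decoding (e.g. image bytes -> pixel tensor) for actual training.
        return torch.tensor([len(raw_bytes)], dtype=torch.float32)


# 🤗 Our ready-made dataset on Hugging Face (NRA format)
if __name__ == "__main__":
    # The __main__ guard is required when DataLoader workers are started with
    # the "spawn" method (the default on macOS and Windows).
    dataset = NraStreamDataset(
        "https://huggingface.co/datasets/zevatov/nra-cifar10/resolve/main/cifar10.nra"
    )
    loader = DataLoader(dataset, batch_size=256, num_workers=4)

    for batch in loader:
        # Training starts at second 0. Zero bytes on your SSD!
        pass
```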
🏗️ How Cloud Architecture Works (Zero-Disk I/O)
```
PyTorch DataLoader
      │
      ▼
NRA Core (Rust) ──── B+ Tree manifest lookup, O(log n)
      │
      ▼
HTTP GET Range: bytes=X-Y → Hugging Face / S3
      │
      ▼
Zstd decompress in RAM → raw bytes → GPU tensor
```
Zero Disk I/O. Your SSD is never touched. Data flows: Cloud → RAM → GPU.
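The flow in the diagram can be sketched end to end in plain Python under a few stated assumptions: the manifest is modeled as a sorted list of file ids with matching `(offset, length)` entries searched with `bisect` (a stand-in for the B+ Tree, with the same O(log n) lookup cost), each file is compressed independently so it can be fetched on its own, and `zlib` replaces Zstd so only the standard library is needed. All function names here (`build_archive`, `lookup`, `read_file`, `fetch_range_http`) are hypothetical illustrations, not the `nra` SDK's API, and the real NRA on-disk layout may differ.

```python
import bisect
import urllib.request
import zlib


def range_header(offset: int, length: int) -> str:
    """HTTP byte ranges are inclusive on both ends (RFC 9110), hence the -1."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range_http(url: str, offset: int, length: int) -> bytes:
    """Fetch only [offset, offset + length) of a remote file via an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honored the Range header.
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range header (HTTP {resp.status})")
        return resp.read()


def build_archive(files: dict) -> tuple:
    """Pack files into one blob of independently compressed chunks.

    Returns (blob, ids, entries): ids are sorted file ids and entries[i] is the
    (offset, length) of ids[i]'s compressed chunk. zlib stands in for Zstd.
    """
    ids = sorted(files)
    blob = bytearray()
    entries = []
    for file_id in ids:
        chunk = zlib.compress(files[file_id])
        entries.append((len(blob), len(chunk)))
        blob += chunk
    return bytes(blob), ids, entries


def lookup(ids, entries, file_id):
    """Binary search over sorted ids: O(log n), like a B+ Tree lookup."""
    i = bisect.bisect_left(ids, file_id)
    if i == len(ids) or ids[i] != file_id:
        raise KeyError(file_id)
    return entries[i]


def read_file(fetch, ids, entries, file_id) -> bytes:
    """Manifest lookup -> ranged fetch -> decompress in RAM, with no temp files."""
    offset, length = lookup(ids, entries, file_id)
    return zlib.decompress(fetch(offset, length))


# Local demo: an in-memory bytes object plays the role of the remote archive.
blob, ids, entries = build_archive({"img_0": b"cat" * 100, "img_1": b"dog" * 100})


def local_fetch(offset: int, length: int) -> bytes:
    # Same semantics as a Range request, served from memory.
    return blob[offset:offset + length]


print(read_file(local_fetch, ids, entries, "img_1")[:6])
```

Against a real host you would pass a fetcher such as `lambda o, n: fetch_range_http(url, o, n)`; the manifest would also have to be downloaded and parsed first, which this sketch skips because the NRA manifest format is not described here.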
🛠️ The NRA Ecosystem
- Python SDK (`pip install nra`): integration with PyTorch and TensorFlow.
- NRA CLI (`cargo install nra-cli`): console utility for servers.
- NRA GUI: desktop application for visual archive management (in development: zevatov/nra-manager-pro).
- FUSE Mount: mount `.nra` archives as virtual drives (`nra-cli mount`).
- 🤗 Hugging Face: zevatov/nra-cifar10, a ready-to-use NRA dataset.
🗺️ Roadmap
| Milestone | Status |
|---|---|
| 1.0 Core Engine (NRA Format Spec v4.5) | ✅ Released |
| 1.0 Python SDK + PyTorch DataLoader | ✅ Released |
| 1.0 CLI (pack, extract, convert, stream, mount) | ✅ Released |
| 1.1 NRA Manager Pro (GUI) | 🔧 In Progress |
| 1.2 Delta Updates (append without rebuild) | 📋 Planned |
| 1.3 NRA CDN (edge-caching proxy) | 📋 Planned |
| 1.4 NRA Registry (private dataset hub) | 📋 Planned |
| 1.5 Streaming Converter (remote tar.gz → NRA live) | 📋 Planned |
| 2.0 Multi-platform PyPI Wheels | 📋 Planned |
📚 Deep Documentation
For full architectural documentation, whitepapers, and archiving benchmarks, visit our Official GitHub Repository.
License
The nra-core, nra-cli, and nra-python components are distributed under the MIT license.
Download files
Source Distribution
Built Distribution
File details
Details for the file nra-1.0.2.tar.gz.
File metadata
- Download URL: nra-1.0.2.tar.gz
- Upload date:
- Size: 89.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0aee1708460bbd0d35b1745ce8822b901a4e10f5ee1f6842b6b2390df1cf9e94 |
| MD5 | c30fe7cdba833e6a0c5e01a987efb018 |
| BLAKE2b-256 | 408afc86037f2e4feca01021741b0f06dc57dddbadf859c07e8e72250ea1ee1e |
File details
Details for the file nra-1.0.2-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: nra-1.0.2-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3188601945b3af8151cf851bff003045c0880b66d794d03a0592353f4ac41629 |
| MD5 | 5e4b61ccf0117e16b925383c7aa04560 |
| BLAKE2b-256 | 30da8e117fa5196bf8879dcf5e9a89d812c5ddb17f5e0fd3ec7043641becd58e |