Evolutionary Neural Architecture Search — compress any PyTorch model with one function call
dNATY
Evolutionary AI Model Compression
46.5% fewer FLOPs · 1.6× faster search · 98.59% accuracy retained · no GPU required
Compress any PyTorch model with one function call.
dNATY uses multi-objective evolutionary search to find smaller, faster architectures — automatically.
pip install dnaty
Quickstart
import torch.nn as nn
from dnaty import compress
from dnaty.experiments.fast_dataset import FastDataset
# 1. Your model — any nn.Module with Linear layers
model = nn.Sequential(
nn.Flatten(),
nn.Linear(784, 512), nn.ReLU(),
nn.Linear(512, 256), nn.ReLU(),
nn.Linear(256, 10)
)
# 2. Load dataset (cached in RAM — zero I/O across generations)
ds = FastDataset("MNIST", device="cpu", train_subset=10_000)
# 3. Compress
result = compress(model, ds, target_flops=0.5, n_generations=30)
print(result.summary())
# CompressResult | arch=[301, 153, 128] | FLOPs -46.5% (1,133,056 → 605,802)
# | params -46.5% (536K → 286K) | acc=0.9859
The compressed model is a regular nn.Module — drop it into your existing pipeline:
result.model # nn.Module, ready for inference
result.accuracy # 0.9859
result.flops_reduction_pct # 46.5
result.arch # [301, 153, 128] ← hidden layer sizes found
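To sanity-check FLOPs figures like the ones above, the helper below counts the forward-pass cost of a plain stack of Linear layers. It assumes 2 FLOPs per multiply-accumulate and ignores biases and activations. That convention is an assumption, not taken from dNATY's source, but it reproduces the reported compressed count of 605,802 for arch=[301, 153, 128] with 784 inputs and 10 classes. The reported original count, 1,133,056, is consistent with hidden sizes [512, 256, 128] under the same convention. `mlp_flops` is a hypothetical helper, not part of the dnaty package.

```python
def mlp_flops(input_size: int, hidden: list, n_classes: int) -> int:
    """FLOPs of one forward pass through consecutive Linear layers.

    Assumed convention (hypothetical helper, not dNATY's API):
    2 FLOPs per multiply-accumulate; biases and activations ignored.
    """
    sizes = [input_size, *hidden, n_classes]
    macs = sum(a * b for a, b in zip(sizes, sizes[1:]))
    return 2 * macs


original = mlp_flops(784, [512, 256, 128], 10)
compressed = mlp_flops(784, [301, 153, 128], 10)
reduction_pct = 100 * (1 - compressed / original)
print(original, compressed, round(reduction_pct, 1))  # 1133056 605802 46.5
```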
Why dNATY?
The problem: most models ship larger than they need to be. That means slower inference, higher cloud bills, and models too heavy for edge devices (cameras, drones, robots). Shrinking them by hand is days of trial-and-error with no guarantee you found the best size/accuracy trade-off.
What you get with dNATY:
- Smaller, cheaper models: ~46% fewer FLOPs on MNIST, accuracy kept (98.59%)
- No GPU: the search runs on CPU in minutes, so it works in CI and on the edge hardware you already have
- No retraining: point it at a model + dataset and get a deployable nn.Module back
- One function call: compress(model, dataset); export to .pth/.onnx
How is this different from pruning / quantization / distillation?
Those methods shrink the model you already have. dNATY searches for a smaller architecture that does the same job, so it works at a different level of the stack. They're complementary, not competing:
| Method | What it does | Catch |
|---|---|---|
| Quantization | Lower-precision weights (fp32→int8) | Same architecture & op count. Stack it on top of dNATY. |
| Pruning | Zeroes individual weights | Needs sparse runtimes to actually run faster; manual tuning |
| Distillation | Trains a small student model | You design the student + write the training loop |
| DARTS | Gradient-based architecture search | Needs a GPU + hours of config |
| Random NAS | Random architecture sampling | No memory — re-tries bad ideas |
| dNATY | Evolves a smaller architecture, memory-guided | CPU-only, one call, no retraining |
The engine is episodic memory-guided evolutionary search (NSGA-II, multi-objective): operators that helped in past generations get sampled more often, so it converges faster than random search — no gradients, no GPU.
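"Multi-objective" means candidates are ranked by Pareto dominance rather than by a single score. The sketch below computes the first non-dominated front for the two objectives named here (maximize accuracy, minimize FLOPs). It is a minimal illustration, not dNATY's selection code: it omits NSGA-II's later fronts and its crowding-distance tie-breaking, and the candidate values are made up.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    acc: float  # validation accuracy, higher is better
    flops: int  # forward-pass cost, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.acc >= b.acc and a.flops <= b.flops
    strictly_better = a.acc > b.acc or a.flops < b.flops
    return no_worse and strictly_better


def pareto_front(pop: list) -> list:
    """First non-dominated front: candidates that no other candidate dominates."""
    return [c for c in pop if not any(dominates(o, c) for o in pop if o is not c)]


# Illustrative values only, not measured results.
pop = [
    Candidate("big", 0.990, 1_100_000),
    Candidate("mid", 0.986, 600_000),
    Candidate("small", 0.970, 300_000),
    Candidate("bad", 0.960, 700_000),  # worse than "mid" on both objectives
]
print([c.name for c in pareto_front(pop)])  # ['big', 'mid', 'small']
```

Every survivor is a different accuracy/FLOPs trade-off; none can be improved on one objective without losing on the other.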
Benchmark: dNATY vs alternatives
Results on MNIST (30K training samples, CPU, seed=42).
| Method | FLOPs reduction | Accuracy | Setup effort | GPU needed |
|---|---|---|---|---|
| dNATY | −46.5% | 98.59% | 1 function call | No |
| RandomNAS | −41.2% | 98.54% | 1 function call | No |
| torch.nn.utils.prune | −30–40%* | varies | manual per-layer | No |
| DARTS | −35–50% | varies | hours of config | Yes |
| Manual knowledge distillation | −20–60%* | varies | custom training loop | No |
* highly dependent on model and manual choices
Continual learning (Split-MNIST, 5 tasks, 3 seeds)
| Method | Backward transfer (BWT, closer to 0 is better) | Note |
|---|---|---|
| dNATY | −0.145 | best |
| EWC | −0.999 | near-total forgetting |
| MLP (no CL) | −0.998 | baseline |
dNATY shows 6.9× less catastrophic forgetting than EWC (BWT magnitude 0.145 vs 0.999).
All numbers reproducible: python scripts/prove_it.py
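Backward transfer is the standard continual-learning metric from the GEM paper (Lopez-Paz & Ranzato, 2017). With R[i][j] the test accuracy on task j after training through task i, BWT averages R[T-1][j] - R[j][j] over every task except the last, so values near 0 mean little forgetting and values near -1 mean near-total forgetting. The sketch assumes the table uses this standard definition; the 3-task accuracy matrix is an illustration, not dNATY's data.

```python
def backward_transfer(R: list) -> float:
    """BWT = mean over tasks j < T-1 of (R[T-1][j] - R[j][j]).

    R[i][j] is accuracy on task j after finishing training on task i
    (0-indexed, T tasks). Negative values indicate forgetting.
    """
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


# Illustrative 3-task matrix: each task is learned to 0.99, then partly forgotten.
R = [
    [0.99, 0.00, 0.00],
    [0.90, 0.99, 0.00],
    [0.85, 0.89, 0.99],
]
print(round(backward_transfer(R), 3))  # -0.12, i.e. ((0.85 - 0.99) + (0.89 - 0.99)) / 2
print(round(0.999 / 0.145, 1))         # 6.9, the EWC vs dNATY ratio from the table
```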
Measured across real datasets
Compression depends on how oversized your model is — dNATY finds the right size, it doesn't force a fixed cut. Measured on CPU (held-out accuracy):
| Dataset | FLOPs ↓ | Accuracy | Note |
|---|---|---|---|
| MNIST | −50.4% | 97.0% | oversized MLP → big cut |
| Fashion-MNIST | −54.6% | 86.4% | oversized MLP → big cut |
| UCI Wine Quality | −78.4% | 63.7% | extra capacity useless → shrinks hard |
| UCI Adult / Census | −2.7% | 84.0% | already lean → small cut (correct) |
| UCI Covertype | −1.5% | 78.1% | already lean → small cut (correct) |
| CIFAR-10 (MLP) | −1.2% | 46.4% | MLP unfit for RGB — conv NAS is WIP |
Full table, config, and reproduction: BENCHMARKS_REAL.md.
Real examples
MNIST — MLP compression
import torch.nn as nn
from dnaty import compress
from dnaty.experiments.fast_dataset import FastDataset
model = nn.Sequential(
nn.Flatten(),
nn.Linear(784, 512), nn.ReLU(),
nn.Linear(512, 256), nn.ReLU(),
nn.Linear(256, 10),
)
ds = FastDataset("MNIST", device="cpu", train_subset=30_000)
result = compress(model, ds, target_flops=0.5, n_generations=50, seed=42)
print(result.summary())
# FLOPs -46.5% (1,133,056 → 605,802) | acc=0.9859 | arch=[301, 153, 128]
CIFAR-10 — image classification
import torch.nn as nn
from dnaty import compress
from dnaty.experiments.fast_dataset import FastDataset
model = nn.Sequential(
nn.Flatten(),
nn.Linear(3072, 1024), nn.ReLU(),
nn.Linear(1024, 512), nn.ReLU(),
nn.Linear(512, 10),
)
ds = FastDataset("CIFAR10", device="cpu", train_subset=50_000)
result = compress(model, ds, target_flops=0.5, n_generations=30, seed=0)
print(result.summary())
# FLOPs reduction · +4.43 pp accuracy vs ResNet baseline
Custom DataLoader
dNATY works with any standard torch.utils.data.DataLoader:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from dnaty import compress
X = torch.randn(5_000, 128)
y = torch.randint(0, 2, (5_000,))
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)
model = nn.Sequential(
nn.Linear(128, 256), nn.ReLU(),
nn.Linear(256, 128), nn.ReLU(),
nn.Linear(128, 2)
)
result = compress(model, loader, target_flops=0.4, n_generations=20)
Deterministic results with seed
result = compress(model, ds, target_flops=0.5, n_generations=30, seed=42)
# Run again with the same seed → identical result
API reference
compress(model, train_data, **kwargs) → CompressResult
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | nn.Module | required | Any model with nn.Linear layers |
| train_data | FastDataset or DataLoader | required | Training data |
| target_flops | float | 0.5 | Target FLOPs as fraction of original (0.5 = 50% less) |
| n_generations | int | 30 | Evolutionary generations to run |
| n_pop | int | 15 | Population size (diversity vs. speed) |
| device | str | auto | 'cpu' or 'cuda' |
| seed | int | None | Fix for reproducibility |
| verbose | bool | True | Print generation-by-generation progress |
CompressResult
result.model # nn.Module — compressed model, ready for inference
result.accuracy # float — validation accuracy
result.flops_reduction # float — e.g. 0.465 = 46.5% fewer FLOPs
result.flops_reduction_pct # float — percentage version
result.params_reduction_pct # float — parameter reduction percentage
result.original_flops # int — FLOPs of the input model
result.compressed_flops # int — FLOPs of the compressed model
result.original_params # int — parameters of the input model
result.compressed_params # int — parameters of the compressed model
result.arch # list[int] — hidden layer sizes found
result.generations # int — generations that were run
result.summary() # str — one-line human-readable summary
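The reduction fields can be derived from the raw counts. The dataclass below is a hypothetical stand-in, not dNATY's actual CompressResult; it assumes flops_reduction = 1 - compressed_flops / original_flops, which matches the documented example (0.465 = 46.5% fewer FLOPs), and the analogous formula for parameters.

```python
from dataclasses import dataclass


@dataclass
class SizeFields:
    """Hypothetical stand-in for the size-related fields of CompressResult."""
    original_flops: int
    compressed_flops: int
    original_params: int
    compressed_params: int

    @property
    def flops_reduction(self) -> float:
        # Fraction of FLOPs removed: 0.465 means 46.5% fewer FLOPs.
        return 1 - self.compressed_flops / self.original_flops

    @property
    def flops_reduction_pct(self) -> float:
        return 100 * self.flops_reduction

    @property
    def params_reduction_pct(self) -> float:
        return 100 * (1 - self.compressed_params / self.original_params)


# FLOP counts from the MNIST example; the parameter counts are illustrative only.
sizes = SizeFields(1_133_056, 605_802, 1_000, 500)
print(round(sizes.flops_reduction, 3), round(sizes.flops_reduction_pct, 1), sizes.params_reduction_pct)
# 0.465 46.5 50.0
```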
FastDataset
Zero-overhead dataset loading — loads everything into RAM once, serves batches via direct indexing.
from dnaty.experiments.fast_dataset import FastDataset
ds = FastDataset(
name="MNIST", # "MNIST" | "FashionMNIST" | "CIFAR10"
device="cpu", # "cpu" or "cuda"
train_subset=10_000, # use a subset of training data (None = full)
val_size=10_000, # validation split size
data_dir="./data", # where to download/cache
)
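The load-once, index-directly idea behind FastDataset can be sketched without any framework: keep all samples in memory, shuffle an index list each epoch, and slice mini-batches from it. `InMemoryBatches` is a hypothetical illustration of that pattern, not FastDataset's implementation (which, per the parameters above, also handles downloading, validation splits, and device placement).

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


class InMemoryBatches:
    """Hypothetical sketch (not FastDataset's code): keep every sample in memory
    and serve shuffled mini-batches by direct indexing."""

    def __init__(self, xs: Sequence, ys: Sequence, batch_size: int, seed: Optional[int] = None):
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        self.xs, self.ys = list(xs), list(ys)  # loaded once, reused every epoch
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def epoch(self) -> Iterator[Tuple[List, List]]:
        """Yield (inputs, labels) batches in a fresh random order, without re-reading data."""
        order = list(range(len(self.xs)))
        self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            yield [self.xs[i] for i in idx], [self.ys[i] for i in idx]


# Usage: 10 toy samples whose label is the parity of the input.
data = InMemoryBatches(list(range(10)), [x % 2 for x in range(10)], batch_size=4, seed=0)
batches = list(data.epoch())
print([len(xb) for xb, _ in batches])  # [4, 4, 2]
```

Because nothing is re-read between generations, each candidate's short training run only pays for indexing and compute.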
DnatyEvolver (advanced)
Direct access to the evolutionary engine for custom search loops:
from dnaty.evolution.evolver import DnatyEvolver
evolver = DnatyEvolver(
n_pop=20,
n_generations=50,
input_size=784,
n_classes=10,
init_hidden=[512, 256],
device="cpu",
verbose=True,
)
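# train_data and val_data stand for your own training and validation data (placeholders, not defined above)
evolver.run(train_data, val_data)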
best = evolver.population[0]
print(best.model, best.acc, best.count_flops())
How it works
Initial architecture
│
▼
┌─────────────────────────────────────────┐
│ Population of N candidate architectures │
│ (mutations: add/remove neurons, merge │
│ layers, split, widen, narrow, skip) │
└──────────────┬──────────────────────────┘
│ each generation:
│
┌──────▼──────┐
│ Mutate │ ← episodic memory weights operator probabilities
└──────┬──────┘
│
┌──────▼──────┐
│ Train │ 3 epochs per candidate (AMP on GPU, fp32 on CPU)
└──────┬──────┘
│
┌──────▼──────┐
│ Select │ NSGA-II Pareto front: max acc + min FLOPs
└──────┬──────┘
│
┌──────▼──────┐
│ Remember │ operators that helped get higher probability next round
└─────────────┘
│
▼
Best compressed model
The episodic memory is dNATY's core differentiator. Unlike random search or gradient-based NAS, the search improves over generations by remembering what worked.
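The Mutate / Select / Remember cycle can be illustrated with a simple credit scheme: each operator keeps a weight, operators are sampled in proportion to their weights, and an operator's weight grows whenever the child it produced beats the current best. This is a minimal sketch under those assumptions. The three operators, the additive reward, and the single-objective toy fitness are hypothetical stand-ins, not dNATY's 8 operators, its memory update, or its NSGA-II selection.

```python
import random


# Hypothetical structural operators acting on a list of hidden-layer sizes.
def narrow(arch, rng):
    i = rng.randrange(len(arch))
    return arch[:i] + [max(8, arch[i] // 2)] + arch[i + 1:]


def widen(arch, rng):
    i = rng.randrange(len(arch))
    return arch[:i] + [arch[i] * 2] + arch[i + 1:]


def remove_layer(arch, rng):
    if len(arch) == 1:
        return list(arch)
    i = rng.randrange(len(arch))
    return arch[:i] + arch[i + 1:]


OPERATORS = {"narrow": narrow, "widen": widen, "remove_layer": remove_layer}


def toy_fitness(arch):
    # Stand-in objective (higher is better): fewer hidden units wins.
    # A real search would train each candidate and trade accuracy against FLOPs.
    return -sum(arch)


def memory_guided_search(arch, generations=30, seed=0, reward=1.0):
    rng = random.Random(seed)
    memory = {name: 1.0 for name in OPERATORS}  # one weight per operator
    best, best_fit = list(arch), toy_fitness(arch)
    for _ in range(generations):
        names = list(memory)
        # Mutate: sample an operator in proportion to its remembered weight.
        name = rng.choices(names, weights=[memory[n] for n in names])[0]
        child = OPERATORS[name](best, rng)
        fit = toy_fitness(child)
        # Select + remember: keep the child only if it improves, and credit its operator.
        if fit > best_fit:
            best, best_fit = child, fit
            memory[name] += reward
    return best, memory


best, memory = memory_guided_search([512, 256], generations=30, seed=0)
print(best, memory)
```

Under this toy objective "widen" never helps, so its weight stays at its starting value while the operators that shrink the network accumulate credit and get picked more often, which is the behavior the diagram's "Remember" step describes.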
Installation
pip install dnaty # stable (recommended)
pip install dnaty==1.0.1 # pin to specific version
pip install git+https://github.com/pedrovergueiroo/dNATY # latest from source
Requirements: Python 3.10+, PyTorch 2.0+, NumPy 1.24+
Optional dev dependencies:
pip install "dnaty[dev]" # adds pytest, matplotlib, jupyter
Project structure
dNATY/
├── dnaty/
│ ├── compress.py # public API: compress()
│ ├── evolution/evolver.py # DnatyEvolver — main search loop
│ ├── core/
│ │ ├── arch.py # DynamicMLP — mutable architecture
│ │ └── individual.py # Individual = model + memory + fitness
│ ├── operators/mutations.py # 8 structural operators
│ ├── training/local_train.py # fast local trainer (AMP, FP32)
│ └── experiments/
│ └── fast_dataset.py # FastDataset — zero-I/O loader
├── dnaty_saas/ # Production API (FastAPI + PostgreSQL)
├── frontend/ # Web UI (React + TypeScript + Tailwind)
├── notebooks/ # CIFAR-100, ImageNet experiments
├── scripts/
│ ├── prove_it.py # reproduces all benchmark numbers
│ └── demo_compress.py # interactive demo
└── tests/ # pytest suite
Reproducing the benchmarks
# Full benchmark suite (~25 min on CPU)
python scripts/prove_it.py
# Quick demo (~5 min)
python scripts/demo_compress.py
# Run tests
pytest tests/
Results are written to results/ as JSON files.
SaaS API
dNATY ships with a production-ready API backend (FastAPI + PostgreSQL + Stripe).
cd dnaty_saas
cp .env.example .env # configure DATABASE_URL, JWT_SECRET, etc.
pip install -r requirements.txt
uvicorn main:app --reload
POST /api/v1/compress — submit a compression job
GET /api/v1/compress/{job_id} — poll status and get results
See /docs (Swagger) when the server is running.
License
Business Source License 1.1 — free for non-commercial use.
Contact pedrol.vergueiro@gmail.com for commercial licensing.
Download files
File details
Details for the file dnaty-1.1.0.tar.gz.
File metadata
- Download URL: dnaty-1.1.0.tar.gz
- Upload date:
- Size: 62.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c1aef0f73fa7ec192783e195d1032a792b8493c813b6bc6be06e9d3298cfb1b |
| MD5 | 0f781a62b6adc365ef3a3ae9f848b1a0 |
| BLAKE2b-256 | f4b5b9d6edd1f9794025e14c32a5f2772489e55aa19c45e3fa8fb64652e310e4 |
File details
Details for the file dnaty-1.1.0-py3-none-any.whl.
File metadata
- Download URL: dnaty-1.1.0-py3-none-any.whl
- Upload date:
- Size: 36.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d66e8023702c5d53e9eef312399240401c3c72c0c2c3ace81dc8c66fcb51af5a |
| MD5 | 3891dcf467e1b59e7a4299eea7d6a8f7 |
| BLAKE2b-256 | 3ea65ae5b6e2d528fbfe15d6b925c284a671d2dbe10d0de5146846b2a1e88ffa |