Evolutionary Neural Architecture Search -- compress any PyTorch model with one function call

dNATY

Evolutionary AI Model Compression

8–83% fewer FLOPs (median −56% across 17 real datasets) · accuracy kept · no GPU required

Compress any PyTorch model with one function call. dNATY uses multi-objective evolutionary search (NSGA-II) guided by episodic memory to find smaller, faster architectures — automatically, on a standard CPU.

pip install dnaty

Website · Docs · Benchmarks · Changelog

Quickstart

import torch.nn as nn
import dnaty
from dnaty import compress
from dnaty.experiments.fast_dataset import FastDataset

# 1. Your model — any nn.Module with Linear layers
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10)
)

# 2. Load dataset (cached in RAM — zero I/O across generations)
ds = FastDataset("MNIST", device="cpu", train_subset=10_000)

# 3. Compress
result = compress(model, ds, target_flops=0.5, n_generations=30)

print(result.summary())
# CompressResult | arch=[...] | FLOPs -44% (1,133,056 -> ~633K) | acc=0.977
# (exact numbers vary by seed/subset; run scripts/prove_it.py to reproduce)

The compressed model is a regular nn.Module — drop it into your existing pipeline:

result.model                  # nn.Module, ready for inference
result.accuracy               # 0.977  (example; varies by seed/subset)
result.flops_reduction_pct    # 44.1   (example)
result.arch                   # [301, 226, 32, 128]  ← hidden layer sizes found

# Save / reload
result.save("compressed.pt")
result = dnaty.load("compressed.pt")

# Export to ONNX for edge deployment (no PyTorch needed on the device)
result.export_onnx("model.onnx", input_shape=(784,))

# Measure real CPU latency on your machine
print(result.benchmark_latency((784,)))   # p50/p95/p99 ms + fps

Why dNATY?

The problem: most models ship larger than they need to be. That means slower inference, higher cloud bills, and models too heavy for edge devices (cameras, drones, robots, industrial boxes). Shrinking them by hand is days of trial-and-error with no guarantee you found the best size/accuracy trade-off.

What you get with dNATY:

Smaller, cheaper models: 8–83% fewer FLOPs on 17 of the 18 benchmarked datasets (one model grew, see Measured results), accuracy kept
No GPU: the search runs on CPU in minutes, so it works in CI and on the hardware you already have
No manual architecture design: point it at a model + dataset, get a deployable nn.Module back
One function call: compress(model, dataset); export to .pt / .onnx

"Why not just TensorRT or TFLite?" — wrong layer.

Runtimes optimize execution of a fixed architecture. dNATY optimizes the architecture itself, upstream of any runtime. You don't choose between them — you chain them: compress() → export_onnx() → load into TensorRT / TFLite / ONNX Runtime. The savings stack.

Versus other compression techniques

Method	What it does	Catch
Quantization	Lower-precision weights (fp32→int8)	Same architecture & op count. Stack it on top of dNATY.
Pruning	Zeroes individual weights	Needs sparse runtimes to actually run faster; manual tuning
Distillation	Trains a small student model	You design the student + write the training loop
DARTS	Gradient-based architecture search	Needs a GPU + hours of config
Random NAS	Random architecture sampling	No memory — re-tries bad ideas
dNATY	Evolves a smaller architecture, memory-guided	CPU-only, one call

The engine is episodic memory-guided evolutionary search: operators that helped in past generations get sampled more often, so it consistently finds better compression than random NAS at the same generation budget — no gradients, no GPU.
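
To make the idea of "operators that helped get sampled more often" concrete, here is a minimal sketch of memory-weighted operator sampling. It is not dNATY's implementation: the class `OperatorMemory`, the operator names, the `floor` parameter, and the gain values are all hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict


class OperatorMemory:
    """Tracks the average fitness gain each mutation operator produced.

    Hypothetical illustration, not part of the dnaty package.
    """

    def __init__(self, operators, floor=0.05):
        self.operators = list(operators)
        self.floor = floor  # keeps every operator explorable
        self.total_gain = defaultdict(float)
        self.uses = defaultdict(int)

    def record(self, op, gain):
        """Store one episode: operator `op` changed fitness by `gain`."""
        self.total_gain[op] += gain
        self.uses[op] += 1

    def weights(self):
        """Mean gain per operator, clipped at zero, plus the floor."""
        out = []
        for op in self.operators:
            mean = self.total_gain[op] / self.uses[op] if self.uses[op] else 0.0
            out.append(max(mean, 0.0) + self.floor)
        return out

    def sample(self, rng):
        """Draw an operator with probability proportional to its weight."""
        return rng.choices(self.operators, weights=self.weights(), k=1)[0]


rng = random.Random(0)
memory = OperatorMemory(["shrink_layer", "remove_layer", "add_layer"])
memory.record("shrink_layer", 0.30)   # helped a lot
memory.record("remove_layer", 0.05)   # helped a little
memory.record("add_layer", -0.10)     # hurt, so it falls back to the floor

draws = [memory.sample(rng) for _ in range(2000)]
counts = {op: draws.count(op) for op in memory.operators}
print(counts)
```

With these made-up gains the sampling weights are 0.35, 0.10, and 0.05, so roughly 70% of draws pick the operator that helped most, while the floor keeps the weakest operator from disappearing entirely.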

Measured results

All numbers measured on a standard desktop CPU, validation accuracy on a held-out 20% split, reproducible from scripts in this repo. Full tables, configs, and caveats: dnaty.org/benchmarks.

13 public datasets (n_generations=30, n_pop=15) — top rows:

Dataset	Samples	FLOPs ↓	Val acc	Domain
Electrical Fault Detect	12,001	−83.0%	99.25%	smart grid sensors
Dry Bean Quality	13,611	−83.4%	92.43%	agricultural IoT
Predictive Maint. (AI4I)	10,000	−83.1%	96.70%	factory IoT
Breast Cancer (UCI)	569	−72.6%	99.56%	clinical tabular
Credit Card Fraud (full)	284,807	−64.0%	99.96%	financial anomaly
Network Intrusion (NSL-KDD)	31,490	−56.3%	99.46%	edge security
HAR Sensors (UCI)	10,299	−46.8%	99.17%	wearables · robotics
MNIST (full 70K)	70,000	−41.8%	98.68%	vision · digits

5 public Kaggle datasets — different sizes, domains, and feature counts. Reproduce: python scripts/benchmark_market_real.py (downloads from Kaggle, ~2 h on CPU).

Dataset	Rows	Features	FLOPs ↓	Val acc	Domain
IBM HR Employee Attrition	1,470	51	−75.5%	99.3%	HR / corporate
Adult Census Income	32,561	104	−74.4%	90.4%	social / financial
Air Quality (UCI)	7,674	12	−45.0%	91.2%	environmental sensors
Diabetes 130-US Hospitals	101,766	119	−8.0%	89.3%	clinical / hospital
Telco Customer Churn	7,043	45	+20% ⚠	93.2%	telecom

4 of 5 compressed. The median change across all five runs is −45%, counting the Telco increase. The Telco case: NSGA-II explored deeper rather than narrower from the [512, 256, 128] baseline, and model_grew=True was raised automatically. Passing a wider baseline ([1024, 512, 256]) or increasing target_flops resolves it. This is expected Pareto behavior, not a silent failure.
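
The median quoted for the Kaggle runs depends on whether the Telco run is counted. The short check below recomputes it both ways from the table values; it uses only the Python standard library and no dNATY calls.

```python
from statistics import median

# FLOPs change (%) per dataset, copied from the Kaggle table above.
flops_change = {
    "IBM HR Employee Attrition": -75.5,
    "Adult Census Income": -74.4,
    "Air Quality (UCI)": -45.0,
    "Diabetes 130-US Hospitals": -8.0,
    "Telco Customer Churn": +20.0,
}

all_runs = list(flops_change.values())
compressed_only = [v for v in all_runs if v < 0]

median_all = median(all_runs)
median_compressed = median(compressed_only)

print(f"median over all {len(all_runs)} runs: {median_all:+.1f}%")
print(f"median over the {len(compressed_only)} compressed runs: {median_compressed:+.1f}%")
```

Counting all five runs gives −45.0%; restricting to the four runs that actually shrank gives −59.7%.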

Compression scales with how oversized the baseline is — dNATY finds the right size, it doesn't force a fixed cut. Lean models get small cuts (correct Pareto behavior); the library warns explicitly when nothing was cut.
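
Why a larger model can still be a legitimate search result is easier to see with a small non-dominated filter, the first step of NSGA-II style ranking. This is a generic sketch, not dNATY's code: the helper names and the four candidates are invented for illustration, and full NSGA-II also applies rank layering and crowding distance, which are omitted here.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on both objectives and better on one.

    Objectives: maximize accuracy, minimize FLOPs.
    """
    no_worse = a["accuracy"] >= b["accuracy"] and a["flops"] <= b["flops"]
    better = a["accuracy"] > b["accuracy"] or a["flops"] < b["flops"]
    return no_worse and better


def non_dominated(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates (not measured dNATY results).
candidates = [
    {"name": "baseline", "accuracy": 0.920, "flops": 100_000},
    {"name": "narrow",   "accuracy": 0.915, "flops": 60_000},
    {"name": "deep",     "accuracy": 0.932, "flops": 120_000},
    {"name": "wasteful", "accuracy": 0.910, "flops": 110_000},
]

front = non_dominated(candidates)
front_names = [c["name"] for c in front]
print(front_names)
```

In this toy set, "deep" costs more FLOPs than the baseline yet stays on the front because nothing else matches its accuracy, while "wasteful" is dropped because the baseline beats it on both objectives. A grown model surfacing from the search is the same situation.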

Continual learning — 3 benchmarks, 3 seeds each

Split-MNIST (5 tasks, digit pairs) — proof of concept:

Method	BWT (↑ better)	Note
dNATY (balanced replay)	−0.204	~5× less forgetting vs EWC
EWC (λ=400)	−0.998	near-total forgetting
MLP (no CL)	−0.998	baseline

Permuted-MNIST (10 tasks, domain-incremental) and Split-CIFAR-10 (5 tasks, class-incremental) are harder benchmarks with results in results/exp4_* and results/exp5_*. Note: Split-MNIST is a weak benchmark — it is included for comparability with prior work. The harder benchmarks are the primary CL evidence. Full methodology and comparisons to ER-ACE/MAML: METHODOLOGY.md.
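
For readers unfamiliar with BWT (backward transfer), the sketch below computes it under the definition commonly used in the continual-learning literature: the average, over all tasks except the last, of final accuracy on a task minus the accuracy measured right after that task was learned. Whether dNATY's scripts use exactly this variant is an assumption here (METHODOLOGY.md is the authority), and the accuracy matrix is made up.

```python
def backward_transfer(acc):
    """Backward transfer from an accuracy matrix.

    acc[j][i] is the accuracy on task i after training has finished on task j.
    Negative values mean earlier tasks were forgotten.
    """
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("BWT needs at least two tasks")
    final = acc[-1]
    # Compare final accuracy on each earlier task with its just-learned accuracy.
    deltas = [final[i] - acc[i][i] for i in range(n_tasks - 1)]
    return sum(deltas) / len(deltas)


# Hypothetical 3-task accuracy matrix (not a dNATY result).
acc = [
    [0.99, 0.00, 0.00],
    [0.80, 0.98, 0.00],
    [0.70, 0.85, 0.97],
]

bwt = backward_transfer(acc)
print(f"BWT = {bwt:+.3f}")
```

Under this definition a value near −1, like the EWC and plain-MLP rows above, means accuracy on earlier tasks fell from almost perfect to almost zero by the end of training.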

Reproduce: python scripts/prove_it.py (NAS vs random) · python -m dnaty.experiments.exp3_cl (Split-MNIST) · python -m dnaty.experiments.exp4_permuted_mnist (Permuted-MNIST) · python -m dnaty.experiments.exp5_split_cifar10 (Split-CIFAR-10)

API at a glance

You want to…	Use
Compress a tabular/sensor MLP	`compress(model, data, target_flops=0.5)`
Compress a small CNN trained from scratch	`compress_cnn(model, loader)` (early access — CIFAR-scale classification)
Compress the head of a pretrained backbone	`compress_with_backbone(resnet, loader, finetune_backbone=True)`
Thin out conv layers too	`prune_conv_channels(model, amount=0.3)`
Deploy without PyTorch on the device	`result.export_onnx("m.onnx", input_shape=...)`
Save / reload	`result.save("m.pt")` / `dnaty.load("m.pt")`
Detect data drift in production	`DriftDetector().fit(X_train)` + `ProductionTracker(model, detector)`
Profile compute before deciding	`count_flops(model, input_shape)` / `flops_by_layer(...)`
Get the full accuracy/FLOPs trade-off curve	`result.pareto_front` / `result.pareto_front_csv("front.csv")`
Reuse search knowledge on a related task	`result.save_memory("prior.json")` → `compress(..., warm_start="prior.json")`

Supported backbones for compress_with_backbone: ResNet, MobileNetV2/V3, EfficientNet, VGG, DenseNet, ViT, and custom models with an fc/classifier/head attribute.

Full reference with copy-paste recipes: dnaty.org/docs

Example — pretrained backbone for edge deployment

import torchvision.models as tv
import dnaty

backbone = tv.mobilenet_v2(weights="IMAGENET1K_V1")
dnaty.prune_conv_channels(backbone, amount=0.2)          # optional: thin convs first

result = dnaty.compress_with_backbone(
    backbone, train_loader,
    target_flops=0.4,
    finetune_backbone=True, finetune_epochs=10,
)
result.export_onnx("mobilenet_edge.onnx", input_shape=(3, 224, 224))

Deterministic results

result = compress(model, ds, target_flops=0.5, n_generations=30, seed=42)
# Same seed → identical result. The pytest suite gates every release on this.

New in 2.1.0

The full Pareto front, not just one model

compress() returns a single winner, but the search explores a whole accuracy/FLOPs trade-off curve. That curve is now exposed — pick the point that fits your device budget instead of accepting one pre-chosen size:

result = compress(model, ds, target_flops=0.5, n_generations=30)

print(result.pareto_summary())
# Pareto front — 6 non-dominated architecture(s):
#   arch=[196, 29, 128]  acc=0.9731  FLOPs=27,400 (-64.2%)  params=14,762
#   arch=[224, 32, 16, 128]  acc=0.9788  FLOPs=29,184 (-61.9%)  params=15,795
#   ...

result.pareto_front           # list of {arch, accuracy, flops, params, flops_reduction_pct}
result.pareto_front_csv("front.csv")   # plot the trade-off curve in your paper

Front accuracies are eval-mode, NAS-phase (un-fine-tuned) numbers for choosing an operating point; the returned result.model is the fine-tuned winner.
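
One way to use the front is to pick the most accurate architecture that fits a device budget. The helper below is a sketch that assumes each entry is a dict with the `accuracy` and `flops` keys listed above; `pick_under_budget` is a hypothetical name and not part of dNATY. The two entries are copied from the `pareto_summary()` example; in practice you would pass `result.pareto_front`.

```python
def pick_under_budget(front, max_flops):
    """Return the most accurate entry whose FLOPs fit the budget, or None."""
    feasible = [p for p in front if p["flops"] <= max_flops]
    if not feasible:
        return None
    # Highest accuracy wins; ties go to the cheaper architecture.
    return max(feasible, key=lambda p: (p["accuracy"], -p["flops"]))


# Entries copied from the example output above.
front = [
    {"arch": [196, 29, 128], "accuracy": 0.9731, "flops": 27_400, "params": 14_762},
    {"arch": [224, 32, 16, 128], "accuracy": 0.9788, "flops": 29_184, "params": 15_795},
]

tight = pick_under_budget(front, max_flops=28_000)
loose = pick_under_budget(front, max_flops=30_000)
print(tight["arch"], loose["arch"])
```

Because front accuracies are un-fine-tuned, treat the choice as a ranking of operating points and not as a prediction of final accuracy.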

Transferable memory — the search learns your domain

dNATY's engine is episodic memory: it learns which structural mutations help. In 2.1.0 that learned operator prior is transferable. Save it after one run and warm-start the next related task — the search starts biased toward what already worked instead of re-discovering it, so it converges in fewer generations:

# Task A — learn a prior on your first sensor model
r1 = compress(model_a, data_a, target_flops=0.5)
r1.save_memory("sensor_prior.json")

# Task B — a related model/dataset in the same domain, warm-started
r2 = compress(model_b, data_b, target_flops=0.5, warm_start="sensor_prior.json")

# or chain in-process, no disk:
r2 = compress(model_b, data_b, warm_start=r1.export_memory())

warm_start_weight (default 2.0) controls how decisively the prior biases early generations — 0 ignores it, ≥4 is aggressive. The prior fades as task-specific evidence accumulates, so it is a head start, not a cage.
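
A prior that fades as evidence accumulates can be modeled by treating it as a fixed number of pseudo-observations. The sketch below shows that idea only. It is not dNATY's actual formula: `blended_score` and its arguments are hypothetical, and only the name `warm_start_weight` and its default of 2.0 come from the text above.

```python
def blended_score(prior_score, observed_mean, n_observations, warm_start_weight=2.0):
    """Blend a transferred prior with task-specific evidence.

    The prior counts as `warm_start_weight` pseudo-observations, so its
    share of the blend shrinks as real observations accumulate.
    """
    if warm_start_weight < 0 or n_observations < 0:
        raise ValueError("weights and counts must be non-negative")
    total = warm_start_weight + n_observations
    if total == 0:
        return 0.0  # no prior and no evidence: neutral score
    return (warm_start_weight * prior_score + n_observations * observed_mean) / total


# The prior says an operator is good (0.8); the new task says it is mediocre (0.2).
for n in (0, 2, 18):
    print(n, round(blended_score(0.8, 0.2, n), 3))
```

With no observations the score equals the prior, after two observations it sits halfway, and after eighteen it is close to what the new task reports. A weight of 0 ignores the prior entirely, which matches the behavior described for `warm_start_weight=0`.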

Measure the speedup yourself: python scripts/warm_start_demo.py runs cold vs warm-started on two related tasks and reports generations-to-target. This is operator-prior transfer between related MLP/tabular tasks — not a claim about ImageNet-scale conv search.

Scope, stated plainly

Strong: MLPs on tabular/sensor data; classifier heads on frozen CNN/ViT backbones; CPU-only environments.

Not yet: full convolutional NAS end-to-end (under development; convs are handled by structural pruning today); transformer/LLM compression; models that are already minimal (no fat means little or no cut, and the library warns you when the model would need to grow).

No comparison against OFA or MnasNet is claimed — those target full conv search spaces on GPUs; dNATY targets CPU-only workflows on a different problem slice.

Installation

pip install dnaty                # stable (recommended)
pip install dnaty==2.1.0         # pin to this release
pip install git+https://github.com/pedrovergueiro/dNaty  # latest from source

Requirements: Python 3.10+, PyTorch 2.0+, NumPy 1.24+

pip install "dnaty[dev]"   # adds pytest, matplotlib, jupyter (quotes keep zsh from expanding the brackets)

Project structure

dNaty/
├── dnaty/
│   ├── compress.py              # public API: compress, compress_cnn,
│   │                            #   compress_with_backbone, prune_conv_channels
│   ├── result.py                # CompressResult (+ Pareto front, save_memory) / load()
│   ├── evolution/evolver.py     # DnatyEvolver / CnnEvolver — NSGA-II search (+ warm_start)
│   ├── core/                    # DynamicMLP, DynamicCNN, Individual, episodic memory + priors
│   ├── operators/               # structural mutation operators (dense + conv)
│   ├── training/local_train.py  # fast local trainer
│   ├── monitoring/              # DriftDetector, ProductionTracker
│   ├── utils/flops_counter.py   # count_flops, flops_by_layer
│   └── experiments/fast_dataset.py  # zero-I/O MNIST/FashionMNIST/CIFAR10 loader
├── scripts/                     # prove_it.py, warm_start_demo.py, benchmark_market_real.py, ...
└── tests/                       # pytest suite (142 tests) — gates every release

Hosted version

Prefer not to run it locally? dnaty.org hosts the same engine with a web UI and REST API: upload a CSV, get a compressed model back. Free tier: one training run per day, no credit card required.

Citation

@software{vergueiro_dnaty_2026,
  author  = {Vergueiro, Pedro},
  title   = {dNaty: Dynamic Neuro-Adaptive sYstem with evoluTionarY Learning},
  year    = {2026},
  url     = {https://github.com/pedrovergueiro/dNaty},
  version = {2.1.0},
  license = {BSL-1.1}
}

License

Business Source License 1.1 — free for research, academic work, and personal projects. Commercial use requires a license: dnaty.org/commercial · pedrol.vergueiro@gmail.com

Release history

Version	Release date
2.1.0 (this version)	Jul 23, 2026
2.0.3	Jul 19, 2026
2.0.1	Jul 1, 2026
2.0.0	Jul 1, 2026
1.1.7	Jun 15, 2026
1.1.6	Jun 11, 2026
1.1.5	Jun 11, 2026
1.1.4	Jun 7, 2026
1.1.3	Jun 7, 2026
1.1.2	Jun 7, 2026
1.1.1	Jun 6, 2026
1.1.0	Jun 5, 2026
1.0.1	May 29, 2026
1.0.0	May 29, 2026

Download files

File	Type	Size	Uploaded
dnaty-2.1.0.tar.gz	Source distribution	124.1 kB	Jul 23, 2026
dnaty-2.1.0-py3-none-any.whl	Built distribution (Python 3)	94.1 kB	Jul 23, 2026

Both files were uploaded via twine/6.2.0 on CPython/3.14.2, without Trusted Publishing.

File hashes

Hashes for dnaty-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fe302b0245250a2b9c53bad7e9ce4b247d8f74f7629485c4c7095fa32de0809c`
MD5	`48004f6748188f38470fd91534a64df2`
BLAKE2b-256	`84280e6455b224880eab337f168e8b013f58c5857bb32823fcf95bc9e77a05f5`

Hashes for dnaty-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`900ac14be8c5dccaa9e8fcd15505be00d738670815639e2a8c617f2755428be7`
MD5	`91b2a0284032a1ada3a2fc0b7e153ef5`
BLAKE2b-256	`af7dbc0ffbfc39bd7ab79bcee75b94fa73e6e44031f691a545cb66379acdd74b`
