
ANEE v0.3 — Adaptive Neural Execution Engine

Dynamic Sparse Inference for Pre-Trained Transformers

ANEE is a lightweight framework for token-wise, layer-wise adaptive computation in transformer language models. Instead of running every layer for every token, ANEE learns to allocate compute dynamically, skipping unnecessary computation while aiming to preserve output quality.

ANEE wraps existing HuggingFace models (e.g., GPT-2) without modifying their weights.

🔧 Key Capabilities

• Dynamic Layer Skipping

ANEE evaluates each transformer block at inference time and decides whether to:

PROCESS: run full attention + MLP
SKIP: bypass computation for that layer
EXIT: stop processing all remaining layers for the token (early exit; supported)

This produces sparse execution patterns that vary across tokens.
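The three decisions can be pictured as a gate inside a per-token layer loop. The sketch below is a minimal, dependency-free illustration, not ANEE's wrapper: layers are plain Python callables on a list of floats, and the hypothetical `decide` function stands in for the controller. It assumes, as is common for layer skipping, that a skipped block passes the hidden state through unchanged.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical action labels mirroring ANEE's PROCESS / SKIP / EXIT decisions.
PROCESS, SKIP, EXIT = "PROCESS", "SKIP", "EXIT"

Vector = List[float]
Layer = Callable[[Vector], Vector]      # stand-in for one attention + MLP block
Decide = Callable[[int, Vector], str]   # (layer index, hidden state) -> action


def run_adaptive(hidden: Vector, layers: Sequence[Layer],
                 decide: Decide) -> Tuple[Vector, List[str]]:
    """Push one token's hidden state through the stack, gating each layer."""
    trace: List[str] = []
    for i, layer in enumerate(layers):
        action = decide(i, hidden)
        if action == EXIT:
            # Early exit: this layer and every later layer are not executed.
            trace.extend([EXIT] * (len(layers) - i))
            break
        if action == PROCESS:
            hidden = layer(hidden)
        # On SKIP the hidden state is passed through unchanged (assumed).
        trace.append(action)
    return hidden, trace


# Usage: four toy "layers" that each add 1.0 to every element.
layers = [lambda h: [x + 1.0 for x in h] for _ in range(4)]
decide = lambda i, h: SKIP if i == 1 else (EXIT if i == 3 else PROCESS)
out, trace = run_adaptive([0.0, 0.0], layers, decide)
print(out, trace)  # [2.0, 2.0] ['PROCESS', 'SKIP', 'PROCESS', 'EXIT']
```

The `trace` list for each token is exactly one row of the token-by-layer execution pattern that the heatmap tooling visualizes.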

• RL-Trained Controller

A small neural controller receives a per-layer state vector containing:

entropy of logits
hidden-state norms
delta-norms
variance
layer position
remaining budget
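A minimal sketch of how such a state vector could be assembled is shown below. It is an illustration under stated assumptions, not `profiler.py`: the function names are hypothetical, "variance" is assumed to mean the variance of the hidden activations, and where the per-layer logits come from (for example, projecting an intermediate hidden state through the LM head) is also an assumption.

```python
import math
from typing import List, Sequence


def softmax_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return -sum((e / total) * math.log(e / total) for e in exps if e > 0.0)


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def state_vector(logits: Sequence[float], hidden: Sequence[float],
                 prev_hidden: Sequence[float], layer_idx: int,
                 num_layers: int, budget_left: float) -> List[float]:
    """Hypothetical per-layer controller input, one entry per listed feature."""
    mean = sum(hidden) / len(hidden)
    variance = sum((x - mean) ** 2 for x in hidden) / len(hidden)
    delta = [a - b for a, b in zip(hidden, prev_hidden)]
    return [
        softmax_entropy(logits),               # entropy of logits
        l2_norm(hidden),                       # hidden-state norm
        l2_norm(delta),                        # delta-norm w.r.t. the previous layer
        variance,                              # variance of hidden activations (assumed)
        layer_idx / max(num_layers - 1, 1),    # layer position scaled to [0, 1]
        budget_left,                           # remaining budget in [0, 1]
    ]


s = state_vector([0.0, 0.0], [3.0, 4.0], [3.0, 0.0], layer_idx=0,
                 num_layers=12, budget_left=1.0)
print(s)  # first entry is ln 2 ~ 0.693 (uniform over two outcomes)
```

Low entropy and a small delta-norm are the kind of signals a controller might read as "this token has settled, later layers can be skipped".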

It learns policies via:

Supervised warm-start (from heuristic traces)
Reinforcement learning with a reward balancing:
- fidelity to the full model's outputs (penalizing KL divergence from them)
- compute savings
- budget adherence
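One way to combine the three reward terms is a weighted sum, sketched below. This is not the formula in `reward.py`: the weights `alpha`, `beta`, `gamma` are illustrative, savings are measured as the fraction of layers skipped rather than ANEE's FLOPs proxy, and budget adherence is modeled as a penalty only on overshooting the budget (a deliberate choice so it does not fight the savings term).

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def reward(p_full: Sequence[float], p_sparse: Sequence[float],
           layers_used: int, num_layers: int, budget: float,
           alpha: float = 1.0, beta: float = 0.5, gamma: float = 1.0) -> float:
    """Hypothetical reward; the weights alpha/beta/gamma are illustrative only."""
    used_frac = layers_used / num_layers
    fidelity = -kl_divergence(p_full, p_sparse)   # 0 when the outputs match exactly
    savings = 1.0 - used_frac                      # fraction of layers skipped
    overshoot = max(0.0, used_frac - budget)       # only exceeding the budget is penalized
    return alpha * fidelity + beta * savings - gamma * overshoot


p = [0.5, 0.5]
print(reward(p, p, layers_used=6, num_layers=12, budget=0.5))   # 0.25
print(reward(p, p, layers_used=12, num_layers=12, budget=0.5))  # -0.5
```

Here `p_full` is the next-token distribution of the unmodified model and `p_sparse` that of the sparse run; the KL term is what ties the learned policy back to full-model quality.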

• Budget-Aware Inference

Users provide an energy_budget in [0, 1]. The controller adjusts its per-token decisions to meet this budget target while trying to preserve output quality.
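A minimal sketch of budget enforcement follows. It assumes, without confirmation from the project, that `energy_budget` is the fraction of layers a token may execute; `layer_allowance`, `budgeted_decisions`, and the `wants_process` controller callback are hypothetical names. The controller still chooses which layers to run, but a hard cap keeps it within the allowance, and the shrinking remaining budget is the value the "remaining budget" state feature would carry.

```python
import math
from typing import Callable, List


def layer_allowance(energy_budget: float, num_layers: int) -> int:
    """Assumed mapping: the budget is the fraction of layers a token may run."""
    if not 0.0 <= energy_budget <= 1.0:
        raise ValueError("energy_budget must lie in [0, 1]")
    return math.ceil(energy_budget * num_layers)


def budgeted_decisions(wants_process: Callable[[int, float], bool],
                       energy_budget: float, num_layers: int) -> List[str]:
    """Query a controller per layer, but never let it exceed the allowance."""
    allowance = layer_allowance(energy_budget, num_layers)
    used = 0
    decisions: List[str] = []
    for i in range(num_layers):
        remaining = (allowance - used) / num_layers   # remaining-budget feature
        if used < allowance and wants_process(i, remaining):
            decisions.append("PROCESS")
            used += 1
        else:
            decisions.append("SKIP")
    return decisions


# A greedy toy controller that always asks to process: the cap does the work.
d = budgeted_decisions(lambda i, remaining: True, energy_budget=0.5, num_layers=12)
print(d.count("PROCESS"))  # 6
```

A learned controller would instead spend its allowance selectively, which is how patterns such as dense early and late layers with a sparse middle can emerge.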

• Visual Execution Maps

ANEE includes tooling to visualize:

token-by-layer skip/process patterns
per-token compute usage
overall savings
effective depth profiles

These “execution heatmaps” help interpret which layers the model relies on.
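The summary numbers behind such a heatmap can be derived from a token-by-layer 0/1 mask, as in the sketch below. The mask format and function names are hypothetical, and "savings" here is a simple layer-count ratio, not the FLOPs proxy in `utils.py`.

```python
from typing import List


def execution_stats(mask: List[List[int]]) -> dict:
    """mask[t][l] is 1 if token t processed layer l, else 0 (hypothetical format)."""
    num_tokens, num_layers = len(mask), len(mask[0])
    per_token = [sum(row) / num_layers for row in mask]           # compute used per token
    effective_depth = [sum(row) for row in mask]                  # layers executed per token
    layer_usage = [sum(row[l] for row in mask) / num_tokens       # how often each layer runs
                   for l in range(num_layers)]
    savings = 1.0 - sum(per_token) / num_tokens                   # layer-count savings
    return {"per_token": per_token, "effective_depth": effective_depth,
            "layer_usage": layer_usage, "savings": savings}


def ascii_heatmap(mask: List[List[int]], tokens: List[str]) -> str:
    """One row per token: '#' = layer processed, '.' = layer skipped or exited."""
    width = max(len(t) for t in tokens)
    return "\n".join(f"{tok:>{width}} " + "".join("#" if x else "." for x in row)
                     for tok, row in zip(tokens, mask))


mask = [[1, 1, 0, 1],
        [1, 0, 0, 1]]
stats = execution_stats(mask)
print(stats["savings"])                     # 0.375
print(ascii_heatmap(mask, ["The", "cat"]))
```

The `layer_usage` column averages are what reveal which layers the model relies on; a plotting library would render the same mask as a colored grid.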

• Model-Agnostic Design

The wrapper manually unrolls transformer layers and is structured for easy adaptation to other decoder-only architectures beyond GPT-2.
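Unrolling a model by hand mostly requires knowing where its list of decoder blocks and its final norm live. The sketch below shows one possible adapter table; it is an assumption about how such adaptation could be organized, not ANEE's code. The attribute paths are those of Hugging Face's `GPT2LMHeadModel` (`transformer.h`, `transformer.ln_f`) and `LlamaForCausalLM` (`model.layers`, `model.norm`), and the table keys match `model.config.model_type`.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple

# Dotted attribute paths to the decoder-block list and the final norm in two
# Hugging Face Transformers causal-LM classes (GPT2LMHeadModel, LlamaForCausalLM).
# Keys match `model.config.model_type`. How ANEE itself locates layers is not
# documented here; this table is an illustrative assumption.
LAYER_PATHS: Dict[str, Dict[str, str]] = {
    "gpt2": {"blocks": "transformer.h", "final_norm": "transformer.ln_f"},
    "llama": {"blocks": "model.layers", "final_norm": "model.norm"},
}


def resolve(obj: Any, dotted: str) -> Any:
    """Follow a dotted attribute path such as 'transformer.h'."""
    for name in dotted.split("."):
        obj = getattr(obj, name)
    return obj


def unrollable_parts(model: Any, model_type: str) -> Tuple[Any, Any]:
    """Return (block list, final norm) so a wrapper can loop over blocks itself."""
    paths = LAYER_PATHS[model_type]
    return resolve(model, paths["blocks"]), resolve(model, paths["final_norm"])


# Usage with a stand-in object shaped like GPT2LMHeadModel (no weights needed).
fake_gpt2 = SimpleNamespace(transformer=SimpleNamespace(h=["block0", "block1"], ln_f="ln_f"))
blocks, final_norm = unrollable_parts(fake_gpt2, "gpt2")
print(len(blocks), final_norm)  # 2 ln_f
```

With a real checkpoint the call would be `unrollable_parts(model, model.config.model_type)`; supporting a new decoder-only family then means adding one entry to the table (plus any architecture-specific block arguments, which this sketch ignores).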

📦 Repository Structure

anee/
│
├── wrapper.py              # Core dynamic execution engine
├── controller.py           # Heuristic + learned controllers
├── profiler.py             # Layer-level state feature extractor
├── reward.py               # RL reward (quality + efficiency)
├── utils.py                # FLOPs proxy utilities
├── config.py               # ANEE configuration
│
└── experiments/
    ├── train_controller.py
    ├── train_controller_rl.py
    ├── collect_traces.py
    ├── 01_sanity_check.py
    └── visualize_heatmap.py

🚀 Getting Started

Install

pip install -e .

Warm-start Controller

python experiments/train_controller.py

RL Fine-Tuning

python experiments/train_controller_rl.py

Quick Test

python experiments/01_sanity_check.py

Generate Heatmap Visualization

python experiments/visualize_heatmap.py

📈 Performance Snapshot (GPT-2 Small)

At moderate budgets, ANEE typically:

executes ~6–9 of 12 layers per token
achieves ~20–30% effective compute reduction
maintains coherent generation
shows consistent “sparse middle, dense edges” execution profiles

Lower budgets yield larger savings at the cost of output quality.

🔬 Intended Use & Applications

ANEE provides a clean, transparent platform for research in:

dynamic depth / adaptive inference
efficient transformer execution
compute-aware LLM routing
per-token sparsity patterns
RL-driven execution policies

It is well-suited for experimentation, teaching, and further development.

📄 License

Apache License 2.0

📚 Citation

@software{ANEE,
  author = {Ahmed Bin Khalid},
  title  = {ANEE: Adaptive Neural Execution Engine},
  year   = {2025},
  note   = {Dynamic compute allocation for transformer inference},
}

