Adaptive Neural Execution Engine – Dynamic sparse inference for pre-trained Transformers.

ANEE v0.3 — Adaptive Neural Execution Engine

Dynamic Sparse Inference for Pre-Trained Transformers

ANEE is a lightweight framework for token-wise, layer-wise adaptive computation in transformer language models. Instead of running every layer for every token, ANEE learns how to allocate compute dynamically, reducing unnecessary computation while preserving output quality.

ANEE wraps existing HuggingFace models (e.g., GPT-2) without modifying their weights.


🔧 Key Capabilities

• Dynamic Layer Skipping

ANEE evaluates each transformer block at inference time and decides whether to:

  • PROCESS: run full attention + MLP
  • SKIP: bypass computation for that layer
  • EXIT: stop early and bypass all remaining layers for that token

This produces sparse execution patterns that vary across tokens.
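The three decisions above can be sketched as a plain loop over layers. This is a minimal illustration and not ANEE's actual wrapper code: the names `Action`, `run_adaptive`, and `toy_policy` are hypothetical, and the "layers" are toy functions on a scalar rather than transformer blocks.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Action(Enum):
    """Hypothetical names for the three per-layer decisions."""
    PROCESS = "process"
    SKIP = "skip"
    EXIT = "exit"


def run_adaptive(
    hidden: float,
    layers: Sequence[Callable[[float], float]],
    decide: Callable[[int, float], Action],
) -> Tuple[float, List[Action]]:
    """Apply `layers` in order, consulting `decide` before each one."""
    trace: List[Action] = []
    for index, layer in enumerate(layers):
        action = decide(index, hidden)
        trace.append(action)
        if action is Action.EXIT:
            break  # stop early; later layers are never evaluated
        if action is Action.PROCESS:
            hidden = layer(hidden)
        # Action.SKIP: the hidden state passes through unchanged
    return hidden, trace


# Toy stand-ins: each "layer" adds 1 to a scalar hidden state.
toy_layers = [lambda h: h + 1.0 for _ in range(6)]


def toy_policy(index: int, hidden: float) -> Action:
    """Illustrative fixed policy: skip layers 2 and 3, exit at layer 5."""
    if index in (2, 3):
        return Action.SKIP
    if index == 5:
        return Action.EXIT
    return Action.PROCESS


final_hidden, trace = run_adaptive(0.0, toy_layers, toy_policy)
print(final_hidden, [a.name for a in trace])
```

In a real transformer the skipped layer would act as an identity on the residual stream, which is what leaving `hidden` unchanged stands in for here.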


• RL-Trained Controller

A small neural controller receives a per-layer state vector containing:

  • entropy of logits
  • hidden-state norms
  • delta-norms
  • variance
  • layer position
  • remaining budget
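A state vector of this kind could be assembled roughly as follows. This is a sketch under assumptions: the function name `layer_state`, the feature order, and the normalization of layer position are all hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def variance(v: Sequence[float]) -> float:
    mean = sum(v) / len(v)
    return sum((x - mean) ** 2 for x in v) / len(v)


def layer_state(
    logits: Sequence[float],
    hidden: Sequence[float],
    prev_hidden: Sequence[float],
    layer_index: int,
    num_layers: int,
    remaining_budget: float,
) -> List[float]:
    """Hypothetical feature layout; the real profiler may differ."""
    delta = [h - p for h, p in zip(hidden, prev_hidden)]
    return [
        entropy(logits),                        # entropy of logits
        l2_norm(hidden),                        # hidden-state norm
        l2_norm(delta),                         # delta-norm vs. previous layer
        variance(hidden),                       # variance of the hidden state
        layer_index / max(num_layers - 1, 1),   # layer position in [0, 1]
        remaining_budget,                       # remaining budget
    ]


state = layer_state(
    logits=[0.0, 0.0, 0.0, 0.0],
    hidden=[3.0, 4.0],
    prev_hidden=[0.0, 0.0],
    layer_index=11,
    num_layers=12,
    remaining_budget=0.5,
)
print(state)
```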

It learns policies via:

  1. Supervised warm-start (from heuristic traces)

  2. Reinforcement learning with a reward balancing:

    • similarity to full model (KL divergence)
    • compute savings
    • budget adherence
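One way such a reward could combine the three terms is shown below. The weights, the function name `reward`, and the exact form of each term are assumptions made for illustration; the source does not specify them.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def reward(
    full_probs: Sequence[float],
    sparse_probs: Sequence[float],
    layers_run: int,
    num_layers: int,
    energy_budget: float,
    quality_weight: float = 1.0,   # hypothetical weights
    savings_weight: float = 0.5,
    budget_weight: float = 1.0,
) -> float:
    compute_used = layers_run / num_layers
    quality_penalty = kl_divergence(full_probs, sparse_probs)  # similarity to full model
    savings = 1.0 - compute_used                               # compute savings
    overspend = max(0.0, compute_used - energy_budget)         # budget adherence
    return (
        -quality_weight * quality_penalty
        + savings_weight * savings
        - budget_weight * overspend
    )


full = [0.7, 0.2, 0.1]
same_output = reward(full, full, layers_run=6, num_layers=12, energy_budget=0.75)
drifted = reward(full, [0.4, 0.4, 0.2], layers_run=6, num_layers=12, energy_budget=0.75)
over_budget = reward(full, full, layers_run=12, num_layers=12, energy_budget=0.5)
print(same_output, drifted, over_budget)
```

With these illustrative weights, matching the full model at half depth scores higher than drifting from it, and running every layer on a 0.5 budget is penalized for overspending.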

• Budget-Aware Inference

Users pass an energy_budget value in the range [0, 1]. The controller adjusts its per-token decisions to stay within that budget while keeping outputs close to those of the full model.
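The source does not define exactly how energy_budget maps to compute. As a sketch, assume it is the fraction of layers a token may run; a hypothetical `BudgetTracker` helper (not part of ANEE's API) could then expose the "remaining budget" signal like this:

```python
class BudgetTracker:
    """Hypothetical helper: tracks how much of a per-token budget is left.

    Assumption: energy_budget is the fraction of layers a token may run.
    """

    def __init__(self, energy_budget: float, num_layers: int) -> None:
        if not 0.0 <= energy_budget <= 1.0:
            raise ValueError("energy_budget must lie in [0, 1]")
        self.allowance = energy_budget * num_layers  # layers this token may run
        self.spent = 0

    def can_afford(self) -> bool:
        return self.spent + 1 <= self.allowance

    def charge(self) -> None:
        self.spent += 1

    def remaining(self) -> float:
        """Fraction of the allowance still unspent, clamped to [0, 1]."""
        if self.allowance == 0:
            return 0.0
        return max(0.0, 1.0 - self.spent / self.allowance)


tracker = BudgetTracker(energy_budget=0.5, num_layers=12)
executed = []
for layer_index in range(12):
    # Greedy stand-in policy: run layers while the budget lasts, then skip.
    if tracker.can_afford():
        tracker.charge()
        executed.append(layer_index)

print(executed, tracker.remaining())
```

The greedy policy here spends the whole budget on the earliest layers. A learned controller would presumably spread it across depth instead, using the remaining-budget value as one of its inputs.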


• Visual Execution Maps

ANEE includes tooling to visualize:

  • token-by-layer skip/process patterns
  • per-token compute usage
  • overall savings
  • effective depth profiles

These “execution heatmaps” help interpret which layers the model relies on.
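To make the idea of a token-by-layer map concrete, here is a plain-text stand-in. It is not the output of visualize_heatmap.py; the function name `render_execution_map` and the boolean input format are assumptions for illustration.

```python
from typing import List, Sequence


def render_execution_map(tokens: Sequence[str], decisions: Sequence[Sequence[bool]]) -> str:
    """Render one row per token: '#' for a processed layer, '.' for a skipped one.

    decisions[t][l] is True when layer l was processed for token t.
    """
    width = max(len(tok) for tok in tokens)
    lines: List[str] = []
    for tok, row in zip(tokens, decisions):
        cells = "".join("#" if ran else "." for ran in row)
        used = sum(row) / len(row)  # per-token compute usage
        lines.append(f"{tok:<{width}} {cells} {used:.0%}")
    return "\n".join(lines)


tokens = ["The", "cat"]
decisions = [
    [True, False, True, True],
    [True, False, False, True],
]
text_map = render_execution_map(tokens, decisions)
print(text_map)
```

Each row ends with the share of layers that token actually ran, so the per-token compute usage is visible alongside the skip pattern.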


• Model-Agnostic Design

The wrapper manually unrolls transformer layers and is structured for easy adaptation to other decoder-only architectures beyond GPT-2.


📦 Repository Structure

anee/
│
├── wrapper.py              # Core dynamic execution engine
├── controller.py           # Heuristic + learned controllers
├── profiler.py             # Layer-level state feature extractor
├── reward.py               # RL reward (quality + efficiency)
├── utils.py                # FLOPs proxy utilities
├── config.py               # ANEE configuration
│
└── experiments/
    ├── train_controller.py
    ├── train_controller_rl.py
    ├── collect_traces.py
    ├── 01_sanity_check.py
    └── visualize_heatmap.py

🚀 Getting Started

Install

pip install -e .

Warm-start Controller

python experiments/train_controller.py

RL Fine-Tuning

python experiments/train_controller_rl.py

Quick Test

python experiments/01_sanity_check.py

Generate Heatmap Visualization

python experiments/visualize_heatmap.py

📈 Performance Snapshot (GPT-2 Small)

At moderate budgets, ANEE typically:

  • executes ~6–9 of 12 layers per token
  • achieves ~20–30% effective compute reduction
  • maintains coherent generation
  • shows consistent “sparse middle, dense edges” execution profiles

Lower budgets save more compute at the cost of output quality.
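As a rough check on these figures, the share of skipped layers can be computed directly from the layer counts. The helper name `skipped_fraction` is hypothetical.

```python
NUM_LAYERS = 12  # GPT-2 Small has 12 transformer blocks


def skipped_fraction(layers_run: int, num_layers: int = NUM_LAYERS) -> float:
    """Fraction of transformer blocks a token did not execute."""
    return 1.0 - layers_run / num_layers


dense = skipped_fraction(9)   # token that ran 9 of 12 layers
sparse = skipped_fraction(6)  # token that ran 6 of 12 layers
print(f"{dense:.0%} to {sparse:.0%} of layers skipped")
```

Running 6 to 9 layers corresponds to skipping 25% to 50% of the blocks. The reported 20–30% effective reduction is lower, which would be consistent with fixed costs that skipping does not remove (for example embeddings, the output head, and the controller itself). That explanation is an assumption and is not stated by the source.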


🔬 Intended Use & Applications

ANEE provides a clean, transparent platform for research in:

  • dynamic depth / adaptive inference
  • efficient transformer execution
  • compute-aware LLM routing
  • per-token sparsity patterns
  • RL-driven execution policies

It is well-suited for experimentation, teaching, and further development.


📄 License

Apache License 2.0


📚 Citation

@software{ANEE,
  author = {Ahmed Bin Khalid},
  title  = {ANEE: Adaptive Neural Execution Engine},
  year   = {2025},
  note   = {Dynamic compute allocation for transformer inference},
}
