Adaptive Neural Execution Engine – Dynamic sparse inference for pre-trained Transformers.

ANEE v0.4 — Adaptive Neural Execution Engine

Dynamic Sparse Inference for Pre-Trained Transformers

ANEE is a lightweight framework for token-wise, layer-wise adaptive computation in transformer language models. Instead of running every layer for every token, ANEE learns how to allocate compute dynamically, reducing unnecessary computation while preserving output quality.

ANEE wraps existing HuggingFace models (e.g., GPT-2) without modifying their weights.

🔧 Key Capabilities

• Dynamic Layer Skipping

ANEE evaluates each transformer block at inference time and decides whether to:

- PROCESS: run full attention + MLP
- SKIP: bypass the heavy computation for that layer (only the KV-cache is updated)
- EXIT: terminate further processing early (supported, but optional and off by default)

This produces sparse execution patterns that vary across tokens.
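
The three decisions can be illustrated with a toy per-token layer loop. This is a minimal sketch, not ANEE's code: the `Action` enum, `run_layers`, and the toy policy are hypothetical names, the "layers" are plain functions on a float, and the KV-cache update that a real SKIP performs is left out.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Action(Enum):
    PROCESS = "process"
    SKIP = "skip"
    EXIT = "exit"


def run_layers(
    hidden: float,
    layers: Sequence[Callable[[float], float]],
    decide: Callable[[int, float], Action],
) -> Tuple[float, List[Action]]:
    """Apply layers to one token's hidden state, following per-layer decisions."""
    trace: List[Action] = []
    for idx, layer in enumerate(layers):
        action = decide(idx, hidden)
        trace.append(action)
        if action is Action.EXIT:
            break                    # no later layer runs for this token
        if action is Action.PROCESS:
            hidden = layer(hidden)   # full layer computation
        # Action.SKIP: the hidden state passes through unchanged
    return hidden, trace


# Toy "layers" that each add 1.0, and a toy policy that skips odd-indexed layers.
toy_layers = [lambda h: h + 1.0 for _ in range(4)]
toy_policy = lambda idx, h: Action.SKIP if idx % 2 else Action.PROCESS

final_hidden, trace = run_layers(0.0, toy_layers, toy_policy)
print(final_hidden, [a.name for a in trace])
# 2.0 ['PROCESS', 'SKIP', 'PROCESS', 'SKIP']
```

Running the loop once per token with a state-dependent policy is what makes the trace differ from token to token.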

• RL-Trained Controller

A small neural controller receives a per-layer state vector containing:

- entropy of logits
- hidden-state norms
- delta-norms
- variance
- layer position
- remaining budget
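
As a rough illustration, the six features above could be computed as follows. This is a sketch under assumptions: `build_state` and its helpers are hypothetical names, and the exact definitions ANEE uses (for example which norm, or how layer position is scaled) are not stated here, so L2 norms, population variance, and a position scaled to [0, 1] are guesses.

```python
import math
from typing import List, Sequence


def softmax_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def l2_norm(vec: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in vec))


def variance(vec: Sequence[float]) -> float:
    mean = sum(vec) / len(vec)
    return sum((v - mean) ** 2 for v in vec) / len(vec)


def build_state(
    logits: Sequence[float],
    hidden: Sequence[float],
    prev_hidden: Sequence[float],
    layer_idx: int,
    num_layers: int,
    remaining_budget: float,
) -> List[float]:
    """Assemble a 6-dimensional controller state for one token at one layer."""
    delta = [h - p for h, p in zip(hidden, prev_hidden)]
    return [
        softmax_entropy(logits),             # entropy of logits
        l2_norm(hidden),                     # hidden-state norm
        l2_norm(delta),                      # delta-norm vs. the previous layer
        variance(hidden),                    # activation variance
        layer_idx / max(num_layers - 1, 1),  # layer position scaled to [0, 1]
        remaining_budget,                    # remaining budget
    ]


state = build_state([0.0, 0.0], [3.0, 4.0], [0.0, 0.0], 11, 12, 0.5)
print(len(state))
```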

It learns policies via:

- Supervised warm-start (from heuristic traces)
- Reinforcement learning with a reward balancing:
  - similarity to full model (KL divergence)
  - compute savings
  - budget adherence
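
One way such a reward could combine the three terms is sketched below. The weights `alpha`, `beta`, and `gamma`, the function names, and the choice to penalize only overspending are all assumptions made for illustration; the reward ANEE actually trains with may differ.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def reward(
    full_probs: Sequence[float],    # next-token distribution of the full model
    sparse_probs: Sequence[float],  # distribution produced with skipping
    layers_run: int,
    num_layers: int,
    energy_budget: float,
    alpha: float = 1.0,   # weight on similarity (hypothetical)
    beta: float = 0.5,    # weight on compute savings (hypothetical)
    gamma: float = 0.5,   # weight on budget adherence (hypothetical)
) -> float:
    used = layers_run / num_layers
    similarity_term = -alpha * kl_divergence(full_probs, sparse_probs)
    savings_term = beta * (1.0 - used)
    budget_term = -gamma * max(0.0, used - energy_budget)  # only overspending is penalized
    return similarity_term + savings_term + budget_term


same = [0.7, 0.2, 0.1]
print(reward(same, same, layers_run=6, num_layers=12, energy_budget=0.5))
```

With identical distributions the similarity term vanishes, so the reward reduces to the savings term minus any overspending penalty.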

• Budget-Aware Inference

Users provide an energy_budget in [0, 1]; lower values mean more skipping. The controller adjusts its behavior per token to meet the budget target while maintaining model output quality.
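
To make the idea of a per-token budget concrete, here is a small bookkeeping sketch. It assumes that energy_budget is interpreted as the fraction of layers a token may execute, which is a guess; `TokenBudget` is a hypothetical helper and not part of the anee package.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks how much of one token's layer allowance is left (hypothetical helper)."""

    num_layers: int
    energy_budget: float  # assumed: fraction of layers the token may run, in [0, 1]
    layers_run: int = 0

    def __post_init__(self) -> None:
        if not 0.0 <= self.energy_budget <= 1.0:
            raise ValueError("energy_budget must lie in [0, 1]")

    @property
    def allowance(self) -> float:
        """Number of layers the budget pays for."""
        return self.energy_budget * self.num_layers

    @property
    def remaining(self) -> float:
        """Remaining budget as a fraction of the full depth, floored at 0."""
        return max(0.0, (self.allowance - self.layers_run) / self.num_layers)

    def charge(self) -> None:
        """Record that one more layer was executed for this token."""
        self.layers_run += 1


budget = TokenBudget(num_layers=12, energy_budget=0.5)
for _ in range(3):
    budget.charge()
print(budget.remaining)  # 0.25
```

A value like `budget.remaining` is the kind of quantity that could be fed to the controller as its remaining-budget feature.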

• Visual Execution Maps

ANEE includes tooling to visualize:

- token-by-layer skip/process patterns
- per-token compute usage
- overall savings
- effective depth profiles

These “execution heatmaps” help interpret which layers the model relies on.
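
The quantities listed above can all be derived from a token-by-layer boolean matrix. The sketch below does this in plain Python with a text rendering; `summarize` and `render` are hypothetical helpers, not ANEE's visualization API, and "savings" here counts skipped layers rather than measured FLOPs.

```python
from typing import Dict, List, Sequence


def summarize(execution_map: Sequence[Sequence[bool]]) -> Dict[str, object]:
    """execution_map[t][l] is True if layer l ran for token t."""
    num_layers = len(execution_map[0])
    depth = [sum(row) for row in execution_map]   # layers run per token
    usage = [d / num_layers for d in depth]       # per-token compute usage
    overall = sum(depth) / (len(execution_map) * num_layers)
    return {"effective_depth": depth, "compute_usage": usage, "savings": 1.0 - overall}


def render(execution_map: Sequence[Sequence[bool]], tokens: Sequence[str]) -> str:
    """Text heatmap: '#' = layer processed, '.' = layer skipped."""
    lines: List[str] = []
    for tok, row in zip(tokens, execution_map):
        cells = "".join("#" if ran else "." for ran in row)
        lines.append(f"{tok:>8} {cells}")
    return "\n".join(lines)


example_map = [
    [True, False, False, True],
    [True, True, True, True],
]
stats = summarize(example_map)
print(render(example_map, ["The", "cat"]))
print(stats["savings"])  # 0.25
```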

• Model-Agnostic Design

The wrapper manually unrolls transformer layers and is structured for easy adaptation to other decoder-only architectures beyond GPT-2.

🚀 Getting Started

Install

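pip install anee

To work from a clone of the repository instead, install it in editable mode from the repository root:

pip install -e .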

Quick Start


from anee.wrapper import ANEEWrapper
from anee.config import ANEEConfig
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

config = ANEEConfig(
    model_name="gpt2",
    energy_budget=0.5,
    controller_type="learned",
    controller_path=None     # optional path to controller weights
)

model = ANEEWrapper(config)

prompt = "The future of artificial intelligence is"
output = model.generate(tokenizer, prompt, max_new_tokens=20)

print(output)

Configuration


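from anee.config import ANEEConfig

config = ANEEConfig(
    model_name="gpt2",
    energy_budget=1.0,             # in [0, 1]; lower = more skipping
    min_layers=2,                  # number of layers that are always executed
    exit_budget_threshold=0.1,
    controller_type="heuristic",   # or "learned"
    state_dim=6,                   # length of the controller's state vector
    controller_hidden_dim=32,      # hidden width of the controller MLP
    controller_path=None           # optional path to controller weights
)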
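
To give a feel for how min_layers and exit_budget_threshold might shape decisions, here is a toy rule-based controller. Everything about it is an assumption: the function name, the `allow_exit` and `delta_threshold` parameters, the idea that the first min_layers layers are the ones always executed, and the reading of exit_budget_threshold as "EXIT becomes eligible once the remaining budget drops below it". ANEE's own heuristic controller may work differently.

```python
def heuristic_decide(
    layer_idx: int,
    remaining_budget: float,
    delta_norm: float,
    *,
    min_layers: int = 2,
    exit_budget_threshold: float = 0.1,
    allow_exit: bool = False,        # hypothetical switch; EXIT is off by default
    delta_threshold: float = 0.05,   # hypothetical: below this, a layer changes little
) -> str:
    """Return "PROCESS", "SKIP", or "EXIT" for one token at one layer."""
    if layer_idx < min_layers:
        return "PROCESS"             # assumed: the first min_layers layers always run
    if allow_exit and remaining_budget < exit_budget_threshold:
        return "EXIT"                # assumed meaning of exit_budget_threshold
    if remaining_budget <= 0.0 or delta_norm < delta_threshold:
        return "SKIP"                # out of budget, or the previous layer barely moved the state
    return "PROCESS"


print(heuristic_decide(0, 0.0, 0.0))                    # PROCESS
print(heuristic_decide(5, 0.5, 0.01))                   # SKIP
print(heuristic_decide(5, 0.05, 1.0, allow_exit=True))  # EXIT
```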

🧠 How ANEE Works

For each token, ANEE:

1. Profiles the hidden states using:
   - entropy
   - max softmax probability
   - hidden-state L2 norm
   - delta-norm between layers
   - activation variance
   - remaining budget
2. Builds a state vector from these metrics.
3. Passes the state into a small MLP controller, which decides:
   - "PROCESS" → run the full transformer layer
   - "SKIP" → update only the KV-cache, skip heavy compute
   - "EXIT" → end early (optional, off by default)
4. Maintains safe KV-cache alignment even when skipping.
5. Produces logits through the model’s final layernorm + LM head.
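
The controller step can be pictured as a tiny two-layer MLP that maps the state vector to a distribution over the three actions. The sketch below uses plain Python with randomly initialized weights, sized to the state_dim=6 and controller_hidden_dim=32 defaults shown in the configuration. The ReLU activation, the softmax output, the argmax selection, and all function names are assumptions; a trained ANEE controller would load its weights rather than sample them.

```python
import math
import random
from typing import List, Sequence, Tuple

ACTIONS = ("PROCESS", "SKIP", "EXIT")
Linear = Tuple[List[List[float]], List[float]]  # (weights, bias)


def make_linear(rng: random.Random, n_in: int, n_out: int) -> Linear:
    """Random weights for illustration only; a real controller would be trained."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: Sequence[float], layer: Linear) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def controller_forward(state: Sequence[float], params: Tuple[Linear, Linear]) -> List[float]:
    """Return action probabilities for one state vector."""
    first, second = params
    hidden = [max(0.0, h) for h in linear(state, first)]  # ReLU (assumed)
    logits = linear(hidden, second)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
params = (make_linear(rng, 6, 32), make_linear(rng, 32, len(ACTIONS)))

state = [0.69, 5.0, 0.3, 0.25, 0.5, 0.4]  # made-up feature values
probs = controller_forward(state, params)
action = ACTIONS[probs.index(max(probs))]
print(action, [round(p, 3) for p in probs])
```

Picking the argmax gives a deterministic policy at inference time; sampling from `probs` instead would be the natural choice during RL training.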

📈 Performance Snapshot (GPT-2 Small)

At moderate budgets, ANEE typically:

- executes ~6–9 of 12 layers per token
- achieves ~20–30% effective compute reduction
- maintains coherent generation
- shows consistent “sparse middle, dense edges” execution profiles

Lower budgets save more compute at the cost of output quality.

🔬 Intended Use & Applications

ANEE provides a clean, transparent platform for research in:

- dynamic depth / adaptive inference
- efficient transformer execution
- compute-aware LLM routing
- per-token sparsity patterns
- RL-driven execution policies

It is well-suited for experimentation, teaching, and further development.

📄 License

Apache License 2.0

Citation

If you use ANEE in your research, please cite:

Ahmed Bin Khalid. (2025). ANEE: Adaptive Neural Execution Engine. Zenodo.
DOI: https://doi.org/10.5281/zenodo.17741880

@software{anee,
  author       = {Ahmed Bin Khalid},
  title        = {ANEE: Adaptive Neural Execution Engine},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17741880},
  url          = {https://doi.org/10.5281/zenodo.17741880}
}

Release history

- 0.4: Nov 27, 2025 (this version)
- 0.3: Nov 27, 2025

