Adaptive Neural Execution Engine – Dynamic sparse inference for pre-trained Transformers.
ANEE v0.4 — Adaptive Neural Execution Engine
Dynamic Sparse Inference for Pre-Trained Transformers
ANEE is a lightweight framework for token-wise, layer-wise adaptive computation in transformer language models. Instead of running every layer for every token, ANEE learns how to allocate compute dynamically, reducing unnecessary computation while preserving output quality.
ANEE wraps existing Hugging Face models (e.g., GPT-2) without modifying their weights.
🔧 Key Capabilities
• Dynamic Layer Skipping
ANEE evaluates each transformer block at inference time and decides whether to:
- PROCESS: run full attention + MLP
- SKIP: bypass the layer's heavy computation, updating only the KV-cache
- EXIT: terminate further processing for the token (supported, but optional and off by default)
This produces sparse execution patterns that vary across tokens.
• RL-Trained Controller
A small neural controller receives a per-layer state vector containing:
- entropy of logits
- hidden-state norms
- delta-norms
- variance
- layer position
- remaining budget
It learns policies via:
- Supervised warm-start (from heuristic traces)
- Reinforcement learning with a reward balancing:
  - similarity to full model (KL divergence)
  - compute savings
  - budget adherence
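The three reward terms listed above could be combined as in the following minimal sketch. The weights (`w_quality`, `w_compute`, `w_budget`), the function names, and the exact form of each term are assumptions made for illustration; the source does not specify how ANEE actually weights or computes them.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def reward(full_probs, sparse_probs, layers_run, total_layers, energy_budget,
           w_quality=1.0, w_compute=0.5, w_budget=0.5):
    """Hypothetical reward: penalize divergence from the full model,
    reward skipped layers, and penalize exceeding the budget."""
    quality_penalty = kl_divergence(full_probs, sparse_probs)
    used = layers_run / total_layers              # fraction of layers executed
    compute_savings = 1.0 - used
    budget_overshoot = max(0.0, used - energy_budget)
    return (-w_quality * quality_penalty
            + w_compute * compute_savings
            - w_budget * budget_overshoot)
```

With identical output distributions the KL term vanishes, so the reward depends only on how many layers ran relative to the budget.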
• Budget-Aware Inference
Users provide an energy_budget in [0, 1]; lower values mean more skipping.
The controller adjusts its behavior per token to meet the budget target while maintaining model output quality.
• Visual Execution Maps
ANEE includes tooling to visualize:
- token-by-layer skip/process patterns
- per-token compute usage
- overall savings
- effective depth profiles
These “execution heatmaps” help interpret which layers the model relies on.
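To make the idea of an execution map concrete, here is a text-only sketch that renders a token-by-layer pattern and derives effective depth and overall savings from it. The trace format (one list of decision strings per token) and all function names are hypothetical; ANEE's own visualization tooling may use a different representation.

```python
def render_execution_map(trace, total_layers):
    """trace: one list of decisions per token, each decision being
    "PROCESS", "SKIP", or "EXIT" (EXIT ends that token's list early)."""
    symbols = {"PROCESS": "#", "SKIP": ".", "EXIT": "x"}
    rows = []
    for decisions in trace:
        row = "".join(symbols[d] for d in decisions)
        rows.append(row.ljust(total_layers))  # layers after EXIT stay blank
    return rows

def effective_depth(trace):
    """Number of fully processed layers per token."""
    return [sum(d == "PROCESS" for d in decisions) for decisions in trace]

def overall_savings(trace, total_layers):
    """Fraction of full layer executions avoided across all tokens."""
    executed = sum(effective_depth(trace))
    return 1.0 - executed / (len(trace) * total_layers)

trace = [
    ["PROCESS", "PROCESS", "SKIP", "PROCESS"],
    ["PROCESS", "SKIP", "SKIP", "PROCESS"],
    ["PROCESS", "PROCESS", "EXIT"],
]
for token_index, row in enumerate(render_execution_map(trace, total_layers=4)):
    print(f"token {token_index}: {row}")
print("effective depth:", effective_depth(trace))
print("overall savings:", overall_savings(trace, total_layers=4))
```

This counts skipped layers only. Because a SKIP still updates the KV-cache, the real compute reduction may be smaller than the layer-count figure suggests.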
• Model-Agnostic Design
The wrapper manually unrolls transformer layers and is structured for easy adaptation to other decoder-only architectures beyond GPT-2.
🚀 Getting Started
Install
pip install -e .
Quick Start
from anee.wrapper import ANEEWrapper
from anee.config import ANEEConfig
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

config = ANEEConfig(
    model_name="gpt2",
    energy_budget=0.5,
    controller_type="learned",
    controller_path=None,  # optional path to controller weights
)

model = ANEEWrapper(config)

prompt = "The future of artificial intelligence is"
output = model.generate(tokenizer, prompt, max_new_tokens=20)
print(output)
Configuration
ANEEConfig(
    model_name="gpt2",
    energy_budget=1.0,            # [0, 1], lower = more skipping
    min_layers=2,                 # always executed
    exit_budget_threshold=0.1,
    controller_type="heuristic",  # or "learned"
    state_dim=6,
    controller_hidden_dim=32,
    controller_path=None,
)
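As a rough illustration of how `min_layers` and `exit_budget_threshold` might constrain the controller, the sketch below applies guards to a proposed action. The semantics are assumptions: that the first `min_layers` layers are the ones always executed, that EXIT becomes eligible once the remaining budget drops below `exit_budget_threshold`, and that an `allow_exit` switch exists. Neither `gate_decision` nor `allow_exit` is part of ANEE's documented API.

```python
def gate_decision(proposed, layer_index, remaining_budget,
                  min_layers=2, exit_budget_threshold=0.1, allow_exit=False):
    """Apply hypothetical config-based guards to a controller's proposed action."""
    if layer_index < min_layers:
        return "PROCESS"   # assumed: the first min_layers layers always run
    if allow_exit and remaining_budget < exit_budget_threshold:
        return "EXIT"      # assumed meaning of exit_budget_threshold
    if proposed == "EXIT" and not allow_exit:
        return "SKIP"      # EXIT is off by default, so fall back to SKIP
    if proposed == "PROCESS" and remaining_budget <= 0.0:
        return "SKIP"      # no budget left for full compute
    return proposed
```

For example, `gate_decision("SKIP", layer_index=0, remaining_budget=1.0)` returns `"PROCESS"` because layer 0 falls inside the always-executed range.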
🧠 How ANEE Works
For each token, ANEE:
- Profiles the hidden states using:
  - entropy
  - max softmax probability
  - hidden-state L2 norm
  - delta-norm between layers
  - activation variance
  - remaining budget
- Builds a state vector from these metrics.
- Passes the state into a small MLP controller, which decides:
  - "PROCESS" → run the full transformer layer
  - "SKIP" → update only the KV-cache, skip heavy compute
  - "EXIT" → end early (optional, off by default)
- Maintains safe KV-cache alignment even when skipping.
- Produces logits through the model’s final layernorm + LM head.
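The profiling and state-building steps above can be sketched in plain Python. This is an illustration only: the function names, the ordering of the six features, and the threshold rule standing in for the MLP controller are assumptions, and the sketch takes per-layer logits as an input rather than showing how ANEE obtains them.

```python
import math

def softmax(logits):
    m = max(logits)                                  # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def build_state(logits, hidden, prev_hidden, remaining_budget):
    """Six-element state vector built from the metrics listed above."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_prob = max(probs)
    l2_norm = math.sqrt(sum(h * h for h in hidden))
    delta_norm = math.sqrt(sum((h - q) ** 2 for h, q in zip(hidden, prev_hidden)))
    mean = sum(hidden) / len(hidden)
    variance = sum((h - mean) ** 2 for h in hidden) / len(hidden)
    return [entropy, max_prob, l2_norm, delta_norm, variance, remaining_budget]

def heuristic_controller(state, delta_threshold=0.1):
    """Stand-in for the learned MLP (assumed rule): skip when the hidden
    state has nearly stopped changing or the budget is spent."""
    _, _, _, delta_norm, _, remaining_budget = state
    if remaining_budget <= 0.0 or delta_norm < delta_threshold:
        return "SKIP"
    return "PROCESS"

state = build_state(logits=[2.0, 1.0, 0.1], hidden=[3.0, 4.0],
                    prev_hidden=[3.0, 4.0], remaining_budget=0.5)
print(len(state), heuristic_controller(state))
```

The vector has six entries, matching `state_dim=6` in the configuration. In this example the hidden state is unchanged between layers, so the stand-in controller chooses SKIP.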
📈 Performance Snapshot (GPT-2 Small)
At moderate budgets, ANEE typically:
- executes ~6–9 of 12 layers per token
- achieves ~20–30% effective compute reduction
- maintains coherent generation
- shows consistent “sparse middle, dense edges” execution profiles
Lower budgets skip more layers and trade output quality for additional compute savings.
🔬 Intended Use & Applications
ANEE provides a clean, transparent platform for research in:
- dynamic depth / adaptive inference
- efficient transformer execution
- compute-aware LLM routing
- per-token sparsity patterns
- RL-driven execution policies
It is well-suited for experimentation, teaching, and further development.
📄 License
Apache 2.0
Citation
If you use ANEE in your research, please cite:
Ahmed Bin Khalid. (2025). ANEE: Adaptive Neural Execution Engine. Zenodo.
DOI: https://doi.org/10.5281/zenodo.17741880
@software{anee,
  author    = {Ahmed Bin Khalid},
  title     = {ANEE: Adaptive Neural Execution Engine},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17741880},
  url       = {https://doi.org/10.5281/zenodo.17741880}
}