Adaptive Neural Execution Engine – Dynamic sparse inference for pre-trained Transformers.

ANEE v0.4 — Adaptive Neural Execution Engine

Dynamic Sparse Inference for Pre-Trained Transformers

ANEE is a lightweight framework for token-wise, layer-wise adaptive computation in transformer language models. Instead of running every layer for every token, ANEE learns how to allocate compute dynamically, reducing unnecessary computation while preserving output quality.

ANEE wraps existing HuggingFace models (e.g., GPT-2) without modifying their weights.


🔧 Key Capabilities

• Dynamic Layer Skipping

ANEE evaluates each transformer block at inference time and decides whether to:

  • PROCESS — run full attention + MLP
  • SKIP — bypass computation for that layer
  • EXIT — stop processing the remaining layers early (optional, off by default)

This produces sparse execution patterns that vary across tokens.
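The three decisions above can be illustrated with a toy per-token loop. This is a minimal sketch, not ANEE's implementation: `Action`, `run_layers`, and `toy_policy` are hypothetical names, the "layers" are plain functions, and SKIP simply passes the hidden value through (the KV-cache handling described later is left out).

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"
    SKIP = "skip"
    EXIT = "exit"


def run_layers(hidden, layers, decide):
    """Apply `layers` to `hidden`, asking `decide` what to do before each one.

    Returns the final hidden value and the list of actions taken.
    """
    trace = []
    for index, layer in enumerate(layers):
        action = decide(index, hidden)
        trace.append(action)
        if action is Action.EXIT:
            break  # the remaining layers never run
        if action is Action.PROCESS:
            hidden = layer(hidden)
        # Action.SKIP: hidden passes through unchanged
    return hidden, trace


# Toy stand-ins: each "layer" adds 1 to the hidden value.
layers = [lambda h: h + 1 for _ in range(6)]


def toy_policy(index, hidden):
    """Hypothetical policy: exit at layer 4, otherwise skip odd layers."""
    if index == 4:
        return Action.EXIT
    return Action.SKIP if index % 2 else Action.PROCESS


final_hidden, trace = run_layers(0, layers, toy_policy)
print(final_hidden, [a.name for a in trace])
```

Because the policy is called once per token, two tokens in the same sequence can end up with different traces, which is where the varying sparse patterns come from.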


• RL-Trained Controller

A small neural controller receives a per-layer state vector containing:

  • entropy of logits
  • hidden-state norms
  • delta-norms
  • variance
  • layer position
  • remaining budget
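As a rough illustration of how such a state vector could be assembled, the sketch below computes the six listed quantities from plain Python lists. The function names, the feature order, and the normalisation of layer position to [0, 1] are assumptions made for this example; ANEE's actual feature code may differ.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def variance(vec):
    mean = sum(vec) / len(vec)
    return sum((x - mean) ** 2 for x in vec) / len(vec)


def build_state(logits, hidden, prev_hidden, layer_index, num_layers, remaining_budget):
    """Return a 6-element state vector (order is an assumption of this sketch)."""
    delta = [h - p for h, p in zip(hidden, prev_hidden)]
    return [
        entropy(softmax(logits)),               # entropy of logits
        l2_norm(hidden),                        # hidden-state norm
        l2_norm(delta),                         # delta-norm vs. previous layer
        variance(hidden),                       # activation variance
        layer_index / max(num_layers - 1, 1),   # layer position scaled to [0, 1]
        remaining_budget,                       # remaining budget
    ]


state = build_state(
    logits=[0.0, 0.0, 0.0, 0.0],
    hidden=[3.0, 4.0],
    prev_hidden=[0.0, 0.0],
    layer_index=11,
    num_layers=12,
    remaining_budget=0.5,
)
print(state)
```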

It learns policies via:

  1. Supervised warm-start (from heuristic traces)

  2. Reinforcement learning with a reward balancing:

    • similarity to full model (KL divergence)
    • compute savings
    • budget adherence
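One plausible shape for a reward that balances these three terms is sketched below. The weights `w_quality`, `w_savings`, and `w_budget`, the linear combination, and the overshoot penalty are all assumptions for illustration; the source does not specify the exact formula.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def reward(full_probs, sparse_probs, layers_run, num_layers, energy_budget,
           w_quality=1.0, w_savings=0.5, w_budget=1.0):
    """Hypothetical reward: penalise divergence and overshoot, reward savings."""
    used = layers_run / num_layers
    quality_penalty = kl_divergence(full_probs, sparse_probs)
    savings = 1.0 - used
    overshoot = max(0.0, used - energy_budget)  # only exceeding the budget is penalised
    return -w_quality * quality_penalty + w_savings * savings - w_budget * overshoot


probs = [0.7, 0.2, 0.1]
on_budget = reward(probs, probs, layers_run=6, num_layers=12, energy_budget=0.5)
over_budget = reward(probs, probs, layers_run=9, num_layers=12, energy_budget=0.5)
print(on_budget, over_budget)
```

With identical output distributions the KL term vanishes, so the two calls differ only through savings and budget overshoot.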

• Budget-Aware Inference

Users provide an energy_budget in [0, 1]; lower values mean more skipping. The controller adjusts its behavior per token to meet the budget target while maintaining model output quality.
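To make the budget idea concrete, here is a minimal sketch that treats `energy_budget` as the fraction of layers a token may run. `BudgetTracker` and `plan_token` are hypothetical names, and the greedy front-loaded allocation is an assumption chosen for simplicity; it also assumes that `min_layers` refers to the first layers of the stack.

```python
class BudgetTracker:
    """Tracks how much of a token's layer allowance has been spent."""

    def __init__(self, num_layers, energy_budget):
        if not 0.0 <= energy_budget <= 1.0:
            raise ValueError("energy_budget must lie in [0, 1]")
        self.allowance = energy_budget * num_layers  # layers this token may run
        self.spent = 0

    def remaining(self):
        """Unspent share of the allowance, in [0, 1]."""
        if self.allowance == 0:
            return 0.0
        return max(0.0, 1.0 - self.spent / self.allowance)

    def can_afford_layer(self):
        return self.spent + 1 <= self.allowance

    def charge_layer(self):
        self.spent += 1


def plan_token(num_layers=12, energy_budget=0.5, min_layers=2):
    """Greedy plan: run layers until the allowance is used up, then skip."""
    tracker = BudgetTracker(num_layers, energy_budget)
    plan = []
    for index in range(num_layers):
        forced = index < min_layers  # assumed: the first min_layers always run
        if forced or tracker.can_afford_layer():
            tracker.charge_layer()
            plan.append("PROCESS")
        else:
            plan.append("SKIP")
    return plan, tracker.remaining()


plan, remaining = plan_token()
print(plan, remaining)
```

A learned controller would spread the allowance across the stack instead of spending it on the earliest layers; the value returned by `remaining()` corresponds to the kind of "remaining budget" signal such a controller could consume.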


• Visual Execution Maps

ANEE includes tooling to visualize:

  • token-by-layer skip/process patterns
  • per-token compute usage
  • overall savings
  • effective depth profiles

These “execution heatmaps” help interpret which layers the model relies on.
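For intuition about what such a map contains, the following sketch renders a token-by-layer grid as text. It does not use ANEE's visualization tooling; `render_heatmap` and the boolean `decisions` matrix are hypothetical stand-ins for whatever trace format the library records.

```python
def render_heatmap(tokens, decisions):
    """Render a text grid; decisions[t][l] is True when layer l ran for token t."""
    num_layers = len(decisions[0])
    width = max(len(token) for token in tokens)
    lines = []
    for token, row in zip(tokens, decisions):
        cells = "".join("#" if ran else "." for ran in row)
        lines.append(f"{token:>{width}} |{cells}| {sum(row)}/{num_layers}")
    total_run = sum(sum(row) for row in decisions)
    savings = 1.0 - total_run / (len(decisions) * num_layers)
    lines.append(f"overall savings: {savings:.0%}")
    return "\n".join(lines)


tokens = ["The", "future"]
decisions = [
    [True, True, False, True],   # "The": layer 2 skipped
    [True, False, True, True],   # "future": layer 1 skipped
]
heatmap = render_heatmap(tokens, decisions)
print(heatmap)
```

Each row shows one token, each column one layer, and the trailing count is that token's effective depth.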


• Model-Agnostic Design

The wrapper manually unrolls transformer layers and is structured for easy adaptation to other decoder-only architectures beyond GPT-2.


🚀 Getting Started

Install

From the root of a source checkout:

pip install -e .

Quick Start


from anee.wrapper import ANEEWrapper
from anee.config import ANEEConfig
from transformers import GPT2TokenizerFast

# Tokenizer matching the wrapped model
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Wrap GPT-2 with a 0.5 energy budget and the learned controller
config = ANEEConfig(
    model_name="gpt2",
    energy_budget=0.5,
    controller_type="learned",
    controller_path=None     # optional path to controller weights
)

model = ANEEWrapper(config)

# generate() takes the tokenizer and the raw prompt string
prompt = "The future of artificial intelligence is"
output = model.generate(tokenizer, prompt, max_new_tokens=20)

print(output)

Configuration


ANEEConfig(
    model_name="gpt2",
    energy_budget=1.0,        # [0,1], lower = more skipping
    min_layers=2,             # always executed
    exit_budget_threshold=0.1,
    controller_type="heuristic",  # or "learned"
    state_dim=6,
    controller_hidden_dim=32,
    controller_path=None
)

🧠 How ANEE Works

For each token, ANEE:

  • Profiles the hidden states using:

    • entropy
    • max softmax probability
    • hidden-state L2 norm
    • delta-norm between layers
    • activation variance
    • remaining budget
  • Builds a state vector from these metrics.

  • Passes the state into a small MLP controller which decides:

    • "PROCESS" → run the full transformer layer
    • "SKIP" → update only the KV-cache, skip heavy compute
    • "EXIT" → end early (optional, off by default)
  • Maintains safe KV-cache alignment even when skipping.

  • Produces logits through the model’s final layernorm + LM head.
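The controller step in the list above can be pictured as a two-layer MLP that maps the state vector to one score per action. The sketch below uses the `state_dim=6` and `controller_hidden_dim=32` values from the configuration, but everything else is an assumption: `TinyController` is a hypothetical class, the weights are random and untrained, and EXIT is masked unless explicitly allowed to mirror its off-by-default status.

```python
import math
import random

ACTIONS = ("PROCESS", "SKIP", "EXIT")


class TinyController:
    """Untrained two-layer MLP: state vector -> one score per action."""

    def __init__(self, state_dim=6, hidden_dim=32, seed=0):
        rng = random.Random(seed)
        s1 = 1.0 / math.sqrt(state_dim)
        s2 = 1.0 / math.sqrt(hidden_dim)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden_dim)]
                   for _ in range(len(ACTIONS))]
        self.b2 = [0.0] * len(ACTIONS)

    def scores(self, state):
        # Hidden layer with ReLU activation
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, state)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        # Linear output layer: one score per action
        return [
            sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(self.w2, self.b2)
        ]

    def decide(self, state, allow_exit=False):
        scores = self.scores(state)
        if not allow_exit:
            scores[ACTIONS.index("EXIT")] = float("-inf")  # EXIT disabled
        best = max(range(len(ACTIONS)), key=scores.__getitem__)
        return ACTIONS[best]


controller = TinyController()
action = controller.decide([0.5, 1.0, 0.2, 0.1, 0.5, 0.8])
print(action)
```

In practice the weights would come from the supervised warm-start and RL stages rather than random initialisation, and the decision would be made once per layer for each token.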


📈 Performance Snapshot (GPT-2 Small)

At moderate budgets, ANEE typically:

  • executes ~6–9 of 12 layers per token
  • achieves ~20–30% effective compute reduction
  • maintains coherent generation
  • shows consistent “sparse middle, dense edges” execution profiles

Lower budgets save more compute at the cost of output quality.


🔬 Intended Use & Applications

ANEE provides a clean, transparent platform for research in:

  • dynamic depth / adaptive inference
  • efficient transformer execution
  • compute-aware LLM routing
  • per-token sparsity patterns
  • RL-driven execution policies

It is well-suited for experimentation, teaching, and further development.


📄 License

Apache License 2.0


Citation

If you use ANEE in your research, please cite:

Ahmed Bin Khalid. (2025). ANEE: Adaptive Neural Execution Engine. Zenodo.
DOI: https://doi.org/10.5281/zenodo.17741880

@software{anee,
  author       = {Ahmed Bin Khalid},
  title        = {ANEE: Adaptive Neural Execution Engine},
  year         = {2025},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.17741880},
  url          = {https://doi.org/10.5281/zenodo.17741880}
}
