
Adaptive Input/Output Normalization for deep neural networks. Enables stable training of extremely deep networks through adaptive residual scaling (α), based on Babayev's Theory.

Project description

AION-Torch

[WARNING] Alpha Version: This library is currently in alpha. APIs may change without notice. Use at your own risk.

Adaptive Input/Output Normalization for deep neural networks. AION dynamically adjusts residual connection scaling for stable training of extremely deep networks.

What is AION?

AION (Adaptive Input/Output Normalization) is an adaptive residual scaling layer that keeps the energy of residual branches in balance. Instead of using a fixed scale for x + y, AION dynamically adjusts α in x + α·y based on the input and output energies. This stabilizes very deep networks (hundreds of layers) and improves convergence without manual tuning.
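
To make the mechanism concrete, here is a minimal, illustrative sketch of an energy-balancing residual. This is our own example of the idea, not the library's internal implementation; the actual AionResidual layer is shown in the Quick Start below.

import torch
import torch.nn as nn

class EnergyBalancedResidual(nn.Module):
    # Illustrative sketch of the adaptive-α idea; not AION's actual code.
    def __init__(self, alpha0: float = 0.1, beta: float = 0.05):
        super().__init__()
        self.register_buffer("alpha", torch.tensor(float(alpha0)))
        self.beta = beta  # smoothing factor for the running α update

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            ex = x.pow(2).mean()                   # trunk (input) energy
            ey = y.pow(2).mean().clamp_min(1e-12)  # branch (output) energy
            target = (ex / ey).sqrt()              # scale that would equalize energies
            self.alpha.lerp_(target, self.beta)    # smooth α toward the target
        return x + self.alpha * y                  # adaptive residual: x + α·y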

The Proof

Crash Test Results (600-layer Transformer, GPU)

In this test, AION demonstrated better numerical stability and faster convergence than a standard Transformer.

(Figure: AION vs. Standard Transformer crash test.)

600-layer transformer test on GPU: Both models completed all 150 training steps successfully. AION Transformer achieved significantly lower loss (0.0011 ± 0.0003) and more stable gradients compared to Standard Transformer (0.0075 ± 0.0015).

Benchmark Methodology:

Both models use a Pre-LayerNorm architecture (normalization applied before each sublayer), which is standard practice in modern transformers (e.g., GPT-style models). Pre-LayerNorm lets standard transformers train at large depths by normalizing activations before each transformation, helping maintain stable gradient flow. We tested 600 layers to demonstrate AION's advantages at extreme depth while keeping both models within memory limits for the full training run. This makes the comparison fair: both models use the same modern best practices, and AION still shows better stability and faster convergence even at these extreme depths.
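
For reference, the only architectural difference between Pre- and Post-LayerNorm is where normalization sits relative to the sublayer. The sketch below (illustrative only, not the benchmark script) shows both orderings and where AION's adaptive residual replaces the plain addition:

import torch
import torch.nn as nn
from aion_torch import AionResidual

dim = 512
norm = nn.LayerNorm(dim)
ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
aion = AionResidual(alpha0=0.1, beta=0.05)

x = torch.randn(8, 128, dim)

post_ln = norm(x + ffn(x))           # Post-LayerNorm: normalize after the residual add
pre_ln = x + ffn(norm(x))            # Pre-LayerNorm: normalize before the sublayer
pre_ln_aion = aion(x, ffn(norm(x)))  # AION: same Pre-LN ordering, adaptive x + α·y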

Key Findings:

  • Standard Transformer: Completed all 150 steps, final loss: 0.0075 ± 0.0015, crash rate: 0%
  • AION Transformer: Completed all 150 steps, final loss: 0.0011 ± 0.0003, crash rate: 0%
  • Gradient Stability: AION maintained more stable and lower gradient norms (0.0135 ± 0.0033) vs Standard (0.0665 ± 0.0116)
  • Training Efficiency: AION achieved ~7x lower final loss, demonstrating significantly faster convergence

These results suggest that AION can improve numerical stability and convergence speed at extreme depths (600 layers), even on top of modern Pre-LayerNorm architectures.

Installation

Install from PyPI:

pip install aion-torch

Or install in development mode with dev dependencies:

pip install -e ".[dev]"

Quick Start

import torch
from aion_torch import AionResidual

# Create AION layer
layer = AionResidual(alpha0=0.1, beta=0.05)

# Use in residual connection
x = torch.randn(8, 128, 512)  # [batch, seq, features]
y = torch.randn(8, 128, 512)  # Output from FFN/attention
out = layer(x, y)             # Adaptive residual: x + α·y
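
Wiring the layer into a Pre-LayerNorm transformer block could look like the sketch below. The block itself is our example, not part of the library; only AionResidual comes from aion_torch, and the alpha0/beta values mirror the Quick Start.

import torch
import torch.nn as nn
from aion_torch import AionResidual

class AionPreLNBlock(nn.Module):
    # Pre-LayerNorm transformer block with adaptive residuals (illustrative).
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.res1 = AionResidual(alpha0=0.1, beta=0.05)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.res2 = AionResidual(alpha0=0.1, beta=0.05)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = self.res1(x, attn_out)                 # x + α1·attn(norm(x))
        x = self.res2(x, self.ffn(self.norm2(x)))  # x + α2·ffn(norm(x))
        return x

block = AionPreLNBlock()
out = block(torch.randn(8, 128, 512))  # [batch, seq, features], shape preserved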

Overhead Benchmark Results (GPU)

AION adds ~36% computational overhead per training step.


Benchmark configuration: 4-layer transformer, batch size 8, sequence length 128, dimension 512. Results averaged over 150 training steps (after 20 warmup steps).

Performance Metrics (Unoptimized Baseline):

  • Standard Residual: 9.79 ms/step (102.11 steps/sec)
  • AION Residual: 13.36 ms/step (74.84 steps/sec)
  • Overhead: +36.44% per training step

The overhead comes from AION's adaptive scaling calculations, which provide the stability benefits shown in the crash test.
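
For reference, per-step timings of this kind can be collected with a simple harness such as the sketch below. It is our own example with an assumed model/loss interface, not the benchmark script behind the numbers above.

import time
import torch

def measure_step_time(model, optimizer, loss_fn, batch, target,
                      warmup: int = 20, steps: int = 150) -> float:
    # Returns average milliseconds per training step after a warmup phase.
    for i in range(warmup + steps):
        if i == warmup:
            if torch.cuda.is_available():
                torch.cuda.synchronize()
            start = time.perf_counter()
        optimizer.zero_grad()
        loss = loss_fn(model(batch), target)
        loss.backward()
        optimizer.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0 / steps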

There are several ways to reduce this cost in practice:

  • k-update optimization: use k_update > 1 to update α less frequently (e.g. k_update=4 reduces the AION-specific computation by ~75%); see the snippet after this list.
  • Engineering optimizations: fusing operations, reusing statistics, or using lower precision for energy tracking. With careful optimization, we expect the overhead to be reduced to below ~5% in production setups.
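
If your installed version of AionResidual accepts a k_update argument (this usage is an assumption based on the option named above; check the constructor signature), enabling it is a one-line change:

from aion_torch import AionResidual

# k_update is assumed to be a constructor argument; verify against your version.
layer = AionResidual(alpha0=0.1, beta=0.05, k_update=4)  # refresh α every 4 steps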

Features

  • Adaptive scaling: Automatically adjusts to network dynamics
  • Training stability: Helps prevent gradient explosion and vanishing
  • Deep network support: Designed for very deep networks (tested at 600 layers)
  • Faster convergence: Reached ~7x lower final loss than standard residuals in the 600-layer benchmark
  • PyTorch 2.0+: Fully compatible with modern PyTorch

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Format code
make format

# Run linting
make lint

# Run tests
make test

# Install pre-commit hooks
make pre-commit-install

License

MIT License - see LICENSE file for details.

Note: This is an Alpha version. APIs may change without notice. Use at your own risk.

Author

Abbasagha Babayev

Download files

Download the file for your platform.

Source Distribution

aion_torch-0.1.0.tar.gz (14.2 kB)

Uploaded Source

Built Distribution


aion_torch-0.1.0-py3-none-any.whl (9.2 kB)

Uploaded Python 3

File details

Details for the file aion_torch-0.1.0.tar.gz.

File metadata

  • Download URL: aion_torch-0.1.0.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for aion_torch-0.1.0.tar.gz

  • SHA256: 1f83c6a235c36334bee2a6a0a65fbaf24bc959105dbe939fc792f61619050123
  • MD5: e11ff8cc64f8141d7a631b03c4827d29
  • BLAKE2b-256: 13b9817377dc1132410340b09e36985342b694796dd6fd27c93ac95bf5722924


File details

Details for the file aion_torch-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: aion_torch-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for aion_torch-0.1.0-py3-none-any.whl

  • SHA256: 1d1eacd639ee5d642696ee64a90c4ce86e5c629dc18ef4325c1a4b0332ada0bc
  • MD5: d18a0d0c349954f33eabf69547289f41
  • BLAKE2b-256: 429da64f210d62c3b1270d073665099cbeb225eeea4e2c58d71f4c2f6f7e5b79

