
AION-Torch

[WARNING] Alpha Version: This library is currently in alpha. APIs may change without notice. Use at your own risk.

Adaptive Input/Output Normalization for deep neural networks. AION dynamically adjusts residual connection scaling for stable training of extremely deep networks.

What is AION?

AION (Adaptive Input/Output Normalization) is an adaptive residual scaling layer that keeps the energy of residual branches in balance. Instead of using a fixed scale for x + y, AION dynamically adjusts α in x + α·y based on the input and output energies. This stabilizes very deep networks (hundreds of layers) and improves convergence without manual tuning.
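
To make the energy-balancing idea concrete, here is a minimal illustrative sketch of one plausible rule. This is an assumption for explanation only: the actual update rule inside AionResidual, and the exact roles of alpha0 and beta, are defined by the library and may differ.

import torch

def adaptive_residual(x, y, alpha, beta=0.05, eps=1e-6):
    # Illustrative only: nudge alpha toward the scale that balances
    # the residual branch's energy with the input's energy.
    ex = x.pow(2).mean()               # input energy
    ey = y.pow(2).mean()               # residual-branch energy
    target = (ex / (ey + eps)).sqrt()  # scale that equalizes the two
    alpha = (1 - beta) * alpha + beta * target  # smoothed update
    return x + alpha * y, alpha

In the library this bookkeeping lives inside the AionResidual module, so user code simply calls layer(x, y) as shown in the Quick Start below.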

The Proof

Crash Test Results (600-layer Transformer, GPU)

In this test, AION demonstrated better numerical stability and markedly faster convergence than the standard baseline.

[Figure: AION vs. Standard Transformer crash test]

In a 600-layer transformer test on GPU, both models completed all 150 training steps successfully. The AION transformer reached a significantly lower final loss (0.0011 ± 0.0003 vs. 0.0075 ± 0.0015) and maintained more stable gradients than the standard transformer.

Benchmark Methodology:

Both models use a Pre-LayerNorm architecture (normalization applied before each sub-layer, i.e., before the attention and feedforward transformations), the standard practice in modern transformers such as the GPT family. Pre-LayerNorm lets standard transformers train at large depths by normalizing activations before each transformation, helping maintain stable gradient flow. We tested 600 layers to demonstrate AION's advantages at extreme depth while ensuring both models complete the full training run without memory constraints. This makes the comparison fair: both models use the same modern best practices, and AION still demonstrates better stability and faster convergence at these extreme depths.

Key Findings:

  • Standard Transformer: Completed all 150 steps, final loss: 0.0075 ± 0.0015, crash rate: 0%
  • AION Transformer: Completed all 150 steps, final loss: 0.0011 ± 0.0003, crash rate: 0%
  • Gradient Stability: AION maintained more stable and lower gradient norms (0.0135 ± 0.0033) vs Standard (0.0665 ± 0.0116)
  • Training Efficiency: AION achieved ~7x lower final loss, demonstrating significantly faster convergence

These results suggest that AION can improve numerical stability and convergence speed at extreme depths (600 layers), even on top of modern Pre-LayerNorm architectures.

Installation

Install from PyPI:

pip install aion-torch

Or, from a clone of the repository, install in editable (development) mode with dev dependencies:

pip install -e ".[dev]"

Quick Start

import torch
from aion_torch import AionResidual

# Create AION layer
layer = AionResidual(alpha0=0.1, beta=0.05)

# Use in residual connection
x = torch.randn(8, 128, 512)  # [batch, seq, features]
y = torch.randn(8, 128, 512)  # Output from FFN/attention
out = layer(x, y)             # Adaptive residual: x + α·y
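
AionResidual is designed to replace the plain x + y skip connection inside a network block. Below is an illustrative Pre-LayerNorm feedforward block wired through AionResidual; the block structure and hidden size are assumptions for demonstration, not the exact architecture used in the crash test.

import torch
import torch.nn as nn
from aion_torch import AionResidual

class PreLNBlock(nn.Module):
    def __init__(self, dim=512, hidden=2048):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )
        self.residual = AionResidual(alpha0=0.1, beta=0.05)

    def forward(self, x):
        y = self.ffn(self.norm(x))   # Pre-LN: normalize before the transformation
        return self.residual(x, y)   # adaptive residual: x + α·y

out = PreLNBlock()(torch.randn(8, 128, 512))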

Overhead Benchmark Results (GPU)

In its current unoptimized form, AION adds ~36% computational overhead per training step.

[Figure: overhead benchmark results]

Benchmark configuration: 4-layer transformer, batch size 8, sequence length 128, dimension 512. Results averaged over 150 training steps (after 20 warmup steps).

Performance Metrics (Unoptimized Baseline):

  • Standard Residual: 9.79 ms/step (102.11 steps/sec)
  • AION Residual: 13.36 ms/step (74.84 steps/sec)
  • Overhead: +36.44% per training step

The overhead comes from AION's adaptive scaling calculations, which provide the stability benefits shown in the crash test.

There are several ways to reduce this cost in practice:

  • Gradient accumulation: accumulate gradients over multiple micro-batches per optimizer step to amortize the per-batch overhead (see the sketch after this list).
  • Engineering optimizations: fusing operations, reusing statistics, or using lower precision for energy tracking. With careful optimization, we expect the overhead to drop below ~5% in production setups.
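
For reference, here is the standard PyTorch gradient-accumulation pattern mentioned above. Nothing in it is AION-specific, and model, loader, criterion, and optimizer are placeholders for your own training setup.

accum_steps = 4  # micro-batches per optimizer step
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = criterion(model(inputs), targets) / accum_steps  # scale so gradients average
    loss.backward()                                         # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()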

Note: Alpha updates every forward pass in training mode to ensure correct behavior in distributed training (DataParallel/DDP).

Features

  • Adaptive scaling: Automatically adjusts to network dynamics
  • Training stability: Prevents gradient explosion and vanishing
  • Deep network support: Validated at extreme depth (600 layers in the crash test above)
  • Faster convergence: Reached ~7x lower final loss than standard residuals in the 600-layer crash test
  • PyTorch 2.0+: Fully compatible with modern PyTorch

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Format code
make format

# Run linting
make lint

# Run tests
make test

# Install pre-commit hooks
make pre-commit-install

License

MIT License - see LICENSE file for details.


Author

Abbasagha Babayev
