# AI-HEXAGON

An objective way to evaluate neural network architectures.

> ⚠️ Early Development: This project is in its early development phase and is not yet accepting external architecture submissions. Star/watch the repository to be notified when we open for contributions.

[View Live Leaderboard & Results](https://ai-hexagon.dev)
AI-HEXAGON is an objective benchmarking framework designed to evaluate neural network architectures independently of natural language processing tasks. By isolating architectural capabilities from training techniques and datasets, it enables meaningful and efficient comparisons between different neural network designs.
## Motivation
Traditional neural network benchmarking often conflates architectural performance with training techniques and dataset biases. This makes it challenging to:
- Isolate true architectural capabilities
- Iterate quickly on design changes
- Compare models fairly
AI-HEXAGON solves these challenges by:
- Pure Architecture Focus: Tests evaluate only the architecture, removing confounding factors like tokenization and dataset-specific optimizations
- Rapid Iteration: Enables quick testing of architectural changes without large-scale training
- Flexible Testing: Supports both standard benchmarking and custom test suites
## Key Features
- Pure Architecture Evaluation: Tests fundamental capabilities independently
- Controlled Environment: Fixed parameter budget and raw numerical inputs
- Clear Metrics: Six independently measured fundamental capabilities
- Transparent Implementation: Clean, framework-agnostic code
- Automated Testing: GitHub Actions for fair, manipulation-proof evaluation
- Live Results: Real-time benchmarking results at ai-hexagon.dev
## Metrics (The Hexagon)
Each architecture is evaluated on six fundamental capabilities:
| Metric | Description |
|---|---|
| Memory Capacity | Store and recall information from training data |
| State Management | Maintain and manipulate internal hidden states |
| Pattern Recognition | Recognize and extrapolate sequences |
| Position Processing | Handle positional information within sequences |
| Long-Range Dependency | Manage dependencies over long sequences |
| Length Generalization | Process sequences longer than training examples |
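To illustrate how a capability like memory capacity can be probed, the sketch below generates synthetic key-value recall data in the spirit of the `hash_map` test from the default suite. The function name and sampling scheme are assumptions for illustration, not the framework's actual implementation:

```python
import random


def make_hash_map_data(num_pairs, key_length=8, value_length=64,
                       vocab_size=1024, seed=0):
    """Generate synthetic key -> value pairs for a memory-capacity probe.

    Hypothetical sketch: the real hash_map test may differ. Each key and
    value is a sequence of integer tokens from a fixed vocabulary; the
    model must memorize the mapping and recall values given keys.
    """
    rng = random.Random(seed)
    seen_keys = set()
    pairs = []
    while len(pairs) < num_pairs:
        key = tuple(rng.randrange(vocab_size) for _ in range(key_length))
        if key in seen_keys:  # keys must be unique so recall is well-defined
            continue
        seen_keys.add(key)
        value = tuple(rng.randrange(vocab_size) for _ in range(value_length))
        pairs.append((key, value))
    return pairs
```

Because the data is raw integer tokens rather than text, no tokenizer or dataset choice can leak into the measurement.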
## Project Structure
```
ai-hexagon/
├── ai_hexagon/
│   └── modules/        # Common neural network modules
├── results/            # Model implementations and results
│   └── transformer/
│       ├── model.py    # Transformer implementation
│       └── modules/    # Custom modules (if needed)
└── suite.json          # Default test suite configuration
```
## Parameter Budget
The default suite enforces a 4 MB parameter memory budget for fair comparisons; the corresponding parameter count depends on precision:
| Precision | Parameter Limit |
|---|---|
| Complex64 | 0.5M params |
| Float32 | 1M params |
| Float16 | 2M params |
| Int8 | 4M params |
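The limits above follow directly from dividing the fixed 4 MB budget by the byte width of each precision (taking 1M = 2^20 parameters, as the round numbers suggest):

```python
BUDGET_BYTES = 4 * 2**20  # 4 MB parameter budget from the default suite

BYTES_PER_PARAM = {
    "Complex64": 8,  # two 32-bit floats per complex number
    "Float32": 4,
    "Float16": 2,
    "Int8": 1,
}

# Parameter limit = budget / bytes per parameter
limits = {dtype: BUDGET_BYTES // width
          for dtype, width in BYTES_PER_PARAM.items()}

for dtype, n in limits.items():
    print(f"{dtype}: {n / 2**20:g}M params")
# prints:
# Complex64: 0.5M params
# Float32: 1M params
# Float16: 2M params
# Int8: 4M params
```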
## Contributing
We welcome contributions once the project is ready for external input. To contribute:

- Fork: Create your own fork of the project
- Install: Run `poetry install` (optionally with `--with dev,cuda12`) to get the `ai-hex` command
- Implement: Add your model in `results/your_model_name/`
- Document: Include comprehensive docstrings and references
- Submit: Create a pull request following our guidelines
- Wait: CI will automatically evaluate your model and update the leaderboard
Use `ai-hex tests list` to see available tests, `ai-hex tests show test_name` to view a test's schema, and `ai-hex suite run ./path/to/model.py` to run your model against the suite.
## Technical Stack: JAX and Flax

We chose JAX and Flax for their:

- Functional Design: Clear architecture definitions with immutable state
- Custom Operations: Comprehensive support through `jax.numpy`
- Reproducibility: First-class random number handling
## Code Style: Using einops

We mandate `einops` for complex tensor operations to enhance readability. Compare:
```python
# Traditional approach - hard to understand the transformation
x = x.reshape(batch, x.shape[1], x.shape[-2] * 2, x.shape[-1] // 2)
x = x.transpose(0, 2, 1, 3)

# Using einops - crystal clear intent
x = rearrange(x, 'b t (h d) c -> b (h t) (d c)')
```
## Example Model Implementation
```python
import flax.linen as nn
from einops import rearrange


class Transformer(nn.Module):
    """
    Transformer Decoder Stack architecture from 'Attention Is All You Need'.

    Reference: https://arxiv.org/abs/1706.03762
    """

    hidden_dim: int = 256
    num_layers: int = 4
    num_heads: int = 4

    @nn.compact
    def __call__(self, x):
        # Architecture implementation
        return x
```
## Test Suite Configuration
Test suites use a JSON configuration format:
```json
{
  "name": "General 1M",
  "description": "General architecture performance evaluation",
  "metrics": [
    {
      "name": "Memory Capacity",
      "description": "Information storage and recall capability",
      "tests": [
        {
          "weight": 1.0,
          "test": {
            "name": "hash_map",
            "seed": 0,
            "key_length": 8,
            "value_length": 64,
            "num_pairs_range": [32, 65536],
            "vocab_size": 1024
          }
        }
      ]
    }
  ]
}
```
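Each metric aggregates its weighted tests into a single score. The sketch below shows one plausible way such a config could be scored; the `score_metric` helper, the trimmed two-test config, and the per-test results are hypothetical illustrations, not the framework's actual aggregation:

```python
import json

# Trimmed, hypothetical suite config with two weighted tests
suite_json = """
{
  "name": "General 1M",
  "metrics": [
    {"name": "Memory Capacity",
     "tests": [{"weight": 1.0, "test": {"name": "hash_map", "seed": 0}},
               {"weight": 3.0, "test": {"name": "other_probe", "seed": 0}}]}
  ]
}
"""


def score_metric(metric, results):
    """Weighted average of per-test scores (hypothetical aggregation)."""
    total_weight = sum(t["weight"] for t in metric["tests"])
    return sum(t["weight"] * results[t["test"]["name"]]
               for t in metric["tests"]) / total_weight


suite = json.loads(suite_json)
results = {"hash_map": 0.75, "other_probe": 0.25}  # made-up scores
print(score_metric(suite["metrics"][0], results))  # 0.375
```

Because weights live in the suite JSON rather than in code, a custom suite can re-balance tests without touching any model implementation.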
Results are automatically generated via GitHub Actions to ensure fairness. The leaderboard is updated in real time at ai-hexagon.dev.
## License
This project is licensed under the MIT License - see the LICENSE file for details.