AstronomIcal Omnimodal Network - Polymathic's Large Omnimodal Model for Astronomy
🌌 AION-1: AstronomIcal Omnimodal Network
Polymathic's Large Omnimodal Model for Astronomy
🎯 Overview
AION-1 is a large omnimodal model designed for astronomical surveys. It integrates multiple data modalities in a single model and can be adapted to a wide range of astronomical tasks.
Alpha Testing
AION-1 model weights are hosted on Hugging Face and are gated during the alpha testing phase. First, make sure that you have access to the Hugging Face model weights. If you don't have access, you can request it directly on the Hugging Face repository.
Once you have access, you will need to set up a Hugging Face token locally. This can be done by first installing huggingface_hub:
pip install huggingface_hub
and then logging in via
huggingface-cli login --token YOUR_HF_TOKEN
All of the ensuing steps should work out of the box after this point.
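If you would rather authenticate from Python than from the CLI, the sketch below reads the token from an environment variable and passes it to `huggingface_hub.login`. The helper names `read_hf_token` and `login_from_env` are illustrative and not part of AION; `HF_TOKEN` is the environment variable that `huggingface_hub` itself also recognizes.

```python
import os


def read_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "Set HF_TOKEN to a token with access to the gated AION weights."
        )
    return token


def login_from_env():
    """Log in to the Hugging Face Hub using the token found in HF_TOKEN."""
    # Imported lazily so read_hf_token works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_hf_token())
```

Calling `login_from_env()` once per machine should have the same effect as the CLI command above.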
🚀 Quick Start
Assuming you have PyTorch installed, you can install AION with:
pip install polymathic-aion
Then you can load the pretrained model and start analyzing astronomical data:
import torch
from aion import AION
from aion.codecs import CodecManager
from aion.modalities import LegacySurveyImage
# Load model and codec manager
model = AION.from_pretrained('aion-base').to('cuda') # or 'aion-large', 'aion-xlarge'
codec_manager = CodecManager(device='cuda')
# Prepare your astronomical data (example: Legacy Survey image)
image = LegacySurveyImage(
    flux=your_image_tensor,  # Shape: [batch, 4, height, width] for g,r,i,z bands
    bands=['DES-G', 'DES-R', 'DES-I', 'DES-Z']
)
batch_size = your_image_tensor.shape[0]  # used below to build the target mask
# Encode data to tokens
tokens = codec_manager.encode(image)
# Option 1: Extract embeddings for downstream tasks
embeddings = model.encode(tokens, num_encoder_tokens=600)
# Option 2: Generate predictions (e.g., redshift)
from aion.modalities import Z
predictions = model(
    tokens,
    target_mask={'tok_z': torch.zeros(batch_size, 1)},
    num_encoder_tokens=600
)
🔬 Scientific Overview
🧬 Architecture
AION-1 employs a two-stage, transformer-based design:
- Modality-Specific Tokenizers transform raw inputs into discrete tokens
- Unified Encoder–Decoder Transformer is trained on all token streams with a multimodal masked modeling (4M) objective
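To make the second stage more concrete, here is a toy sketch of how a masked-modeling training example could be formed from tokenized modalities: a random subset of tokens is shown to the encoder, and a disjoint subset is held out as decoder targets. The function name, the token budgets, and the toy token ids are all hypothetical; this is not AION's actual sampling code.

```python
import random


def sample_masked_example(tokens_by_modality, num_input, num_target, seed=None):
    """Split (modality, position, token_id) triples into encoder inputs
    and a disjoint set of decoder targets."""
    rng = random.Random(seed)
    # Flatten every modality's token sequence into one pool.
    pool = [
        (name, pos, tok)
        for name, seq in tokens_by_modality.items()
        for pos, tok in enumerate(seq)
    ]
    if num_input + num_target > len(pool):
        raise ValueError("token budgets exceed the number of available tokens")
    rng.shuffle(pool)
    inputs = pool[:num_input]
    targets = pool[num_input:num_input + num_target]
    return inputs, targets


# Toy token ids; the keys reuse token names from the Supported Modalities table.
example = {
    "tok_image_ls": [17, 4, 250, 96],
    "tok_spectrum_desi": [8, 8, 301],
    "tok_z": [42],
}
inputs, targets = sample_masked_example(example, num_input=5, num_target=3, seed=0)
print(len(inputs), len(targets))
```

Because inputs and targets are drawn from all modalities at once, the same model can learn to predict any modality from any other, which is what makes downstream adaptation simple.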
🗂️ Supported Modalities
AION-1’s tokenizers cover 39 distinct data types, grouped by survey and data category:
| Category | Description | Token Name(s) |
|---|---|---|
| Imaging (2) | Legacy Survey, HSC Wide | tok_image_ls, tok_image_hsc |
| Catalog (1) | Legacy Survey catalog entries | catalog |
| Spectra (2) | SDSS, DESI | tok_spectrum_sdss, tok_spectrum_desi |
| Gaia (5) | BP/RP spectra, parallax, sky coordinates (RA, Dec) | tok_xp_bp, tok_xp_rp, tok_parallax, tok_ra, tok_dec |
| Gaia Photometry (3) | G/BP/RP flux | tok_flux_g_gaia, tok_flux_bp_gaia, tok_flux_rp_gaia |
| Legacy Survey (9) | g,r,i,z bands & WISE W1–W4 flux, E(B–V) | tok_flux_g,…,tok_flux_w4, tok_ebv |
| Legacy Shape (3) | Ellipticity components & effective radius | tok_shape_e1, tok_shape_e2, tok_shape_r |
| HSC Photometry (5) | g,r,i,z,y magnitudes | tok_mag_g,…,tok_mag_y |
| HSC Extinction (5) | g,r,i,z,y extinctions | tok_a_g,…,tok_a_y |
| HSC Shape (3) | Shape components 11,22,12 | tok_shape11, tok_shape22, tok_shape12 |
| Other (1) | Spectroscopic redshift | tok_z |
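As a quick consistency check, the per-category counts can be summed to confirm the total of 39 data types. The counts below are taken from the token names listed in each row; note that the Gaia row lists five token names.

```python
# Number of token types per category, counted from the token names in each row.
tokens_per_category = {
    "Imaging": 2,
    "Catalog": 1,
    "Spectra": 2,
    "Gaia": 5,  # tok_xp_bp, tok_xp_rp, tok_parallax, tok_ra, tok_dec
    "Gaia Photometry": 3,
    "Legacy Survey": 9,  # g, r, i, z + W1-W4 + E(B-V)
    "Legacy Shape": 3,
    "HSC Photometry": 5,
    "HSC Extinction": 5,
    "HSC Shape": 3,
    "Other": 1,
}
total = sum(tokens_per_category.values())
print(total)  # 39
```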
📈 Model Variants
| Variant | Encoder Blocks | Decoder Blocks | Model Dim | Heads | Total Params |
|---|---|---|---|---|---|
| Base | 12 | 12 | 768 | 12 | 300 M |
| Large | 24 | 24 | 1024 | 16 | 800 M |
| XLarge | 24 | 24 | 2048 | 32 | 3 B |
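The totals in the table can be sanity-checked with a back-of-the-envelope count of the transformer block weights. The table does not state the MLP width or how the decoder attends to the encoder, so the sketch below assumes a standard transformer: 4× MLP expansion and one cross-attention layer per decoder block. The function name is illustrative.

```python
def estimate_block_params(enc_blocks, dec_blocks, dim, mlp_ratio=4):
    """Rough weight count for the transformer blocks only.

    Assumptions (not stated in the table): each block has self-attention
    (4 * dim^2 weights) and an MLP with `mlp_ratio` expansion
    (2 * mlp_ratio * dim^2 weights); each decoder block adds cross-attention
    (another 4 * dim^2). Embeddings, output heads, biases and norms are ignored.
    """
    attn = 4 * dim * dim
    mlp = 2 * mlp_ratio * dim * dim
    encoder = enc_blocks * (attn + mlp)
    decoder = dec_blocks * (2 * attn + mlp)
    return encoder + decoder


variants = {
    "Base": (12, 12, 768),
    "Large": (24, 24, 1024),
    "XLarge": (24, 24, 2048),
}
for name, (enc, dec, dim) in variants.items():
    estimate = estimate_block_params(enc, dec, dim)
    print(f"{name}: ~{estimate / 1e6:.0f} M block parameters")
```

Under these assumptions the blocks account for roughly 198 M, 705 M, and 2.8 B parameters. All three fall below the listed totals, which is plausible if the remainder comes from per-modality embeddings and output heads that the sketch ignores.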
Pretraining
- Global batch size: 8,192
- Training time: Base (1.5 days on 64 H100), Large (2.5 days on 100 H100), XLarge (3.5 days on 288 H100)
- Optimizer: AdamW, peak LR 2 × 10⁻⁴, linear warmup + cosine decay
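For readers unfamiliar with the schedule, linear warmup followed by cosine decay can be sketched as below. Only the peak learning rate comes from the summary above; the warmup length, total step count, and final learning rate are not stated, so they are left as parameters and the values in the usage lines are placeholders.

```python
import math

PEAK_LR = 2e-4  # peak learning rate from the pretraining summary


def learning_rate(step, total_steps, warmup_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Placeholder schedule lengths, chosen only for illustration.
for step in (0, 999, 1000, 5500, 10000):
    print(step, learning_rate(step, total_steps=10000, warmup_steps=1000))
```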
🔧 Data Preparation
AION uses a typed data system to understand the provenance of each astronomical observation. Each modality must be properly formatted:
Modality Types
from aion.modalities import (
    LegacySurveyImage, HSCImage,  # Images
    DESISpectrum, SDSSSpectrum,  # Spectra
    LegacySurveyFluxG, HSCMagG,  # Photometry
    GaiaParallax, Z,  # Scalars
    # ... and 30+ more modalities
)
Example: Preparing Legacy Survey Data
import torch
from aion.modalities import LegacySurveyImage, LegacySurveyFluxG
# Format image data (shape: [batch, 4, height, width])
image = LegacySurveyImage(
    flux=torch.tensor(image_data, dtype=torch.float32),
    bands=['DES-G', 'DES-R', 'DES-I', 'DES-Z']
)
# Format scalar photometry
flux_g = LegacySurveyFluxG(value=torch.tensor([flux_values]))
Supported Data Formats
| Survey | Modality | Required Format |
|---|---|---|
| Legacy Survey | Images | 4-band (g,r,i,z), any resolution (auto-cropped to 96×96) |
| HSC | Images | 5-band (g,r,i,z,y), any resolution |
| DESI/SDSS | Spectra | Flux, inverse variance, wavelength arrays |
| Gaia | BP/RP | Coefficient arrays (55 coefficients each) |
| All Surveys | Scalars | Single values or 1D tensors |
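The table notes that Legacy Survey images are auto-cropped to 96×96. The snippet below sketches what such a crop could look like if you want to apply it yourself before encoding. It assumes the crop is centered and that inputs are at least 96 pixels on each side; neither detail is stated here, and the function name is hypothetical.

```python
def center_crop_bounds(height, width, size=96):
    """Return (top, bottom, left, right) slice bounds for a centered crop."""
    if height < size or width < size:
        raise ValueError(
            f"image {height}x{width} is smaller than the {size}x{size} crop"
        )
    top = (height - size) // 2
    left = (width - size) // 2
    return top, top + size, left, left + size


top, bottom, left, right = center_crop_bounds(152, 152)
# With a [batch, 4, height, width] tensor:
#   cropped = flux[..., top:bottom, left:right]
print(top, bottom, left, right)  # 28 124 28 124
```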
💡 Example Use Cases
🔍 Similarity Search
Find galaxies similar to a query object across different modalities:
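# Extract embeddings for similarity search (shape: [batch, tokens, dim])
query_embedding = model.encode(codec_manager.encode(query_image))
all_embeddings = model.encode(codec_manager.encode(*dataset_images))
# Average over tokens and move to NumPy so scikit-learn can consume them
query_vec = query_embedding.mean(dim=1).detach().cpu().numpy()
all_vecs = all_embeddings.mean(dim=1).detach().cpu().numpy()
# Find most similar objects using cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
similarity_scores = cosine_similarity(query_vec, all_vecs)[0]  # scores for the single query
similar_objects = similarity_scores.argsort()[::-1][:10]  # Top 10 similar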
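The ranking step itself does not depend on AION or scikit-learn. As a dependency-free illustration, the sketch below ranks candidates by cosine similarity, assuming each object has already been reduced to a single vector (for example by averaging its token embeddings). The function name and the toy vectors are made up for the example.

```python
import math


def cosine_top_k(query, candidates, k=10):
    """Return indices of the k candidates most similar to query, best first."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    q_norm = norm(query)
    scores = []
    for idx, cand in enumerate(candidates):
        denom = q_norm * norm(cand)
        dot = sum(a * b for a, b in zip(query, cand))
        # Treat zero-length vectors as having zero similarity.
        scores.append((dot / denom if denom else 0.0, idx))
    # Sort by descending similarity; ties are broken by the lower index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores[:k]]


pooled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(cosine_top_k([1.0, 0.0], pooled, k=2))  # [0, 1]
```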
📊 Property Prediction
Build lightweight models on AION embeddings:
# Extract embeddings from multiple modalities
embeddings = model.encode(codec_manager.encode(
    image, spectrum, flux_g, flux_r, flux_i, flux_z
), num_encoder_tokens=900)
# Train simple regressor for stellar mass, redshift, etc.
from sklearn.neighbors import KNeighborsRegressor
regressor = KNeighborsRegressor(n_neighbors=5)
# Average over tokens and convert to NumPy for scikit-learn
features = embeddings.mean(dim=1).detach().cpu().numpy()
regressor.fit(features, target_property)
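To show what the two steps above compute, here is a small pure-Python sketch of mean pooling followed by k-nearest-neighbour regression. It mirrors the logic only; the function names and toy numbers are illustrative, and in practice you would keep using scikit-learn on real embeddings.

```python
def mean_pool(token_embeddings):
    """Average a [tokens, dim] list of vectors into a single [dim] vector."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]


def knn_predict(train_x, train_y, query, k=5):
    """Predict the mean target of the k nearest training vectors (Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Three toy objects with two token embeddings each, plus a target per object.
objects = [
    [[0.0, 0.0], [2.0, 2.0]],
    [[1.0, 1.0], [3.0, 3.0]],
    [[10.0, 10.0], [12.0, 12.0]],
]
targets = [0.1, 0.2, 1.5]
features = [mean_pool(obj) for obj in objects]
prediction = knn_predict(features, targets, query=[1.5, 1.5], k=2)
print(round(prediction, 3))  # 0.15
```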
🌌 Generative Modeling
Predict missing astronomical properties:
# Predict redshift from photometry + morphology
predictions = model(
    codec_manager.encode(image, flux_g, flux_r, flux_i, flux_z),
    target_mask={'tok_z': torch.zeros(batch_size, 1)},
    num_encoder_tokens=600
)
redshift_probs = torch.softmax(predictions['tok_z'], dim=-1)
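The softmax output is a probability distribution over redshift tokens, not a redshift value. One way to summarize it is with the most likely bin and the probability-weighted mean, as sketched below. The mapping from token index to redshift is defined by the `tok_z` codec and is not described here, so the evenly spaced `bin_centers` grid is purely hypothetical, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_redshift(logits, bin_centers):
    """Return (most likely redshift, expected redshift) for one object."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    mean = sum(p * z for p, z in zip(probs, bin_centers))
    return bin_centers[best], mean


# Hypothetical grid: the real token-to-redshift mapping comes from the codec.
bin_centers = [0.1, 0.3, 0.5, 0.7]
z_map, z_mean = summarize_redshift([0.0, 2.0, 0.0, 0.0], bin_centers)
print(z_map, round(z_mean, 3))
```

The gap between the two summaries gives a rough sense of how asymmetric or multimodal the predicted distribution is.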
📚 Documentation
🎓 Tutorials
Start with our interactive tutorial:
- Google Colab notebook: learn AION basics interactively, no local setup required.
For detailed guides, see the online documentation.
📦 Advanced Installation
AION offers flexible installation options to suit your environment and requirements.
To install AION with PyTorch included:
pip install "polymathic-aion[torch]"
For contributors and developers:
pip install "polymathic-aion[torch,dev]"
This includes testing frameworks, linting tools, and development dependencies.
For specific PyTorch versions (e.g., CUDA support):
# Install PyTorch with CUDA 12.4 support
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
# Then install AION
pip install polymathic-aion
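After installing, a quick check like the following can confirm that both packages are visible to your interpreter. It uses only the standard library; the function name is illustrative, the distribution name `polymathic-aion` is the one used in the Quick Start, and `aion` is the import name used in the examples.

```python
import importlib.util
from importlib import metadata


def installation_report():
    """Report whether torch and aion are importable and which AION version is installed."""
    report = {
        "torch_importable": importlib.util.find_spec("torch") is not None,
        "aion_importable": importlib.util.find_spec("aion") is not None,
    }
    try:
        report["polymathic_aion_version"] = metadata.version("polymathic-aion")
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        report["polymathic_aion_version"] = None
    return report


print(installation_report())
```

If PyTorch is importable, `torch.cuda.is_available()` tells you whether the CUDA build can see a GPU.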
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🌟 Acknowledgments
AION is developed by Polymathic AI, advancing the frontier of AI for scientific applications.
📬 Contact
- Issues: GitHub Issues
- Discussions: GitHub Discussions