Skip to main content

AstronomIcal Omnimodal Network - Polymathic's Large Omnimodal Model for Astronomy

Project description

🌌 AION-1: AstronomIcal Omnimodal Network

License: MIT Python 3.10+ PyTorch Tests Open In Colab

Polymathic's Large Omnimodal Model for Astronomy

🚀 Quick Start🔬 Scientific Overview📚 Documentation📦 Advanced Installation🤝 Contributing


🎯 Overview

AION Logo

AION-1 is a cutting-edge large omnimodal model specifically designed for astronomical surveys. It seamlessly integrates multiple data modalities, and enables simple adaptation to a wide range of astronomical tasks.

Alpha Testing

AION-1 model weights are hosted on Huggingface behind gates during the alpha testers phase. First, ensure that you have access to the Hugginface model weights. If you don't have access, you can request it directly on the hugginface repo here.

Once you have access, you will need to set up a huggingface token locally. This can be done by first installing hugginface_hub:

pip install huggingface_hub

and then logging in via

huggingface-cli login --token YOUR_HF_TOKEN

All of the ensuing steps should work out of the box after this point.

🚀 Quick Start

Assuming you have PyTorch installed, you can install AION trivially with:

pip install polymathic-aion

Then you can load the pretrained model and start analyzing astronomical data:

import torch
from aion import AION
from aion.codecs import CodecManager
from aion.modalities import LegacySurveyImage

# Load model and codec manager
model = AION.from_pretrained('aion-base').to('cuda')  # or 'aion-large', 'aion-xlarge'
codec_manager = CodecManager(device='cuda')

# Prepare your astronomical data (example: Legacy Survey image)
image = LegacySurveyImage(
    flux=your_image_tensor,  # Shape: [batch, 4, height, width] for g,r,i,z bands
    bands=['DES-G', 'DES-R', 'DES-I', 'DES-Z']
)

# Encode data to tokens
tokens = codec_manager.encode(image)

# Option 1: Extract embeddings for downstream tasks
embeddings = model.encode(tokens, num_encoder_tokens=600)

# Option 2: Generate predictions (e.g., redshift)
from aion.modalities import Z
predictions = model(
    tokens,
    target_mask={'tok_z': torch.zeros(batch_size, 1)},
    num_encoder_tokens=600
)

🔬 Scientific Overview

🧬 Architecture

AION-1 employs a two-stage, transformer-based design:

  1. Modality-Specific Tokenizers transform raw inputs into discrete tokens
  2. Unified Encoder–Decoder Transformer ingests all token streams via a multimodal masked modeling (4M) objective

🗂️ Supported Modalities

AION-1’s tokenizers cover 39 distinct data types, grouped by survey and data category

Category Description Token Name(s)
Imaging (2) Legacy Survey, HSC Wide tok_image_ls, tok_image_hsc
Catalog (1) Legacy Survey catalog entries catalog
Spectra (2) SDSS, DESI tok_spectrum_sdss, tok_spectrum_desi
Gaia (4) BP/RP spectra, parallax, sky coords tok_xp_bp, tok_xp_rp, tok_parallax, tok_ra, tok_dec
Gaia Photometry (3) G/BP/RP flux tok_flux_g_gaia, tok_flux_bp_gaia, tok_flux_rp_gaia
Legacy Survey (9) g,r,i,z bands & WISE W1–W4 flux, E(B–V) tok_flux_g,…,tok_flux_w4, tok_ebv
Legacy Shape (3) Ellipticity components & effective radius tok_shape_e1, tok_shape_e2, tok_shape_r
HSC Photometry (5) g,r,i,z,y magnitudes tok_mag_g,…,tok_mag_y
HSC Extinction (5) g,r,i,z,y extinctions tok_a_g,…,tok_a_y
HSC Shape (3) Shape components 11,22,12 tok_shape11, tok_shape22, tok_shape12
Other (1) Spectroscopic redshift tok_z

📈 Model Variants

Variant Encoder Blocks Decoder Blocks Model Dim Heads Total Params
Base 12 12 768 12 300 M
Large 24 24 1024 16 800 M
XLarge 24 24 2048 32 3 B

Pretraining – Global batch size: 8 192 – Steps: Base (1.5 days on 64 H100), Large (2.5 days on 100 H100), XLarge (3.5 days on 288 H100) – Optimizer: AdamW, peak LR 2 × 10⁻⁴, linear warmup + cosine decay

🔧 Data Preparation

AION uses a typed data system to understand the provenance of each astronomical observation. Each modality must be properly formatted:

Modality Types

from aion.modalities import (
    LegacySurveyImage, HSCImage,           # Images
    DESISpectrum, SDSSSpectrum,            # Spectra
    LegacySurveyFluxG, HSCMagG,            # Photometry
    GaiaParallax, Z,                       # Scalars
    # ... and 30+ more modalities
)

Example: Preparing Legacy Survey Data

import torch
from aion.modalities import LegacySurveyImage, LegacySurveyFluxG

# Format image data (shape: [batch, 4, height, width])
image = LegacySurveyImage(
    flux=torch.tensor(image_data, dtype=torch.float32),
    bands=['DES-G', 'DES-R', 'DES-I', 'DES-Z']
)

# Format scalar photometry
flux_g = LegacySurveyFluxG(value=torch.tensor([flux_values]))

Supported Data Formats

Survey Modality Required Format
Legacy Survey Images 4-band (g,r,i,z), any resolution (auto-cropped to 96×96)
HSC Images 5-band (g,r,i,z,y), any resolution
DESI/SDSS Spectra Flux, inverse variance, wavelength arrays
Gaia BP/RP Coefficient arrays (55 coefficients each)
All Surveys Scalars Single values or 1D tensors

💡 Example Use Cases

🔍 Similarity Search

Find galaxies similar to a query object across different modalities:

# Extract embeddings for similarity search
query_embedding = model.encode(codec_manager.encode(query_image))
all_embeddings = model.encode(codec_manager.encode(*dataset_images))

# Find most similar objects using cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
similarity_scores = cosine_similarity(query_embedding, all_embeddings)
similar_objects = similarity_scores.argsort()[::-1][:10]  # Top 10 similar

📊 Property Prediction

Build lightweight models on AION embeddings:

# Extract embeddings from multiple modalities
embeddings = model.encode(codec_manager.encode(
    image, spectrum, flux_g, flux_r, flux_i, flux_z
), num_encoder_tokens=900)

# Train simple regressor for stellar mass, redshift, etc.
from sklearn.neighbors import KNeighborsRegressor
regressor = KNeighborsRegressor(n_neighbors=5)
regressor.fit(embeddings.mean(axis=1), target_property)

🌌 Generative Modeling

Predict missing astronomical properties:

# Predict redshift from photometry + morphology
predictions = model(
    codec_manager.encode(image, flux_g, flux_r, flux_i, flux_z),
    target_mask={'tok_z': torch.zeros(batch_size, 1)},
    num_encoder_tokens=600
)
redshift_probs = torch.softmax(predictions['tok_z'], dim=-1)

📚 Documentation

🎓 Tutorials

Start with our interactive tutorial:

For detailed guides, see the online documentation.

📦 Advanced Installation

AION offers flexible installation options to suit your environment and requirements.

To install AION with PyTorch included:

pip install aion[torch]

For contributors and developers:

pip install aion[torch,dev]

This includes testing frameworks, linting tools, and development dependencies.

For specific PyTorch versions (e.g., CUDA support):

# Install PyTorch with CUDA 12.4 support
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# Then install AION
pip install aion

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🌟 Acknowledgments

AION is developed by Polymathic AI, advancing the frontier of AI for scientific applications.

📬 Contact


Built with ❤️ for the astronomical community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polymathic_aion-0.0.2.tar.gz (4.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

polymathic_aion-0.0.2-py3-none-any.whl (89.0 kB view details)

Uploaded Python 3

File details

Details for the file polymathic_aion-0.0.2.tar.gz.

File metadata

  • Download URL: polymathic_aion-0.0.2.tar.gz
  • Upload date:
  • Size: 4.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for polymathic_aion-0.0.2.tar.gz
Algorithm Hash digest
SHA256 3314b5e159c5fc7295a428f186f7a7961158f769fe95196725893d1188451912
MD5 689f660ef4000debf1efe81dc57def2c
BLAKE2b-256 4be9689e30c5e9d5b603e270c3d63f2241975d974de9484ec677d19392dc0c2a

See more details on using hashes here.

Provenance

The following attestation bundles were made for polymathic_aion-0.0.2.tar.gz:

Publisher: publish-pypi.yml on PolymathicAI/AION

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polymathic_aion-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for polymathic_aion-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d6d5dbcf58244516df981cd530a1e07e02bab5c48be30209c4b3272045b58bbd
MD5 f7599ca7047258c505899cc846aacca6
BLAKE2b-256 c2ff8606b7a33e700bc47e27ffdc9cf13b920f469aeeefd9b3a8076eb3c5ad41

See more details on using hashes here.

Provenance

The following attestation bundles were made for polymathic_aion-0.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on PolymathicAI/AION

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page