An ALMA Simulation package for a more civilized era.

ALMASim

ALMASim is a library-first Python environment for simulating ALMA observations, exploring ALMA metadata, downloading science products, and building ML-ready radio/mm-wave datasets.

It provides reusable services in src/almasim that can be driven by CLI scripts, Jupyter notebooks, a FastAPI backend, or direct Python code — all through the same staged API.


Key Capabilities

Simulation

  • Build clean sky cubes from point, Gaussian, extended, molecular-cloud, diffuse, Galaxy Zoo, and Hubble-100 source models
  • Simulate single-pointing ALMA interferometric observations with multi-configuration support (12m, 7m, TP)
  • PWV-aware per-channel noise model
  • Additive astrophysical background sky — faint dusty galaxies, diffuse emission, or combined
  • Optional serendipitous source injection
  • Iterative CLEAN-style deconvolution with resumable state
  • TP+INT feather-style image combination

Data Products

  • Dirty cube, dirty visibilities, beam cube, UV mask cube, U/V coordinate cubes
  • Interferometric, total-power, and combined TP+INT image cubes
  • ML-ready HDF5 shards (clean cube + dirty cube + dirty visibilities + UV mask + metadata)
  • Native MeasurementSet (.ms) export via CASA tools or python-casacore

Metadata and Archive

  • Query ALMA observations via TAP with rich inclusion/exclusion filters
  • Normalise TAP columns into stable application fields
  • Resolve DataLink products and download ALMA data products in parallel
  • Unpack raw ASDMs into MeasurementSets
  • Apply delivered calibration to produce calibrated science MSs

Compute

  • Synchronous, local multiprocess, Dask, Slurm, and Kubernetes backends
  • Backend-agnostic simulation service layer

Architecture

src/almasim/          ← installable library  (pip install almasim)
  services/
    simulation.py     ← staged pipeline entry points
    interferometry/   ← UV sampling, baselines, noise, TP
    imaging/          ← deconvolution, TP+INT combination
    metadata/         ← TAP queries, normalisation
    products/         ← MS export, HDF5 shards, cube export
    compute/          ← backend abstraction
    archive/          ← ASDM unpack, calibration apply
    astro/            ← spectral lines, redshift, parameters
  skymodels/          ← source model implementations

backend/              ← FastAPI service  (Docker: ghcr.io/…/almasim-backend)
frontend/             ← Svelte UI  (requires Docker Compose)
examples/             ← CLI scripts and Jupyter notebooks

The library layer owns all domain logic. The backend is a thin adapter over library services. CLI scripts and notebooks call the same staged services directly.


Installation

Library only (cross-platform)

pip install almasim

With CASA tools (Linux x86-64 only)

casatools and casatasks wheels are Linux-only. Install the optional [casa] extra on a supported Linux system:

pip install "almasim[casa]"

The [casa] extra enables:

  • Native MeasurementSet export via casatools
  • ASDM-to-MS conversion via casatasks.importasdm
  • Calibration application via casatasks.applycal

Without [casa], all simulation, imaging, metadata, and download features still work. The MS export path falls back to python-casacore if available:

pip install "almasim[ms-casacore]"

From source (development)

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev

Backend service (Docker Compose)

The FastAPI backend and Svelte frontend require Docker Compose:

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
docker compose up

The backend image is available pre-built from GHCR:

docker pull ghcr.io/micheledelliveneri/almasim-backend:latest

Quick Start

Query ALMA metadata

from almasim.services.metadata.tap.service import query_by_science_type, InclusionFilters

df = query_by_science_type(
    include=InclusionFilters(science_keyword=["Galaxies"], band=[6])
)
print(df[["ALMA_source_name", "Band", "spatial_resolution"]].head())

Run a simulation from a metadata row

from pathlib import Path

from almasim import SimulationParams, run_simulation

row = df.iloc[0]                  # first row of the metadata query above

params = SimulationParams.from_metadata_row(
    row,                          # pandas Series from a metadata query
    idx=0,
    main_dir=Path("src/almasim"),
    output_dir=Path("output"),
    project_name="my_project",
)

result = run_simulation(params)

Use the staged API

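from pathlib import Path

from almasim import (
    SimulationParams,
    generate_clean_cube,
    simulate_observation,
    image_products,
    export_results,
)

# Same arguments as in the previous example
params = SimulationParams.from_metadata_row(
    row,
    idx=0,
    main_dir=Path("src/almasim"),
    output_dir=Path("output"),
    project_name="my_project",
)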

cube_result  = generate_clean_cube(params)
obs_result   = simulate_observation(params, cube_result)
img_result   = image_products(params, obs_result)
export_results(params, cube_result, obs_result, img_result)

Staged Simulation API

The pipeline is split into four composable stages:

Stage  Function                 What it does
1      generate_clean_cube()    Build sky cube from skymodel, apply background
2      simulate_observation()   Run interferometric + TP simulation, return dirty products
3      image_products()         Deconvolve, combine INT+TP, build image cubes
4      export_results()         Write cubes, ML shards, parameter summaries to disk

run_simulation() orchestrates all four in sequence.

write_ml_dataset_shard() exports an HDF5 shard (clean cube + dirty cube + dirty visibilities + UV mask + metadata) independently of the main export path.

estimate_simulation_footprint() returns resolved pixel count, channel count, cell size, beam size, and raw output size in GiB — useful for pre-run capacity checks.
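
The raw-size figure can be sanity-checked by hand. The sketch below is not ALMASim's implementation: `CubeFootprint` is a hypothetical container, and it assumes a square image stored as float32. It only shows the arithmetic behind a size expressed in GiB.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per gibibyte


@dataclass(frozen=True)
class CubeFootprint:
    """Hypothetical container; not the type returned by ALMASim."""

    n_pix: int                 # pixels per side of a square image
    n_channels: int            # spectral channels
    bytes_per_value: int = 4   # float32 storage is an assumption

    @property
    def raw_bytes(self) -> int:
        # One value per (x, y, channel) voxel
        return self.n_pix * self.n_pix * self.n_channels * self.bytes_per_value

    @property
    def raw_gib(self) -> float:
        return self.raw_bytes / GIB


footprint = CubeFootprint(n_pix=1024, n_channels=256)
print(f"{footprint.raw_gib:.2f} GiB per cube")  # prints "1.00 GiB per cube"
```

A run writes several cubes (clean, dirty, beam, UV mask, and so on), so the total on disk is a multiple of this per-cube figure.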

Full reference: Simulation docs


Skymodels

Source type   Description
point         Point source (PSF and CLEAN validation)
gaussian      2-D Gaussian (compact extended source)
extended      TNG-backed realistic extended emission
galaxy-zoo    Galaxy Zoo image morphology prior
hubble-100    Hubble Top-100 image morphology prior
molecular     Molecular cloud structured emission
diffuse       Correlated diffuse emission field

All skymodels accept explicit source_offset_x_arcsec / source_offset_y_arcsec to shift the science target from phase center.
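
To get a feel for what an offset in arcseconds means on the image grid, it can be converted to pixels with the cell size. This is an illustrative sketch only: `offset_to_pixels` is a hypothetical helper, and the sign convention (positive offsets move towards larger pixel indices) is an assumption that may differ from the one ALMASim uses.

```python
def offset_to_pixels(offset_arcsec: float, cell_size_arcsec: float) -> int:
    """Convert an angular offset to the nearest whole-pixel shift."""
    if cell_size_arcsec <= 0:
        raise ValueError("cell_size_arcsec must be positive")
    return round(offset_arcsec / cell_size_arcsec)


n_pix = 256          # image size in pixels (illustrative value)
cell = 0.05          # cell size in arcsec per pixel (illustrative value)
center = n_pix // 2  # pixel index of the phase center

dx = offset_to_pixels(1.0, cell)    # +1.0 arcsec in x
dy = offset_to_pixels(-0.5, cell)   # -0.5 arcsec in y
source_pixel = (center + dx, center + dy)
print(source_pixel)  # (148, 118)
```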

Additive background sky (independent of the main source):

Mode               Effect
blank_field_dsfg   Faint dusty star-forming galaxies
dusty_diffuse      Correlated low-spatial-frequency dusty background
combined           Both of the above

Full reference: Skymodels docs


Compute Backends

Select via SimulationParams.compute_backend:

Backend      Use case
sync         Notebooks, examples, debugging
local        Local CPU parallelism
dask         Distributed execution, cluster scheduling
slurm        HPC job submission
kubernetes   Cluster-native environments
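
The idea of a backend-agnostic service layer can be sketched with the standard library alone. Everything below is hypothetical (`ComputeBackend`, `SyncBackend`, `LocalBackend`, `get_backend`, `simulate_channel`) and is not ALMASim's actual interface. A thread pool stands in for local multiprocessing to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Protocol


class ComputeBackend(Protocol):
    """Hypothetical interface: run fn over items and return ordered results."""

    def map(self, fn: Callable[[int], int], items: Iterable[int]) -> list[int]: ...


class SyncBackend:
    """Runs every task in the calling thread, which is easy to debug."""

    def map(self, fn, items):
        return [fn(item) for item in items]


class LocalBackend:
    """Runs tasks on a local worker pool (threads here, as a stand-in)."""

    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    def map(self, fn, items):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(fn, items))  # pool.map preserves input order


BACKENDS = {"sync": SyncBackend, "local": LocalBackend}


def get_backend(name: str) -> ComputeBackend:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown compute backend: {name!r}") from None


def simulate_channel(channel: int) -> int:
    return channel * channel  # placeholder for per-channel work


for name in ("sync", "local"):
    print(name, get_backend(name).map(simulate_channel, range(4)))
```

Because the calling code only sees `map`, switching backends changes where the work runs but not the results.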

Full reference: Compute docs


Metadata and Downloads

Query metadata via TAP

from almasim.services.metadata.tap.service import (
    query_by_science_type,
    InclusionFilters,
    ExclusionFilters,
)

df = query_by_science_type(
    include=InclusionFilters(
        science_keyword=["Galaxies"],
        band=[6, 7],
        public_only=True,
        science_only=True,
    ),
    exclude=ExclusionFilters(solar=True),
)

Download products

from almasim.services.download import resolve_products, run_download_job

products = resolve_products(df["member_ous_uid"].tolist())
run_download_job(products, destination=Path("downloads"), extract_tar=True)

Full reference: Metadata docs · Downloads docs


Backend Service

The FastAPI backend exposes library services over HTTP and drives the Svelte frontend.

Endpoint group       Purpose
/api/v1/metadata     TAP queries and metadata management
/api/v1/simulation   Simulation job submission and status
/api/v1/download     Product resolution and download jobs
/api/v1/imaging      Deconvolution and combination products
/api/v1/visualizer   Output browsing and product inspection
/health              Health check
/docs                Interactive OpenAPI docs (Swagger UI)

Start locally for development:

cd backend
uv run uvicorn app.main:app --reload --port 8000
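
Once the server is up, the health endpoint can be probed from Python. This sketch uses only the standard library and assumes the backend listens on `http://localhost:8000` (the port from the command above) and answers `/health` with HTTP 200. The response body is not inspected because its schema is not documented here. `endpoint` and `check_health` are hypothetical helpers.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local development address


def endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urlopen(endpoint(base_url, "/health"), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx status
        return False


print(endpoint(BASE_URL, "/health"))  # http://localhost:8000/health
```

Calling `check_health()` while the development server is running should return `True`.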

Full reference: Frontend docs


Examples

All examples use the sync compute backend and require no running scheduler.

Script                              Description
examples/query_metadata_cli.py      Query TAP, export metadata and product CSVs
examples/download_products_cli.py   Resolve and download ALMA products
examples/archive_ms_cli.py          Unpack ASDMs and apply calibration
examples/staged_pipeline_cli.py     Full pipeline: query → simulate → ML shard
examples/imaging_cli.py             Synthetic imaging + iterative deconvolution

# Query metadata for Band 6 galaxy observations
python examples/query_metadata_cli.py \
  --science-keyword Galaxies --band 6 \
  --save-csv examples/output/metadata.csv

# Run a staged simulation from the first metadata row
python examples/staged_pipeline_cli.py \
  --metadata-csv examples/output/metadata.csv \
  --row-idx 0 --project-name demo \
  --ml-shard-path examples/output/demo.h5

# Iterative deconvolution demo
python examples/imaging_cli.py \
  --output-dir examples/output/imaging --cycles 180 --gain 0.12

Notebook equivalents: staged_pipeline_notebook.ipynb · query_metadata_notebook.ipynb · download_products_notebook.ipynb
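
The `--cycles` and `--gain` options of the imaging example read like the iteration count and loop gain of a CLEAN-style loop. Assuming that reading, the sketch below shows a textbook 1-D Högbom CLEAN in pure Python. It is not ALMASim's deconvolver, and `hogbom_clean_1d` is a hypothetical name.

```python
import math


def hogbom_clean_1d(dirty, psf, gain=0.12, cycles=180, threshold=0.0):
    """Minimal 1-D Högbom CLEAN.

    psf must have length 2 * len(dirty) - 1 with its unit peak at the center.
    Returns (components, residual).
    """
    n = len(dirty)
    if len(psf) != 2 * n - 1:
        raise ValueError("psf must have length 2 * len(dirty) - 1")
    center = n - 1
    residual = list(dirty)
    components = [0.0] * n
    for _ in range(cycles):
        # Locate the strongest remaining residual
        peak = max(range(n), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) <= threshold:
            break
        # Move a fraction (the loop gain) of the peak into the model
        flux = gain * residual[peak]
        components[peak] += flux
        # Subtract the PSF, shifted to the peak and scaled by that flux
        for i in range(n):
            residual[i] -= flux * psf[center + i - peak]
    return components, residual


# Toy data: a sinc-like PSF and a single point source of flux 2.0 at pixel 5
n = 16
center = n - 1
psf = [1.0 if k == 0 else math.sin(k) / k for k in range(-center, center + 1)]
true_pos, true_flux = 5, 2.0
dirty = [true_flux * psf[center + i - true_pos] for i in range(n)]

components, residual = hogbom_clean_1d(dirty, psf)
print(round(components[true_pos], 6))
```

With a gain of 0.12, each cycle removes 12% of the current peak, so after 180 cycles the recovered flux at pixel 5 is very close to the injected value and the residual is negligible.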

End-to-end archive pipeline (Marimo)

examples/e2e_archive_pipeline.py is a reactive Marimo notebook that covers the full archive workflow interactively: query ALMA metadata → resolve DataLink products → download → unpack ASDMs → apply calibration.

# Install dev dependencies (includes marimo)
uv sync --group dev

# Interactive editing mode — cells re-run automatically as you edit
marimo edit examples/e2e_archive_pipeline.py

# Read-only app mode — run the pipeline step-by-step via the UI
marimo run examples/e2e_archive_pipeline.py

Steps 4 (unpack) and 5 (calibrate) require CASA tools (Linux x86-64 only):

pip install "almasim[casa]"

The notebook saves query filter presets as .query.json files so they can be reloaded across sessions.
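
A preset round-trip of this kind can be sketched with the standard library. The preset schema below is hypothetical: the keys simply mirror the `InclusionFilters` arguments shown earlier, and the notebook's real `.query.json` layout may differ. `save_preset` and `load_preset` are illustrative helpers.

```python
import json
import tempfile
from pathlib import Path


def save_preset(path: Path, filters: dict) -> None:
    """Write a filter preset as human-readable JSON."""
    path.write_text(json.dumps(filters, indent=2, sort_keys=True), encoding="utf-8")


def load_preset(path: Path) -> dict:
    """Read a filter preset back into a plain dict."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical preset contents
preset = {"science_keyword": ["Galaxies"], "band": [6, 7], "public_only": True}

with tempfile.TemporaryDirectory() as tmp:
    preset_path = Path(tmp) / "band6_galaxies.query.json"
    save_preset(preset_path, preset)
    restored = load_preset(preset_path)

print(restored == preset)  # True
```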


Documentation

Full documentation: micheledelliveneri.github.io/ALMASim

Section            Topics
Quick Start        Installation, first simulation
Simulation         Staged API, SimulationParams, outputs
Interferometry     UV sampling, baselines, multi-config
Noise              PWV-aware noise model
Background Sky     Additive astrophysical background
Skymodels          Source models reference
Imaging            Deconvolution, TP+INT combination
Metadata           TAP queries, filters
Downloads          Product download workflow
Compute Backends   Sync, Dask, Slurm, Kubernetes
Frontend           Svelte UI workflows

Build docs locally:

uv sync --group dev
uv run sphinx-build -b html docs/source docs/build/html

Contributing

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
uv sync --group dev
uv run pytest --ignore=illustris_python
uv run ruff check .
uv run ruff format .

A release is published automatically when a version tag is pushed:

# 1. Bump version in pyproject.toml and src/almasim/__version__.py
# 2. Commit and tag
git tag v2.1.11
git push origin v2.1.11

The release pipeline then:

  1. Validates that the tag matches pyproject.toml
  2. Runs the full lint + test suite
  3. Publishes wheel and sdist to PyPI via OIDC trusted publisher
  4. Creates a GitHub Release with auto-generated changelog and attached artifacts
  5. Builds and pushes the backend Docker image to GHCR

One-time PyPI setup: register a trusted publisher on PyPI with owner MicheleDelliVeneri, repo ALMASim, workflow release.yml, environment pypi.


License

ALMASim is released under the GNU General Public License v3.
