Python client for the SARAO MeerKAT archive.

MeerKhive

A Python client for the SARAO MeerKAT archive. MeerKhive authenticates via PKCE OAuth2, introspects the live GraphQL schema to build queries dynamically, and returns results as plain Python dicts. It ships with a CLI that writes NDJSON to stdout so the output is pipeable to jq, grep, and similar tools.

Requirements

  • Python ≥ 3.11
  • uv
  • A SARAO archive account (required at runtime for authentication)

Installation

From source

git clone https://github.com/JSKenyon/MeerKhive.git
cd MeerKhive
uv sync

The CLI entry point is then available inside the project's virtual environment:

source .venv/bin/activate
meerkhive --help

As a dependency in another project

uv add git+https://github.com/JSKenyon/MeerKhive.git

or, with uv pip in an existing environment:

uv pip install git+https://github.com/JSKenyon/MeerKhive.git

Authentication

MeerKhive uses PKCE OAuth2 against the SARAO Keycloak realm. On first use it opens a browser window for interactive login and saves the resulting tokens to ~/.local/state/meerkhive/tokens.json. Subsequent invocations silently refresh the access token from that file.

The token file location respects XDG_STATE_HOME if set:

export XDG_STATE_HOME=/custom/state
# tokens will be saved to /custom/state/meerkhive/tokens.json
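
The lookup rule described above can be sketched in a few lines of Python. This is an assumption about how the path might be resolved, not MeerKhive's actual code; token_file_path is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def token_file_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the token file location (hypothetical helper).

    Uses XDG_STATE_HOME when it is set and non-empty, otherwise falls
    back to ~/.local/state, then appends meerkhive/tokens.json.
    """
    env = os.environ if env is None else env
    state_home = env.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(state_home) / "meerkhive" / "tokens.json"


print(token_file_path({"XDG_STATE_HOME": "/custom/state"}))
```

Passing the environment as a parameter keeps the function easy to check without modifying the real process environment.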

CLI usage

Basic query

# Fetch the 10 most recent observations, all fields
meerkhive --limit 10

# Select specific fields only
meerkhive --fields CaptureBlockId,StartTime,Band --limit 10

# Exclude noisy fields from the default full selection
meerkhive --exclude-fields products,FileSize --limit 20

Filtering

Filters use --filter key=value syntax and are repeatable. Several keys have special handling: Band, QA2, and NumFreqChannels accept comma-separated lists; dateRange and radec values are parsed as JSON. All other keys are passed through as-is:

# L-band observations in January 2024
meerkhive --filter Band=L \
  --filter 'dateRange=["2024-01-01T00:00:00.000Z","2024-01-31T23:59:59.999Z"]' \
  --limit 50

# Multiple bands at once
meerkhive --filter Band=L,UHF --limit 20

# Free-text search
meerkhive --search "NGC1234" --limit 10

# RA/Dec cone search (JSON value)
meerkhive --filter 'radec={"ra": 83.82, "dec": -5.39}' --limit 10
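
Writing the dateRange JSON by hand is error-prone, so it can help to build the filter string from datetime objects. The following is a minimal sketch; date_range_filter is a hypothetical helper that is not part of MeerKhive, and it assumes timezone-aware datetimes and the millisecond "Z" format used in the examples above.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def _iso_millis(dt: datetime) -> str:
    """Format an aware datetime as UTC ISO 8601 with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"


def date_range_filter(start: Optional[datetime], end: Optional[datetime]) -> str:
    """Build a 'dateRange=[...]' filter string; None becomes JSON null."""
    bounds = [None if b is None else _iso_millis(b) for b in (start, end)]
    return "dateRange=" + json.dumps(bounds, separators=(",", ":"))


january = date_range_filter(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 31, 23, 59, 59, 999000, tzinfo=timezone.utc),
)
print(january)
```

The resulting string can be passed as a --filter value on the command line or as an entry in the filters list of the Python API.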

Sorting

# Most recent observations first
meerkhive --sort StartTime:desc --limit 10

# Sort by multiple columns
meerkhive --sort StartTime:desc --sort CaptureBlockId:asc --limit 10
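
To illustrate what a multi-column sort specification means, the sketch below applies the same field:direction specs to a list of dicts on the client side. It is not how MeerKhive or the archive performs sorting; parse_sort_spec and sort_records are hypothetical names, and defaulting to ascending when no direction is given is an assumption.

```python
from typing import Any, Dict, List, Sequence, Tuple


def parse_sort_spec(spec: str) -> Tuple[str, bool]:
    """Split 'Field:desc' into (field, descending); assumes asc by default."""
    field, _, direction = spec.partition(":")
    direction = direction.lower() or "asc"
    if direction not in {"asc", "desc"}:
        raise ValueError(f"unknown sort direction in {spec!r}")
    return field, direction == "desc"


def sort_records(records: Sequence[Dict[str, Any]], specs: Sequence[str]) -> List[Dict[str, Any]]:
    """Sort records by several keys, the first spec being most significant."""
    ordered = list(records)
    # list.sort is stable, so apply the least significant key first.
    for spec in reversed(specs):
        field, descending = parse_sort_spec(spec)
        ordered.sort(key=lambda r: r[field], reverse=descending)
    return ordered


# Made-up records, for illustration only.
records = [
    {"StartTime": "2024-01-02", "CaptureBlockId": 2},
    {"StartTime": "2024-01-02", "CaptureBlockId": 1},
    {"StartTime": "2024-01-03", "CaptureBlockId": 3},
]
result = sort_records(records, ["StartTime:desc", "CaptureBlockId:asc"])
```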

Introspecting the schema

--show-fields connects to the archive, introspects the live GraphQL schema, and prints the full selection block that would be used for --fields '*':

meerkhive --show-fields

Piping to jq

All observation records are written to stdout as NDJSON (one JSON object per line), so they compose naturally with jq:

# Extract just the CaptureBlockId and StartTime from the first 5 results
meerkhive --fields CaptureBlockId,StartTime --limit 5 | jq '{id: .CaptureBlockId, start: .StartTime}'

# Count by band
meerkhive --fields Band --limit 500 | jq -r '.Band' | sort | uniq -c | sort -rn
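
The same NDJSON stream is just as easy to consume from Python. The sketch below parses one JSON object per line and counts records by a key, mirroring the "count by band" pipeline above. The sample input is made up for illustration, and iter_ndjson and count_by are hypothetical helper names.

```python
import io
import json
from collections import Counter
from typing import Any, Dict, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of NDJSON input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by(records: Iterable[Dict[str, Any]], key: str) -> Counter:
    """Count records by the value stored under `key`."""
    return Counter(r.get(key) for r in records)


# Stand-in for the CLI's stdout; real use would read sys.stdin instead.
sample = io.StringIO('{"Band": "L"}\n{"Band": "UHF"}\n\n{"Band": "L"}\n')
band_counts = count_by(iter_ndjson(sample), "Band")
print(band_counts.most_common())  # [('L', 2), ('UHF', 1)]
```

Replacing the StringIO object with sys.stdin lets a script like this sit at the end of a meerkhive pipeline in place of jq.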

Internal URLs (SARAO network)

By default, URL-valued fields (e.g. rdb) are rendered as public internet URLs. On the SARAO internal network, pass --url-format internal to get intranet URLs instead:

meerkhive --url-format internal --limit 5

SSL (development only)

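# Disable SSL certificate verification when pointing at a development server; do not use this against production
meerkhive --no-verify-ssl --auth-address https://dev.archive.example.com --limit 3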

Python API

Synchronous query

from meerkhive import query_archive

records = query_archive(
    fields="CaptureBlockId,StartTime,Band",
    limit=10,
)
for r in records:
    print(r["CaptureBlockId"], r["StartTime"])

Filtering and sorting

from meerkhive import query_archive

records = query_archive(
    fields="CaptureBlockId,StartTime",
    filters=[
        "Band=L",
        'dateRange=["2024-01-01T00:00:00.000Z","2024-03-31T23:59:59.999Z"]',
    ],
    sort=["StartTime:desc"],
    limit=50,
)

Async query

import asyncio
from meerkhive import query_archive_async

async def main() -> None:
    records = await query_archive_async(
        fields="CaptureBlockId,Band,IntegrationTime",
        filters=["Band=L,UHF"],
        sort=["StartTime:desc"],
        limit=100,
    )
    for r in records:
        print(r)

asyncio.run(main())

parse_filters reference

parse_filters converts a list of "key=value" strings to the GraphQL filter format. Special-cased keys:

Key                          Behaviour
dateRange                    Value is parsed as JSON: a two-element ISO 8601 array (use null for an open end), e.g. '["2024-01-01T00:00:00.000Z", null]'
radec                        Value is parsed as JSON: '{"ra": 83.82, "dec": -5.39}'
Band, QA2, NumFreqChannels   Comma-separated values are split into a list
All others                   Passed through as-is

from meerkhive import parse_filters

filters = parse_filters([
    "Band=L,UHF",
    'dateRange=["2024-01-01T00:00:00.000Z","2024-06-30T23:59:59.999Z"]',
])
# [
#   {"field": "Band", "value": ["L", "UHF"]},
#   {"field": "dateRange", "value": ["2024-01-01T00:00:00.000Z", "2024-06-30T23:59:59.999Z"]},
# ]
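
The rules in the table can be approximated in a few lines. The sketch below is an independent re-implementation for illustration, not MeerKhive's parse_filters; the name sketch_parse_filters is hypothetical, and details such as error handling or how a single-valued Band is represented are assumptions.

```python
import json
from typing import Any, Dict, List

LIST_KEYS = {"Band", "QA2", "NumFreqChannels"}  # comma-separated lists
JSON_KEYS = {"dateRange", "radec"}              # values parsed as JSON


def sketch_parse_filters(items: List[str]) -> List[Dict[str, Any]]:
    """Convert 'key=value' strings into field/value dicts (illustrative only)."""
    parsed = []
    for item in items:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key in JSON_KEYS:
            value: Any = json.loads(raw)
        elif key in LIST_KEYS:
            value = raw.split(",")
        else:
            value = raw  # passed through as-is
        parsed.append({"field": key, "value": value})
    return parsed


example = sketch_parse_filters([
    "Band=L,UHF",
    'dateRange=["2024-01-01T00:00:00.000Z", null]',
])
```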

Advanced: custom transport

For full control over the GraphQL session (e.g. adding custom middleware):

import asyncio

from gql.client import Client
from meerkhive import AuthenticatedTransport, KeycloakAuth, build_ssl_context


async def main() -> None:
    auth = KeycloakAuth.default()
    transport = AuthenticatedTransport(
        url="https://archive.sarao.ac.za/graphql",
        auth=auth,
        ssl_context=build_ssl_context(verify=True),
    )

    # `async with` is only valid inside a coroutine, so the session lives in main().
    async with Client(transport=transport, fetch_schema_from_transport=True) as session:
        # Execute arbitrary GraphQL queries against the archive.
        ...


asyncio.run(main())

Developer setup

# Install all dependencies including dev extras
uv sync --all-groups

# Install pre-commit hooks (ruff check + ruff format)
source .venv/bin/activate
pre-commit install

Running tests

# Fast offline unit tests (no credentials needed)
source .venv/bin/activate && python -m pytest tests/ -v

# Live integration test against the production archive (requires valid tokens)
MEERKHIVE_LIVE_TOKENS=~/.local/state/meerkhive/tokens.json \
  python -m pytest tests/test_archive_live.py -m slow -v

Linting and formatting

ruff check .
ruff format .
