MeerKhive
A Python client for the SARAO MeerKAT archive. MeerKhive
authenticates via PKCE OAuth2, introspects the live GraphQL schema to build queries
dynamically, and returns results as plain Python dicts. It ships with a CLI that writes
NDJSON to stdout so the output is pipeable to jq, grep, and similar tools.
Requirements
- Python ≥ 3.11
- uv
- A SARAO archive account (required at runtime for authentication)
Installation
From source
git clone https://github.com/JSKenyon/MeerKhive.git
cd MeerKhive
uv sync
The CLI entry point is then available inside the project's virtual environment:
source .venv/bin/activate
meerkhive --help
As a dependency in another project
uv add git+https://github.com/JSKenyon/MeerKhive.git
or, with uv pip in an existing environment:
uv pip install git+https://github.com/JSKenyon/MeerKhive.git
Authentication
MeerKhive uses PKCE OAuth2 against the SARAO Keycloak realm. On first use it opens a
browser window for interactive login and saves the resulting tokens to
~/.local/state/meerkhive/tokens.json. Subsequent invocations silently refresh the
access token from that file.
The token file location respects XDG_STATE_HOME if set:
export XDG_STATE_HOME=/custom/state
# tokens will be saved to /custom/state/meerkhive/tokens.json
CLI usage
Basic query
# Fetch the 10 most recent observations, all fields
meerkhive --limit 10
# Select specific fields only
meerkhive --fields CaptureBlockId,StartTime,Band --limit 10
# Exclude noisy fields from the default full selection
meerkhive --exclude-fields products,FileSize --limit 20
Filtering
Filters use --filter key=value syntax and are repeatable. Several keys have special
handling: Band, QA2, and NumFreqChannels accept comma-separated lists; dateRange
and radec values are parsed as JSON. All other keys are passed through as-is:
# L-band observations in January 2024
meerkhive --filter Band=L \
    --filter 'dateRange=["2024-01-01T00:00:00.000Z","2024-01-31T23:59:59.999Z"]' \
    --limit 50
# Multiple bands at once
meerkhive --filter Band=L,UHF --limit 20
# Free-text search
meerkhive --search "NGC1234" --limit 10
# RA/Dec cone search (JSON value)
meerkhive --filter 'radec={"ra": 83.82, "dec": -5.39}' --limit 10
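Writing the `dateRange` JSON by hand is error-prone, so a small helper can generate it. The sketch below builds a filter string covering one calendar month in the timestamp format used in the examples above. The name `month_date_range_filter` is hypothetical and not part of MeerKhive.

```python
import calendar
import json


def month_date_range_filter(year: int, month: int) -> str:
    """Build a 'dateRange=[...]' filter string covering one calendar month (UTC).

    Hypothetical helper; not part of the MeerKhive API.
    """
    last_day = calendar.monthrange(year, month)[1]
    start = f"{year:04d}-{month:02d}-01T00:00:00.000Z"
    end = f"{year:04d}-{month:02d}-{last_day:02d}T23:59:59.999Z"
    # Compact separators keep the JSON array free of spaces.
    return "dateRange=" + json.dumps([start, end], separators=(",", ":"))


print(month_date_range_filter(2024, 1))
# dateRange=["2024-01-01T00:00:00.000Z","2024-01-31T23:59:59.999Z"]
```

The resulting string can be passed to `--filter` on the command line (inside single quotes) or placed in the `filters` list of the Python API.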
Sorting
# Most recent observations first
meerkhive --sort StartTime:desc --limit 10
# Sort by multiple columns
meerkhive --sort StartTime:desc --sort CaptureBlockId:asc --limit 10
Introspecting the schema
--show-fields connects to the archive, introspects the live GraphQL schema, and prints
the full selection block that would be used for --fields '*':
meerkhive --show-fields
Piping to jq
All observation records are written to stdout as NDJSON (one JSON object per line), so
they compose naturally with jq:
# Extract just the CaptureBlockId and StartTime from the first 5 results
meerkhive --fields CaptureBlockId,StartTime --limit 5 | jq '{id: .CaptureBlockId, start: .StartTime}'
# Count by band
meerkhive --fields Band --limit 500 | jq -r '.Band' | sort | uniq -c | sort -rn
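NDJSON is equally easy to consume from Python when jq is not available. The sketch below reproduces the "count by band" pipeline, reading one JSON object per line from any text stream. The function name `count_by_band` is hypothetical, and the sample records are invented purely to make the example runnable offline.

```python
import io
import json
from collections import Counter
from typing import TextIO


def count_by_band(stream: TextIO) -> Counter:
    """Count records per Band value in an NDJSON stream (one object per line)."""
    counts: Counter = Counter()
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get("Band")] += 1
    return counts


# Invented sample data standing in for the CLI output.
sample = io.StringIO('{"Band": "L"}\n{"Band": "UHF"}\n{"Band": "L"}\n')
for band, n in count_by_band(sample).most_common():
    print(n, band)
```

In a real pipeline, pass `sys.stdin` instead of the sample stream and pipe the CLI output into the script.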
Internal URLs (SARAO network)
By default, URL-valued fields (e.g. rdb) are rendered as public internet URLs.
On the SARAO internal network, pass --url-format internal to get intranet URLs instead:
meerkhive --url-format internal --limit 5
SSL (development only)
meerkhive --no-verify-ssl --auth-address https://dev.archive.example.com --limit 3
Python API
Synchronous query
from meerkhive import query_archive
records = query_archive(
    fields="CaptureBlockId,StartTime,Band",
    limit=10,
)
for r in records:
    print(r["CaptureBlockId"], r["StartTime"])
Filtering and sorting
from meerkhive import query_archive
records = query_archive(
    fields="CaptureBlockId,StartTime",
    filters=[
        "Band=L",
        'dateRange=["2024-01-01T00:00:00.000Z","2024-03-31T23:59:59.999Z"]',
    ],
    sort=["StartTime:desc"],
    limit=50,
)
Async query
import asyncio
from meerkhive import query_archive_async
async def main() -> None:
    records = await query_archive_async(
        fields="CaptureBlockId,Band,IntegrationTime",
        filters=["Band=L,UHF"],
        sort=["StartTime:desc"],
        limit=100,
    )
    for r in records:
        print(r)
asyncio.run(main())
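One reason to prefer the async entry point is that several queries can be issued concurrently. The sketch below shows the pattern with `asyncio.gather`. It takes the query coroutine as a parameter and uses a stand-in (`fake_query`) so that it runs offline; both `query_per_band` and `fake_query` are hypothetical names. Whether `query_archive_async` is safe to call concurrently (for example with respect to token refresh) is an assumption that should be verified before relying on it.

```python
import asyncio
from collections.abc import Awaitable, Callable

QueryFn = Callable[..., Awaitable[list[dict]]]


async def query_per_band(
    query_fn: QueryFn, bands: list[str], limit: int = 10
) -> dict[str, list[dict]]:
    """Run one query per band concurrently and return results keyed by band."""
    results = await asyncio.gather(
        *(
            query_fn(fields="CaptureBlockId,Band", filters=[f"Band={band}"], limit=limit)
            for band in bands
        )
    )
    # gather preserves argument order, so results line up with bands.
    return dict(zip(bands, results))


async def fake_query(fields: str, filters: list[str], limit: int) -> list[dict]:
    """Stand-in for query_archive_async so the sketch runs without credentials."""
    await asyncio.sleep(0)
    band = filters[0].split("=", 1)[1]
    return [{"CaptureBlockId": str(i), "Band": band} for i in range(limit)]


by_band = asyncio.run(query_per_band(fake_query, ["L", "UHF"], limit=2))
print({band: len(rows) for band, rows in by_band.items()})
```

To try this against the archive, pass `query_archive_async` in place of `fake_query`.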
parse_filters reference
parse_filters converts a list of "key=value" strings to the GraphQL filter format.
Special-cased keys:
| Key | Behaviour |
|---|---|
| dateRange | Value is parsed as JSON: a two-element ISO 8601 array (use null for an open end), e.g. '["2024-01-01T00:00:00.000Z", null]' |
| radec | Value is parsed as JSON, e.g. '{"ra": 83.82, "dec": -5.39}' |
| Band, QA2, NumFreqChannels | Comma-separated values are split into a list |
| All others | Passed through as-is |
from meerkhive import parse_filters
filters = parse_filters([
    "Band=L,UHF",
    'dateRange=["2024-01-01T00:00:00.000Z","2024-06-30T23:59:59.999Z"]',
])
# [
#     {"field": "Band", "value": ["L", "UHF"]},
#     {"field": "dateRange", "value": ["2024-01-01T00:00:00.000Z", "2024-06-30T23:59:59.999Z"]},
# ]
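To make the rules in the table concrete, here is a minimal re-implementation sketch of the documented behaviour. It is not MeerKhive's actual code: the name `sketch_parse_filters` is hypothetical, and details such as error handling and whether a single value for a list key is still wrapped in a list are assumptions.

```python
import json

# Keys with special handling, as listed in the reference table.
JSON_KEYS = {"dateRange", "radec"}
LIST_KEYS = {"Band", "QA2", "NumFreqChannels"}


def sketch_parse_filters(items: list[str]) -> list[dict]:
    """Convert 'key=value' strings into {"field": ..., "value": ...} dicts."""
    parsed = []
    for item in items:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key in JSON_KEYS:
            value = json.loads(raw)
        elif key in LIST_KEYS:
            value = raw.split(",")  # assumption: a single value still yields a list
        else:
            value = raw  # all other keys pass through unchanged
        parsed.append({"field": key, "value": value})
    return parsed


print(sketch_parse_filters(["Band=L,UHF", 'radec={"ra": 83.82, "dec": -5.39}']))
```

Note that JSON-valued filters need single quotes in the shell so that the embedded double quotes reach the parser intact.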
Advanced: custom transport
For full control over the GraphQL session (e.g. adding custom middleware):
import asyncio
from gql.client import Client
from meerkhive import AuthenticatedTransport, KeycloakAuth, build_ssl_context

async def main() -> None:
    auth = KeycloakAuth.default()
    transport = AuthenticatedTransport(
        url="https://archive.sarao.ac.za/graphql",
        auth=auth,
        ssl_context=build_ssl_context(verify=True),
    )
    # `async with` is only valid inside a coroutine, hence the main() wrapper.
    async with Client(transport=transport, fetch_schema_from_transport=True) as session:
        # Execute arbitrary GraphQL queries against the archive.
        ...

asyncio.run(main())
Developer setup
# Install all dependencies including dev extras
uv sync --all-groups
# Install pre-commit hooks (ruff check + ruff format)
source .venv/bin/activate
pre-commit install
Running tests
# Fast offline unit tests (no credentials needed)
source .venv/bin/activate && python -m pytest tests/ -v
# Live integration test against the production archive (requires valid tokens)
MEERKHIVE_LIVE_TOKENS=~/.local/state/meerkhive/tokens.json \
    python -m pytest tests/test_archive_live.py -m slow -v
Linting and formatting
ruff check .
ruff format .