Machine-readable URL catalog for AEMO NEMWEB (JSON + JSON Schema + Python SDK)

nem-catalog — Machine-readable URL catalog for AEMO NEMWEB

A versioned JSON catalog + JSON Schema that maps (NEMWEB dataset key, time range) → candidate URLs, covering all four NEMWEB repositories (Reports, MMSDM, NEMDE, FCAS_Causer_Pays). Released under MIT (code) and CC0 (catalog data).

Quick start — no install required

curl -s https://zhipenghe.me/nem-catalog/catalog.json \
  | jq '.datasets["Reports:DispatchIS_Reports"].tiers.ARCHIVE'

Output:

{
  "path_template": "/Reports/ARCHIVE/DispatchIS_Reports/",
  "filename_template": "PUBLIC_DISPATCHIS_{date}.zip",
  "filename_regex": "^PUBLIC_DISPATCHIS_\\d{8}\\.zip$",
  "example": "PUBLIC_DISPATCHIS_20250407.zip",
  "cadence": "daily_rollup"
}

Build the full URL: https://nemweb.com.au + path_template + filename_template (with {date} substituted as yyyymmdd). Placeholder vocabulary is in the catalog's top-level placeholders field.
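As a minimal sketch of that concatenation in Python, assuming only the three fields shown in the output above: the helper name `build_url` is illustrative and is not part of the nem-catalog SDK. It also checks the result against `filename_regex` before returning, which is a design choice of this sketch rather than documented catalog behaviour.

```python
import datetime
import re

BASE_URL = "https://nemweb.com.au"

# ARCHIVE tier entry, copied from the jq output above.
tier = {
    "path_template": "/Reports/ARCHIVE/DispatchIS_Reports/",
    "filename_template": "PUBLIC_DISPATCHIS_{date}.zip",
    "filename_regex": r"^PUBLIC_DISPATCHIS_\d{8}\.zip$",
}


def build_url(tier: dict, day: datetime.date) -> str:
    """Illustrative helper (not an SDK function): expand {date} as yyyymmdd."""
    filename = tier["filename_template"].replace("{date}", day.strftime("%Y%m%d"))
    # Sanity check: the expanded name should match the tier's own regex.
    if not re.match(tier["filename_regex"], filename):
        raise ValueError(f"{filename!r} does not match {tier['filename_regex']!r}")
    return BASE_URL + tier["path_template"] + filename


url = build_url(tier, datetime.date(2025, 4, 7))
print(url)
```

The URL is only a candidate: nothing here confirms that the file exists on NEMWEB.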

Stability

v0.1 is experimental. API may change before v1.0. For reproducible research, pin the catalog version:

catalog = nem_catalog.fetch_latest(catalog_version="2026.04.18")

Python usage

pip install nem-catalog

import nem_catalog

# Primary (library-pure, deterministic):
catalog = nem_catalog.load("catalog.json")

urls = catalog.resolve(
    "Reports:DispatchIS_Reports",
    from_="2025-04-01",
    to_="2025-04-02",
)
# → list of candidate URLs. Caller is responsible for reachability.

# Convenience (live fetch + cache + fallback):
catalog = nem_catalog.fetch_latest()

# Preview cardinality before materializing:
n = catalog.count("Reports:DispatchIS_Reports", from_="2024-01-01", to_="2024-12-31")

Expected UserWarning: Reports:* datasets with both an ARCHIVE and a rolling CURRENT tier emit a one-line warning when you query historical (ARCHIVE-era) dates. The SDK is telling you the live tier has no data that old, so it routed to ARCHIVE. The returned URLs are correct.
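If the warning is noise in a batch job, the standard-library `warnings` module can capture it instead of printing it. The sketch below is self-contained, so it uses a hypothetical stand-in, `resolve_stub`, in place of `catalog.resolve()`; the warning text and the returned URL are made up for illustration and are not the SDK's actual output.

```python
import warnings


def resolve_stub(key, from_, to_):
    """Hypothetical stand-in for catalog.resolve(); not the real SDK."""
    # Made-up message text, used only to demonstrate the capture pattern.
    warnings.warn("historical range routed to ARCHIVE tier", UserWarning)
    return [
        "https://nemweb.com.au/Reports/ARCHIVE/DispatchIS_Reports/"
        "PUBLIC_DISPATCHIS_20250401.zip"
    ]


# Record warnings raised inside the block rather than printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    urls = resolve_stub(
        "Reports:DispatchIS_Reports", from_="2025-04-01", to_="2025-04-02"
    )

# Keep the messages so the routing decision can still be logged.
routing_notes = [
    str(w.message) for w in caught if issubclass(w.category, UserWarning)
]
print(len(urls), "URL(s);", len(routing_notes), "warning(s) captured")
```

Capturing rather than ignoring keeps a record that the query was routed to ARCHIVE, which may matter for reproducibility logs.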

Not every dataset resolves to concrete URLs in v0.1

Coverage in v0.1: roughly 1 in 6 of the 362 dataset keys resolve cleanly today (~16%, mostly Reports:* ARCHIVE tiers). The remaining ~84% raise NonResolvableTemplateError — including almost all MMSDM:* tables (file-sequence suffix {d2}/{nn}) and every live CURRENT tier (16-digit publish ID {aemo_id}).

Per repo: Reports 53/96 (55%), MMSDM 4/259 (~2%), NEMDE 2/6, FCAS_Causer_Pays 0/1. v0.2 will add list_urls() for the non-temporal cases by reading NEMWEB directory listings.
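The per-repository counts can be cross-checked against the headline figure with a few lines of arithmetic. The numbers below are copied from the paragraph above; nothing is read from the catalog itself.

```python
# (resolvable keys, total keys) per repository, as quoted above.
coverage = {
    "Reports": (53, 96),
    "MMSDM": (4, 259),
    "NEMDE": (2, 6),
    "FCAS_Causer_Pays": (0, 1),
}

resolvable = sum(r for r, _ in coverage.values())
total = sum(t for _, t in coverage.values())
share = resolvable / total

for repo, (r, t) in coverage.items():
    print(f"{repo}: {r}/{t} = {r / t:.1%}")
print(f"overall: {resolvable}/{total} = {share:.1%}")
```

The totals come to 59 of 362 keys, which matches the "roughly 1 in 6" figure.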

resolve() only returns URLs when the tier's filename template can be built from a date range alone. AEMO filenames in rolling CURRENT tiers often embed a 16-digit publish ID ({aemo_id}), and many MMSDM filenames carry a file-sequence suffix (e.g. {nn}); the SDK cannot compute either without extra input. For those, resolve() raises NonResolvableTemplateError rather than returning a broken URL string.

# Raises NonResolvableTemplateError — CURRENT filename has {aemo_id}
catalog.resolve("Reports:DispatchIS_Reports", from_="2026-04-17", to_="2026-04-18")

# Works — ARCHIVE filename is pure temporal
catalog.resolve("Reports:DispatchIS_Reports", from_="2025-04-01", to_="2025-04-02")

Inspect the raw template for any dataset via catalog.datasets[key]['tiers'] and build the URL yourself, or pin the query to an ARCHIVE-covered date range. A future release will add an enumeration API for these datasets.
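When inspecting templates by hand, it can help to list which placeholders a template contains before trying to fill it. The sketch below uses `string.Formatter().parse` from the standard library. Two things are assumptions: the `TEMPORAL` set (the authoritative vocabulary is the catalog's top-level placeholders field) and the CURRENT-style template string, which is a hypothetical example and not copied from the catalog.

```python
import string

# Assumption: only {date} can be derived from a date range. The real
# vocabulary lives in the catalog's top-level "placeholders" field.
TEMPORAL = {"date"}


def placeholders(template: str) -> list[str]:
    """Return the placeholder names in a template, in order of appearance."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]


def non_temporal(template: str) -> list[str]:
    """Placeholders that a date range alone cannot fill."""
    return [name for name in placeholders(template) if name not in TEMPORAL]


archive_template = "PUBLIC_DISPATCHIS_{date}.zip"  # from the ARCHIVE tier above
current_template = "PUBLIC_DISPATCHIS_{date}_{aemo_id}.zip"  # hypothetical shape

print(non_temporal(archive_template))
print(non_temporal(current_template))
```

An empty list suggests the template is purely temporal; any other result names the extra inputs you would need to supply yourself.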

Not for you if...

  • You want a pandas DataFrame of NEMWEB data → use NEMOSIS. It's the production-grade Python pipeline for researchers.
  • You want forecast data (pre-dispatch, PASA) → use NEMSEER.
  • You want emissions data → use NEMED.

nem-catalog serves the layer below these tools: a shared metadata + canonical JSON shape describing NEMWEB's URL grammar. Non-Python consumers (R, Julia, shell) can use the JSON directly without installing anything.

Shell cookbook (R/Julia/shell users)

See docs/cookbook.md for recipes including URL expansion, date iteration, and parallel download with xargs.

How it's built

See docs/architecture.md. Briefly: extract_patterns.py mirrors NEMWEB directory listings and derives URL patterns, then a hybrid auto+curated merge produces catalog.json. A weekly GitHub Actions workflow runs the whole pipeline and opens a PR when the catalog changes.

Contributing

See CONTRIBUTING.md.

License

  • Code: MIT. See LICENSE.
  • Catalog JSON: CC0 (public domain).
