Machine-readable URL catalog for AEMO NEMWEB (JSON + JSON Schema + Python SDK)

Maintainers

ZhipengHe

nem-catalog — Machine-readable URL catalog for AEMO NEMWEB

A versioned JSON catalog + JSON Schema that maps (NEMWEB dataset key, time range) → candidate URLs, covering all four NEMWEB repositories (Reports, MMSDM, NEMDE, FCAS_Causer_Pays). Released under MIT (code) and CC0 (catalog data).

Quick start — no install required

curl -s https://zhipenghe.me/nem-catalog/catalog.json \
  | jq '.datasets["Reports:DispatchIS_Reports"].tiers.ARCHIVE'

Output:

{
  "path_template": "/Reports/ARCHIVE/DispatchIS_Reports/",
  "filename_template": "PUBLIC_DISPATCHIS_{date}.zip",
  "filename_regex": "^PUBLIC_DISPATCHIS_\\d{8}\\.zip$",
  "example": "PUBLIC_DISPATCHIS_20250407.zip",
  "cadence": "daily_rollup"
}

Build the full URL: https://nemweb.com.au + path_template + filename_template (with {date} substituted as yyyymmdd). Placeholder vocabulary is in the catalog's top-level placeholders field.
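
The concatenation rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the SDK: `build_url` and `NEMWEB_BASE` are hypothetical names, and the tier dictionary is copied from the jq output shown earlier. It assumes `{date}` is the only placeholder in the template.

```python
import re
from datetime import date

NEMWEB_BASE = "https://nemweb.com.au"

# Tier entry copied from the jq output in the quick start.
ARCHIVE_TIER = {
    "path_template": "/Reports/ARCHIVE/DispatchIS_Reports/",
    "filename_template": "PUBLIC_DISPATCHIS_{date}.zip",
    "filename_regex": r"^PUBLIC_DISPATCHIS_\d{8}\.zip$",
    "example": "PUBLIC_DISPATCHIS_20250407.zip",
    "cadence": "daily_rollup",
}


def build_url(tier: dict, day: date) -> str:
    """Hypothetical helper: expand {date} as yyyymmdd and join the parts."""
    filename = tier["filename_template"].replace("{date}", day.strftime("%Y%m%d"))
    # Sanity-check the expanded name against the catalog's own regex.
    if not re.match(tier["filename_regex"], filename):
        raise ValueError(f"{filename!r} does not match {tier['filename_regex']!r}")
    return NEMWEB_BASE + tier["path_template"] + filename


url = build_url(ARCHIVE_TIER, date(2025, 4, 7))
print(url)
# → https://nemweb.com.au/Reports/ARCHIVE/DispatchIS_Reports/PUBLIC_DISPATCHIS_20250407.zip
```

Checking the result against `filename_regex` is optional, but it catches a placeholder that was left unexpanded before any request is made.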

Stability

v0.1 is experimental, and the API may change before v1.0. For reproducible research, pin the catalog version:

catalog = nem_catalog.fetch_latest(catalog_version="2026.04.18")

Python usage

pip install nem-catalog

import nem_catalog

# Primary (library-pure, deterministic):
catalog = nem_catalog.load("catalog.json")

urls = catalog.resolve(
    "Reports:DispatchIS_Reports",
    from_="2025-04-01",
    to_="2025-04-02",
)
# → list of candidate URLs. Caller is responsible for reachability.

# Convenience (live fetch + cache + fallback):
catalog = nem_catalog.fetch_latest()

# Preview cardinality before materializing:
n = catalog.count("Reports:DispatchIS_Reports", from_="2024-01-01", to_="2024-12-31")

Expected UserWarning: Reports:* datasets with both an ARCHIVE and a rolling CURRENT tier emit a one-line warning when you query historical (ARCHIVE-era) dates. The SDK is telling you the live tier has no data that old, so it routed to ARCHIVE. The returned URLs are correct.
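
If the warning is noise in a batch job, the standard-library warnings module can record or silence it. The sketch below is self-contained, so it uses `resolve_stub`, a hypothetical stand-in for catalog.resolve(); the warning text it emits is invented and will not match the SDK's actual message.

```python
import warnings

EXAMPLE_URL = (
    "https://nemweb.com.au/Reports/ARCHIVE/DispatchIS_Reports/"
    "PUBLIC_DISPATCHIS_20250401.zip"
)


def resolve_stub(key: str, from_: str, to_: str) -> list[str]:
    """Hypothetical stand-in for catalog.resolve(); emits a UserWarning."""
    warnings.warn(f"{key}: routed to ARCHIVE tier", UserWarning, stacklevel=2)
    return [EXAMPLE_URL]


# Option 1: record the warning so it can be logged or asserted on.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    urls = resolve_stub("Reports:DispatchIS_Reports", from_="2025-04-01", to_="2025-04-02")
messages = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]

# Option 2: silence UserWarning for this call only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    urls_quiet = resolve_stub("Reports:DispatchIS_Reports", from_="2025-04-01", to_="2025-04-02")
```

Both context managers restore the previous warning filters on exit, so the rest of the program is unaffected.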

Not every dataset resolves to concrete URLs in v0.1

Coverage in v0.1: roughly 1 in 6 of the 362 dataset keys resolve cleanly today (~16%, mostly Reports:* ARCHIVE tiers). The remaining ~84% raise NonResolvableTemplateError — including almost all MMSDM:* tables (file-sequence suffix {d2}/{nn}) and every live CURRENT tier (16-digit publish ID {aemo_id}).

Per repo: Reports 53/96 (55%), MMSDM 4/259 (~2%), NEMDE 2/6, FCAS_Causer_Pays 0/1. v0.2 will add list_urls() for the non-temporal cases by reading NEMWEB directory listings.
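
The per-repo counts add up to the headline figure. A short check, using only the numbers quoted above:

```python
# (resolvable keys, total keys) per repository, as quoted in the text.
coverage = {
    "Reports": (53, 96),
    "MMSDM": (4, 259),
    "NEMDE": (2, 6),
    "FCAS_Causer_Pays": (0, 1),
}

resolvable = sum(ok for ok, _ in coverage.values())
total = sum(n for _, n in coverage.values())
share = resolvable / total

for repo, (ok, n) in coverage.items():
    print(f"{repo}: {ok}/{n} = {ok / n:.1%}")
print(f"all: {resolvable}/{total} = {share:.1%}")
# last line printed: all: 59/362 = 16.3%
```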

resolve() only returns URLs when the tier's filename template can be built from a date range alone. AEMO filenames in rolling CURRENT tiers often embed a 16-digit publish ID ({aemo_id}) or a file-sequence suffix (e.g. {nn}) that the SDK cannot compute without extra input. For those, resolve() raises NonResolvableTemplateError rather than returning a broken URL string.

# Raises NonResolvableTemplateError — CURRENT filename has {aemo_id}
catalog.resolve("Reports:DispatchIS_Reports", from_="2026-04-17", to_="2026-04-18")

# Works — ARCHIVE filename is pure temporal
catalog.resolve("Reports:DispatchIS_Reports", from_="2025-04-01", to_="2025-04-02")

Inspect the raw template for any dataset via catalog.datasets[key]['tiers'] and build the URL yourself, or pin the query to an ARCHIVE-covered date range. A future release will add an enumeration API for these datasets.
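
When inspecting templates by hand, it helps to list which placeholders block date-only resolution. The sketch below works on plain dictionaries shaped like the catalog's tier entries. `blocking_placeholders` is a hypothetical helper, the CURRENT template is invented for illustration, and treating `{date}` as the only date-derivable placeholder is an assumption; the authoritative vocabulary is the catalog's top-level placeholders field.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

# Assumption: {date} is the only placeholder computable from a date range.
DATE_ONLY = {"date"}


def blocking_placeholders(tier: dict) -> set[str]:
    """Return placeholders in the filename template that a date cannot fill."""
    return set(PLACEHOLDER.findall(tier["filename_template"])) - DATE_ONLY


tiers = {
    "ARCHIVE": {"filename_template": "PUBLIC_DISPATCHIS_{date}.zip"},
    # Hypothetical CURRENT template, for illustration only.
    "CURRENT": {"filename_template": "PUBLIC_DISPATCHIS_{timestamp}_{aemo_id}.zip"},
}

for name, tier in tiers.items():
    blocked = blocking_placeholders(tier)
    print(name, "resolvable" if not blocked else f"needs {sorted(blocked)}")
# ARCHIVE resolvable
# CURRENT needs ['aemo_id', 'timestamp']
```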

Not for you if...

You want a pandas DataFrame of NEMWEB data → use NEMOSIS. It's the production-grade Python pipeline for researchers.
You want forecast data (pre-dispatch, PASA) → use NEMSEER.
You want emissions data → use NEMED.

nem-catalog serves the layer below these tools: a shared metadata + canonical JSON shape describing NEMWEB's URL grammar. Non-Python consumers (R, Julia, shell) can use the JSON directly without installing anything.

Shell cookbook (R/Julia/shell users)

See docs/cookbook.md for recipes including URL expansion, date iteration, and parallel download with xargs.

Freshness metadata (v0.1.1+)

The catalog carries optional freshness fields populated by CI at crawl time.

Catalog-level (top-level keys in catalog.json):

Field	Type	Description
`last_crawl_attempted_at`	ISO 8601	When the weekly crawl step started
`last_crawl_completed_at`	ISO 8601	When the crawl step finished (absent means partial crawl — not published)

Per-dataset (inside each datasets[key] entry):

Field	Type	Description
`freshness_class`	`rolling | append_only | static | parent_index | unclassified`	Policy classification for crawl frequency; `unclassified` is used when a dataset's path does not match any policy rule or when no freshness policy was supplied
`last_observed_change_at`	ISO 8601	Last time the mirror index for this dataset changed (from `git log`, not filesystem mtime)

These fields are absent in catalog snapshots built before v0.1.1 and in the static catalog committed to this repo (which is built offline). They are present in every catalog artifact published by the weekly CI workflow.
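
Because the fields are optional, consumers should read them defensively. A minimal sketch, assuming the timestamps carry an explicit UTC offset: the catalog excerpt below is invented, and `MMSDM:EXAMPLE_TABLE` is a hypothetical key standing in for an entry without freshness data.

```python
from datetime import datetime

# Hypothetical excerpt of a CI-published catalog; all values are invented.
catalog = {
    "last_crawl_attempted_at": "2026-04-19T02:00:00+00:00",
    "last_crawl_completed_at": "2026-04-19T02:14:00+00:00",
    "datasets": {
        "Reports:DispatchIS_Reports": {
            "freshness_class": "rolling",
            "last_observed_change_at": "2026-04-18T23:55:00+00:00",
        },
        # Entry without freshness fields, as in pre-v0.1.1 snapshots.
        "MMSDM:EXAMPLE_TABLE": {},
    },
}


def parse_ts(value):
    """Parse an ISO 8601 string, passing through None for absent fields."""
    return datetime.fromisoformat(value) if value else None


completed = parse_ts(catalog.get("last_crawl_completed_at"))
summary = {}
for key, entry in catalog["datasets"].items():
    # .get() returns None instead of raising when a field is missing.
    summary[key] = (
        entry.get("freshness_class"),
        parse_ts(entry.get("last_observed_change_at")),
    )

for key, (cls, changed) in summary.items():
    print(key, cls or "absent", changed.isoformat() if changed else "absent")
```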

How it's built

See docs/architecture.md. Briefly: nemweb_download.py --policy freshness-policy.yaml mirrors NEMWEB directory listings weekly (skipping paths classified as static), extract_patterns.py derives URL patterns, and a hybrid auto+curated merge produces catalog.json. Weekly GitHub Actions runs the whole pipeline and opens a PR on diffs.

Contributing

See CONTRIBUTING.md.

License

Code: MIT. See LICENSE.
Catalog JSON: CC0 (public domain).

Release history

0.1.1 (this version): Apr 19, 2026
0.1.0: Apr 19, 2026

Download files

Source Distribution

nem_catalog-0.1.1.tar.gz (84.1 kB)

Uploaded Apr 19, 2026 Source

Built Distribution

nem_catalog-0.1.1-py3-none-any.whl (13.9 kB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file nem_catalog-0.1.1.tar.gz.

File metadata

Download URL: nem_catalog-0.1.1.tar.gz
Upload date: Apr 19, 2026
Size: 84.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for nem_catalog-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`c6ca1b03d3f76a4965444faab008c9e2c3624ce106ff647ed42b81f9bdfb6904`
MD5	`ae1a3b849b39fe07f8ead90e0020809d`
BLAKE2b-256	`b7890d45d4574483537803473d22f59010b0e8d6b992dbd41b470991106472b3`

Provenance

The following attestation bundles were made for nem_catalog-0.1.1.tar.gz:

Publisher: release.yml on ZhipengHe/nem-catalog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nem_catalog-0.1.1.tar.gz
- Subject digest: c6ca1b03d3f76a4965444faab008c9e2c3624ce106ff647ed42b81f9bdfb6904
- Sigstore transparency entry: 1340681024
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: ZhipengHe/nem-catalog@5745b614f59ea7dd264ff94fc35fe1d6e5c31c7b
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ZhipengHe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5745b614f59ea7dd264ff94fc35fe1d6e5c31c7b
- Trigger Event: push

File details

Details for the file nem_catalog-0.1.1-py3-none-any.whl.

File metadata

Download URL: nem_catalog-0.1.1-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 13.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for nem_catalog-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d3b3bebdec17d61beed8ae3ee80aacdb12f034df11048447a0bfdd0463d36a57`
MD5	`905c9d37e44c02882d1ceb7de565e866`
BLAKE2b-256	`58437aa62a1379a6f294a7a5cbd337d08d21df34baecbad8daa61a8c7823aedd`

Provenance

The following attestation bundles were made for nem_catalog-0.1.1-py3-none-any.whl:

Publisher: release.yml on ZhipengHe/nem-catalog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nem_catalog-0.1.1-py3-none-any.whl
- Subject digest: d3b3bebdec17d61beed8ae3ee80aacdb12f034df11048447a0bfdd0463d36a57
- Sigstore transparency entry: 1340681033
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: ZhipengHe/nem-catalog@5745b614f59ea7dd264ff94fc35fe1d6e5c31c7b
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ZhipengHe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5745b614f59ea7dd264ff94fc35fe1d6e5c31c7b
- Trigger Event: push
