
aemo-mdff-reader


Fast, zero-dependency streaming reader for AEMO NEM12 and NEM13 metering files. Implements AEMO MDFF (Meter Data File Format) v2.6.

  • O(1) memory — iterate through millions of intervals.
  • Pure stdlib core; pandas / PyMySQL are opt-in extras.
  • ~2 M readings/sec on the columnar fast path.
  • Includes an aemo-mdff-reader CLI.

Install

pip install aemo-mdff-reader

# optional extras
pip install aemo-mdff-reader[pandas]   # to_dataframe() / parquet
pip install aemo-mdff-reader[mysql]    # SQL persistence

Use

from aemo_mdff_reader import parse

for r in parse("metering.csv"):
    print(r.nmi, r.interval_start, r.value, r.uom)

Or as a flat CSV / DataFrame:

from aemo_mdff_reader import parse, write_csv, to_dataframe

write_csv(parse("metering.csv"), "out.csv")     # no pandas
df = to_dataframe("metering.csv")                # needs [pandas]

From the command line:

aemo-mdff-reader metering.csv -o out.csv
aemo-mdff-reader metering.csv --validate                       # spec check
aemo-mdff-reader metering.csv --nmi NMI1234567 --start 2024-01-01 --end 2024-01-31
aemo-mdff-reader manual.csv --records accumulations            # NEM13

Working with the data

Each parsed record is a slots-based class with named attributes plus a to_dict() for JSON / dict pipelines:

for r in parse("metering.csv"):
    payload = r.to_dict()              # {"nmi": "...", "value": 0.12, ...}
    print(r.quality_flag, r.method_flag)  # split of the QMM field, e.g. "S", "52"

For aggregation, aemo_mdff_reader.aggregate provides streaming helpers:

from aemo_mdff_reader import parse
from aemo_mdff_reader.aggregate import group_by_nmi, daily_totals

for key, group in group_by_nmi(parse("metering.csv")):
    # key = ChannelKey(nmi, register_id, nmi_suffix)
    intervals = list(group)

for day in daily_totals(parse("metering.csv")):
    # day.total, day.interval_count, day.unique_quality_flags
    print(day.nmi, day.interval_date.date(), day.total, day.uom)

End-to-end recipes — load + inspect, daily roll-up, filter to pandas, spec validation — live in examples/.
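Under the hood, sorted-stream grouping like group_by_nmi is the classic itertools.groupby pattern: NEM12 interval rows arrive contiguously under their 200 record, so only one group is ever in memory. A stdlib-only sketch of the idea (illustrative, not the library's actual implementation, and using a simplified Reading type):

```python
from itertools import groupby
from operator import attrgetter
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    """Simplified stand-in for an interval reading."""
    nmi: str
    value: float


def group_totals(rows: Iterable[Reading]) -> Iterator[tuple[str, float]]:
    """Yield (nmi, total) one group at a time.

    Assumes rows for each NMI are contiguous, as they are in a NEM12
    file; peak memory is one group, not the whole stream.
    """
    for nmi, group in groupby(rows, key=attrgetter("nmi")):
        yield nmi, sum(r.value for r in group)
```

The same contiguity assumption is why the library's streaming helpers can run in O(1) memory.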

API at a glance

You want                              Call
300 interval readings (NEM12)         parse(src)
250 accumulations (NEM13)             parse_accumulations(src)
Both, in file order                   parse_all(src)
400 quality / event flags             parse_events(src)
500 / 550 B2B transactions            parse_b2b(src)
Just the 100 header                   parse_header(src)
Build a pandas DataFrame              to_dataframe(src)
Write a flat CSV (no pandas)          write_csv(rows, out)
Validate against AEMO MDFF v2.6       validate_file(src)
Compute / verify an NMI checksum      nmi_checksum, validate_nmi
Group readings by NMI / channel       aggregate.group_by_nmi(rows)
Roll up to daily totals               aggregate.daily_totals(rows)
Convert any record to a plain dict    r.to_dict()

src can be a path, a file-like object, an iterable of CSV lines, or an iterable of pre-split rows. The v1 NEMReader facade (read_from_file, to_dataframe, to_csv) still works.
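For nmi_checksum / validate_nmi, the underlying algorithm is AEMO's published NMI checksum: a Luhn-style digit sum over the ASCII codes of the 10-character NMI. A stdlib-only sketch, written independently of this library (2001985732 with checksum 8 is, to our knowledge, the worked example in AEMO's NMI procedure):

```python
def nmi_checksum(nmi: str) -> int:
    """Compute the AEMO NMI checksum digit (Luhn variant over ASCII codes).

    Starting from the rightmost character, every alternate ASCII value
    is doubled; the decimal digits of all resulting values are summed;
    the checksum is whatever brings the total up to a multiple of 10.
    """
    total = 0
    for i, ch in enumerate(reversed(nmi)):
        v = ord(ch)
        if i % 2 == 0:  # rightmost character is doubled first
            v *= 2
        total += sum(int(d) for d in str(v))
    return (10 - total % 10) % 10
```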

Each parse(...) yields an IntervalReading with nmi, meter_serial_number, register_id, nmi_suffix, uom, interval_length, interval_date, interval_start, interval_end, interval_index, value, quality_method, reason_code, reason_description, update_datetime, msats_load_datetime. See the type stubs (from aemo_mdff_reader import IntervalReading) for the exact signatures.

Notes

  • Spec: AEMO Meter Data File Format Specification NEM12 & NEM13, v2.6 (effective 29 September 2024). Records 100, 200, 250, 300, 400, 500, 550, 900 are all surfaced; unknown indicators are ignored. Allowed values for quality flags, transaction codes, reason codes, units of measure, and direction indicators are exposed as constants in aemo_mdff_reader.spec for callers that want stricter validation than the parser performs.
  • Tolerant: a UTF-8 BOM is consumed silently, LF and CRLF both work, and empty interval cells are coerced to 0.0 (use quality_method to distinguish missing from zero).
  • Datetimes: fields accept the spec forms (YYYYMMDD, YYYYMMDDhhmmss) and a few common non-spec variants (YYYY-MM-DD, ISO YYYY-MM-DDTHH:MM:SS, with or without a Z / ±HH:MM / ±HHMM timezone suffix); the suffix is stripped and parsed datetimes are returned naive.
  • Lenient fields: direction_indicator on 250 records passes through whatever the file emits; the spec set is spec.DIRECTION_INDICATORS = {"I", "E"}, but B and N appear in the wild. The parser also accepts non-spec interval lengths (1, 60, etc.); strict callers should compare against spec.ALLOWED_INTERVAL_LENGTHS (= {5, 15, 30}).
  • Migration from v1: NEMReader still works. The internal aemo_mdff_reader.nemstructure package is gone — see the API table above. pandas is now opt-in. See CHANGELOG.md for details.
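The tolerant datetime handling described in the notes can be sketched with the stdlib alone (a hypothetical helper; the library's internal parser may differ):

```python
import re
from datetime import datetime

# Spec forms first, then the tolerated ISO variants.
_FORMATS = ("%Y%m%d%H%M%S", "%Y%m%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:?\d{2})$")


def parse_mdff_datetime(raw: str) -> datetime:
    """Parse spec (YYYYMMDD / YYYYMMDDhhmmss) and common ISO variants.

    Any trailing Z / ±HH:MM / ±HHMM suffix is stripped first and the
    result is returned naive, mirroring the behaviour described above.
    """
    raw = _TZ_SUFFIX.sub("", raw.strip())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised MDFF datetime: {raw!r}")
```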

Performance

420,480 readings (4 NMIs × 365 days × 5-min, 2.8 MiB CSV), Python 3.11:

Operation                      Time
for r in parse(path): ...      0.45 s
parse_to_columns(path)         0.21 s
to_dataframe(path) (pandas)    0.76 s

~2.7× faster than v1 end-to-end; reproduce with python benchmarks/bench_parser.py.
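benchmarks/bench_parser.py is the authoritative script; for quick measurements on other inputs, a minimal throughput harness in the same spirit might look like this (generic: it drains any iterator, so the names here are illustrative):

```python
import time
from typing import Iterable


def throughput(rows: Iterable, label: str = "parse") -> tuple[int, float]:
    """Drain an iterator and report rows/sec, the metric quoted above."""
    start = time.perf_counter()
    n = sum(1 for _ in rows)
    elapsed = time.perf_counter() - start
    rate = n / elapsed if elapsed > 0 else float("inf")
    print(f"{label}: {n:,} rows in {elapsed:.3f} s ({rate:,.0f} rows/s)")
    return n, elapsed
```

Used as, e.g., `throughput(parse("metering.csv"))` to time the streaming path end to end.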

Large files

The parser is built to scale to gigabyte-class NEM12 files without loading them into RAM. Measured peak memory delta on a synthetic 10.5 M-reading file (100 NMIs × 365 days × 5-min, 71 MiB CSV), Python 3.12:

API                                        Memory profile      Peak Δ
for r in parse(path): ...                  streaming           1.3 MiB
daily_totals(parse(path))                  streaming           0 MiB
write_csv(parse(path), out)                streaming           0 MiB
iter_dataframes(path, chunk_size=N)        bounded O(N)        ~30 MiB / 100k
iter_columns_chunks(path, chunk_size=N)    bounded O(N)        ~10 MiB / 100k
parse_to_columns(path)                     full materialise    ~600 MiB
list(parse(path)) / to_dataframe(path) /
  NEMReader.read_from_file()               full materialise    ~2.5 GiB

Rule of thumb: stay on the streaming or chunked APIs for any file larger than a few hundred MiB. The chunked variants make pandas-based workflows safe on arbitrarily large inputs:

from aemo_mdff_reader import iter_dataframes

# Process a multi-GiB file 50,000 readings at a time.
for df in iter_dataframes("huge.csv", chunk_size=50_000):
    daily = df.groupby(["NMI", "IntervalDate"])["Value"].sum()
    daily.to_csv("out.csv", mode="a", header=False)

The NEMReader facade and to_dataframe(path) materialise their inputs by design (so len(reader) and df.iloc[...] work). Avoid them for files that won't fit in RAM.
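The bounded-memory behaviour of iter_dataframes / iter_columns_chunks comes down to slicing a stream into fixed-size batches; a generic stdlib sketch of that pattern (not the library's code):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Yield lists of at most chunk_size items.

    Peak memory is one chunk regardless of total input length, which
    is what makes chunked pandas workflows safe on huge files.
    """
    it = iter(rows)
    while chunk := list(islice(it, chunk_size)):
        yield chunk
```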

Development

git clone https://github.com/Utilified/aemo-mdff-reader.git
cd aemo-mdff-reader
pip install -e .[dev]
pytest

CI runs ruff, mypy --strict, the test matrix (Python 3.11 and 3.12 on Linux, macOS, and Windows), pip-audit, bandit, CodeQL, OpenSSF Scorecard, and a wheel-install smoke test.

Releases are automated by release-please from Conventional Commits on main, then signed with sigstore, attested with SLSA build provenance and a CycloneDX SBOM, and published to PyPI via Trusted Publishing. See CONTRIBUTING.md for the contributor commit conventions and the full release flow.

License

MIT — see LICENSE.

Download files


Source Distribution

aemo_mdff_reader-2.2.0.tar.gz (58.8 kB)


Built Distribution


aemo_mdff_reader-2.2.0-py3-none-any.whl (35.7 kB)


File details

Details for the file aemo_mdff_reader-2.2.0.tar.gz.

File metadata

  • Download URL: aemo_mdff_reader-2.2.0.tar.gz
  • Size: 58.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for aemo_mdff_reader-2.2.0.tar.gz
Algorithm      Hash digest
SHA256         a43d8c3ee0e68c3e2cc31a56278cd80d2b7522d6db5d6045f525b7226d8f4689
MD5            85d1882b346a48528fb503c5a2cb2254
BLAKE2b-256    364cb4409a47cbc76e9c380b25f83f4e9c1003248e6fa548e73481f463172e65


Provenance

The following attestation bundles were made for aemo_mdff_reader-2.2.0.tar.gz:

Publisher: release.yml on Utilified/aemo-mdff-reader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aemo_mdff_reader-2.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for aemo_mdff_reader-2.2.0-py3-none-any.whl
Algorithm      Hash digest
SHA256         8d54685e04acc010a26aa333c6fa80f0e9813baa870188a8ae87960e155e43e9
MD5            aa54d31cddf3cecda90f2e51e74462fd
BLAKE2b-256    7e475b4d41596394feab76f9ebe78c21d09dfbf998486681bd67accb4332f274


Provenance

The following attestation bundles were made for aemo_mdff_reader-2.2.0-py3-none-any.whl:

Publisher: release.yml on Utilified/aemo-mdff-reader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
