
aemo-mdff-reader


Fast, zero-dependency streaming reader for AEMO NEM12 and NEM13 metering files. Implements AEMO MDFF (Meter Data File Format) v2.6.

  • O(1) memory — iterate through millions of intervals.
  • Pure stdlib core; pandas / PyMySQL are opt-in extras.
  • ~2 M readings/sec on the columnar fast path.
  • Includes an aemo-mdff-reader CLI.

Install

pip install aemo-mdff-reader

# optional extras
pip install aemo-mdff-reader[pandas]   # to_dataframe() / parquet
pip install aemo-mdff-reader[mysql]    # SQL persistence

Use

from aemo_mdff_reader import parse

for r in parse("metering.csv"):
    print(r.nmi, r.interval_start, r.value, r.uom)

Or as a flat CSV / DataFrame:

from aemo_mdff_reader import parse, write_csv, to_dataframe

write_csv(parse("metering.csv"), "out.csv")     # no pandas
df = to_dataframe("metering.csv")                # needs [pandas]

From the command line:

aemo-mdff-reader metering.csv -o out.csv
aemo-mdff-reader metering.csv --validate                       # spec check
aemo-mdff-reader metering.csv --nmi NMI1234567 --start 2024-01-01 --end 2024-01-31
aemo-mdff-reader manual.csv --records accumulations            # NEM13

Working with the data

Each parsed record is a slots-based class with named attributes plus a to_dict() for JSON / dict pipelines:

for r in parse("metering.csv"):
    payload = r.to_dict()              # {"nmi": "...", "value": 0.12, ...}
    print(r.quality_flag, r.method_flag)   # QMM field split, e.g. "S52" -> "S", "52"
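The quality/method split follows the MDFF convention: the first character of the quality-method field is the quality flag, and any remaining characters are the method code. A stdlib sketch of that convention (illustrative, not the library's internal code):

```python
def split_qmm(qmm):
    """Split a quality-method field like "S52" into ("S", "52")."""
    # First character is the quality flag; the rest (if any) is the method code.
    return qmm[:1], qmm[1:] or None

assert split_qmm("A") == ("A", None)     # actual data, no method code
assert split_qmm("S52") == ("S", "52")   # substituted data, method 52
```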

For aggregation, aemo_mdff_reader.aggregate provides streaming helpers:

from aemo_mdff_reader import parse
from aemo_mdff_reader.aggregate import group_by_nmi, daily_totals

for key, group in group_by_nmi(parse("metering.csv")):
    # key = ChannelKey(nmi, register_id, nmi_suffix)
    intervals = list(group)

for day in daily_totals(parse("metering.csv")):
    # day.total, day.interval_count, day.unique_quality_flags
    print(day.nmi, day.interval_date.date(), day.total, day.uom)

End-to-end recipes — load + inspect, daily roll-up, filter to pandas, spec validation — live in examples/.

API at a glance

You want Call
300 interval readings (NEM12) parse(src)
250 accumulations (NEM13) parse_accumulations(src)
Both, in file order parse_all(src)
400 quality / event flags parse_events(src)
500 / 550 B2B transactions parse_b2b(src)
Just the 100 header parse_header(src)
Build a pandas DataFrame to_dataframe(src)
Write a flat CSV (no pandas) write_csv(rows, out)
Validate against AEMO MDFF v2.6 validate_file(src)
Compute / verify an NMI checksum nmi_checksum, validate_nmi
Group readings by NMI / channel aggregate.group_by_nmi(rows)
Roll up to daily totals aggregate.daily_totals(rows)
Convert any record to a plain dict r.to_dict()

src can be a path, a file-like object, an iterable of CSV lines, or an iterable of pre-split rows. The v1 NEMReader facade (read_from_file, to_dataframe, to_csv) still works.
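To make "iterable of CSV lines / pre-split rows" concrete, here is a tiny hand-built NEM12 snippet and the row shape a parser sees. The record indicators (100 header, 200 NMI details, 300 intervals, 900 end) are per the spec; the exact field layout of the 200/300 rows below is illustrative only — consult the MDFF specification for the full column lists.

```python
import csv
import io

# A minimal hand-written NEM12 file: 100 header, 200 NMI data details,
# one 300 interval row (48 x 30-minute values), 900 end-of-data.
nem12 = (
    "100,NEM12,200401011200,MDP,RETAILER\n"
    "200,NMI1234567,E1,E1,E1,,METER1,KWH,30,\n"
    "300,20240101," + ",".join(["0.5"] * 48) + ",A,,,20240102000000,\n"
    "900\n"
)

rows = list(csv.reader(io.StringIO(nem12)))   # "pre-split rows"
indicators = [row[0] for row in rows]
assert indicators == ["100", "200", "300", "900"]

# 30-minute interval length -> 48 interval values on the 300 row
values = rows[2][2:2 + 48]
assert len(values) == 48 and values[0] == "0.5"
```

Passing `io.StringIO(nem12)` to parse() would exercise the file-like path; passing `rows` would exercise the pre-split path.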

Each parse(...) yields an IntervalReading with nmi, meter_serial_number, register_id, nmi_suffix, uom, interval_length, interval_date, interval_start, interval_end, interval_index, value, quality_method, reason_code, reason_description, update_datetime, msats_load_datetime. See the type stubs (from aemo_mdff_reader import IntervalReading) for the exact signatures.
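The NMI checksum behind nmi_checksum / validate_nmi is AEMO's Luhn variant applied to the ASCII values of the NMI's characters. A stdlib sketch of that published algorithm (independent of the library's own implementation):

```python
def nmi_checksum_sketch(nmi):
    """Luhn-style checksum over ASCII character values, per AEMO's NMI procedure."""
    total = 0
    double = True  # double every second value, starting from the rightmost character
    for ch in reversed(nmi.upper()):
        v = ord(ch)
        if double:
            v *= 2
        total += sum(int(d) for d in str(v))  # sum the decimal digits
        double = not double
    return (10 - total % 10) % 10

assert nmi_checksum_sketch("2001985732") == 8
```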

Notes

  • Spec: AEMO Meter Data File Format Specification NEM12 & NEM13, v2.6 (effective 29 September 2024). Records 100, 200, 250, 300, 400, 500, 550, 900 are all surfaced; unknown indicators are ignored. Allowed values for quality flags, transaction codes, reason codes, units of measure, and direction indicators are exposed as constants in aemo_mdff_reader.spec for callers that want stricter validation than the parser performs.
  • Tolerant: UTF-8 BOM is consumed silently, LF and CRLF both work, and empty interval cells are coerced to 0.0 (use quality_method to distinguish missing from zero).
  • Datetimes: fields accept the spec forms (YYYYMMDD, YYYYMMDDhhmmss) and a few common non-spec variants (YYYY-MM-DD, ISO YYYY-MM-DDTHH:MM:SS, with or without a Z / ±HH:MM / ±HHMM timezone suffix — the suffix is stripped and parsed datetimes are returned naive).
  • Non-spec values: direction_indicator on 250 records passes through whatever the file emits; the spec set is spec.DIRECTION_INDICATORS = {"I", "E"}, but B and N appear in the wild. The parser also accepts non-spec IntervalLengths (1, 60, etc.); strict callers should compare against spec.ALLOWED_INTERVAL_LENGTHS (= {5, 15, 30}).
  • Migration from v1: NEMReader still works. The internal aemo_mdff_reader.nemstructure package is gone — see the API table above. pandas is now opt-in. See CHANGELOG.md for details.
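The tolerant datetime handling described in the notes can be sketched with the stdlib alone. The format list and try-in-order fallback here are assumptions for illustration; the library's actual fallback chain may differ.

```python
import re
from datetime import datetime

# Trailing Z / +HH:MM / +HHMM / -HH:MM / -HHMM timezone suffixes are stripped;
# the result is returned naive, as the notes above describe.
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:?\d{2})$")
_FORMATS = (
    "%Y%m%d%H%M%S",        # spec: YYYYMMDDhhmmss
    "%Y%m%d",              # spec: YYYYMMDD
    "%Y-%m-%dT%H:%M:%S",   # non-spec ISO variant
    "%Y-%m-%d",            # non-spec date
)

def parse_mdff_datetime(raw):
    s = _TZ_SUFFIX.sub("", raw.strip())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised MDFF datetime: {raw!r}")

assert parse_mdff_datetime("20240101") == datetime(2024, 1, 1)
assert parse_mdff_datetime("2024-01-31T23:30:00+10:00") == datetime(2024, 1, 31, 23, 30)
```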

Performance

420,480 readings (4 NMIs × 365 days × 5-min, 2.8 MiB CSV), Python 3.11:

Operation Time
for r in parse(path): ... 0.45 s
parse_to_columns(path) 0.21 s
to_dataframe(path) (pandas) 0.76 s

~2.7× faster than v1 end-to-end; reproduce with python benchmarks/bench_parser.py.

Large files

The parser is built to scale to gigabyte-class NEM12 files without loading them into RAM. Measured peak memory delta on a synthetic 10.5 M-reading file (100 NMIs × 365 days × 5-min, 71 MiB CSV), Python 3.12:

API Memory profile Peak Δ
for r in parse(path): ... streaming 1.3 MiB
daily_totals(parse(path)) streaming 0 MiB
write_csv(parse(path), out) streaming 0 MiB
iter_dataframes(path, chunk_size=N) bounded O(N) ~30 MiB / 100k
iter_columns_chunks(path, chunk_size=N) bounded O(N) ~10 MiB / 100k
parse_to_columns(path) full materialise ~600 MiB
list(parse(path)) / to_dataframe(path) / NEMReader.read_from_file() full materialise ~2.5 GiB

Rule of thumb: stay on the streaming or chunked APIs for any file larger than a few hundred MiB. The chunked variants make pandas-based workflows safe on arbitrarily large inputs:

from aemo_mdff_reader import iter_dataframes

# Process a multi-GiB file 50,000 readings at a time.
for df in iter_dataframes("huge.csv", chunk_size=50_000):
    daily = df.groupby(["NMI", "IntervalDate"])["Value"].sum()
    daily.to_csv("out.csv", mode="a", header=False)

The NEMReader facade and to_dataframe(path) materialise their inputs by design (so len(reader) and df.iloc[...] work). Avoid them for files that won't fit in RAM.

Development

git clone https://github.com/Utilified/aemo-mdff-reader.git
cd aemo-mdff-reader
pip install -e .[dev]
pytest

CI runs ruff, mypy --strict, the test matrix on Python 3.9 → 3.12 / Linux / macOS / Windows, pip-audit, bandit, and a wheel-install smoke test. Releases are signed with sigstore and published to PyPI via Trusted Publishing on tag.

License

MIT — see LICENSE.
