Round-trip translator between GNDS XML and JSON

gndson

Round-trip translator between GNDS XML and a clean, JSON-native representation.

⚠️ Early alpha (0.1.0a1). gndson is a prototype. The JSON encoding, the schema-pipeline names, and the public Python and CLI interfaces may change without notice between minor versions. The XML ↔ JSON round-trip contract (spec.md §9) is the stable commitment; everything else may evolve. Pin a specific version if you depend on it.

gndson lets you work with GNDS files as ordinary JSON — read, edit, diff, search with jq — and round-trip them back to XML without losing anything.

The test corpus used during development is every file of the FENDL 3.2c neutron sub-library that was successfully converted to GNDS XML — 145 files in total (~1.6M XML elements), covering both reactionSuite and covarianceSuite documents. The translator is verified on this corpus at two levels:

  • Spec-equivalence (per spec.md §9): 145/145 (100%)
  • Byte-form-strict (also preserves <x/> vs <x></x>): 145/145 (100%)

Background

GNDS (Generalised Nuclear Database Structure) is the modern XML-based format for evaluated nuclear data, developed under the WPEC EGNDS group and intended as the successor to the long-serving ENDF-6 fixed-column text format. A GNDS document organises reactions, cross sections, distributions, covariances, and metadata in a single hierarchical structure. The current normative specification is:

Mattoon, C., Gert, G., Holcomb, A., Brown, D., Wiarda, D., Chapman, C., Haeck, W., Staley, M. (2025). GNDS-2.1 Specifications. NEA Working Paper NEA/WKP(2025)6, Nuclear Energy Agency, OECD.

gndson tracks this revision; all section references in spec.md and framework.md cite it.

XML is rich and self-describing, but JSON is the lingua franca of modern tooling — every browser, every scripting language, every data-science stack reads JSON natively, every diff tool understands it, every cloud database stores it. Bringing GNDS within reach of that ecosystem is what gndson is for.

gndson is a mechanical, bijective translator between GNDS XML and JSON. It carries no opinion about what the data should look like: it preserves whatever the source XML expressed and emits JSON that reconstructs the same XML on the round trip. Schema-aware ergonomic transformations live in a separate, opt-in layer above the bijective core and are selected by name (see "Schema-aware ergonomic output" below).

Because the translation is mechanical, it does not interfere with the work of the WPEC EGNDS group on the GNDS specification itself. Any future addition to the GNDS XML schema — new elements, new attributes, new structural patterns — is automatically reflected in the JSON representation without changes to gndson, and the schema-aware layer can be extended to recognise new patterns as they are formalised.

Install

The package is pure Python (>=3.7) with no external dependencies. From a source checkout:

pip install -e .

For development, with the test extra so the suite can run under pytest:

python -m venv venv
./venv/bin/pip install -e ".[test]"

CLI

Three subcommands; each reads from stdin and writes to stdout by default.

Translate XML to JSON

gndson xml-to-json file.xml                    # JSON on stdout
gndson xml-to-json file.xml -o file.json       # write to file
gndson xml-to-json file.xml --indent -1        # compact (one line)
cat file.xml | gndson xml-to-json              # stdin

Translate JSON back to XML

gndson json-to-xml file.json -o out.xml

Verify round-trip on a single file

gndson verify file.xml                              # XML round-trip (spec-equivalence)
gndson verify file.xml --strict                     # also require byte-form fidelity
gndson verify file.xml --pipeline ergonomic        # also verify schema-layer round-trip
gndson verify file.xml --pipeline ergonomic_full --strict   # all three checks

verify exits 0 on success, 1 on a round-trip mismatch, 2 on a translator error. When --pipeline NAME is given, the schema-layer check (pipeline.inverse(pipeline.forward(canonical)) == canonical) runs in addition to the XML-layer check.
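
For scripting, the documented exit codes can be mapped to outcomes from Python. This is a minimal sketch, assuming the gndson executable is on PATH. The helper names (build_verify_command, describe_exit, verify) are illustrative and not part of gndson's API; only the flags shown above are used.

```python
import subprocess

# Exit codes as documented for `gndson verify`.
VERIFY_OUTCOMES = {
    0: "round-trip ok",
    1: "round-trip mismatch",
    2: "translator error",
}


def build_verify_command(path, strict=False, pipeline=None):
    """Assemble the argv list for `gndson verify` (hypothetical helper)."""
    cmd = ["gndson", "verify", path]
    if strict:
        cmd.append("--strict")
    if pipeline is not None:
        cmd += ["--pipeline", pipeline]
    return cmd


def describe_exit(code):
    """Translate an exit code into a short human-readable outcome."""
    return VERIFY_OUTCOMES.get(code, f"unexpected exit code {code}")


def verify(path, strict=False, pipeline=None):
    """Run the CLI and return the outcome string; needs gndson installed."""
    result = subprocess.run(build_verify_command(path, strict, pipeline))
    return describe_exit(result.returncode)
```

Keeping the command builder separate from the subprocess call makes the mapping testable on machines where gndson is not installed.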

Compose in a pipe

cat file.xml | gndson xml-to-json | gndson json-to-xml > round.xml

Schema-aware ergonomic output

xml-to-json, json-to-xml, and verify accept --pipeline NAME to apply one of the named schema-layer pipelines (see framework.md). On xml-to-json the pipeline's forward direction runs after parsing; on json-to-xml the inverse runs before serialising; on verify the schema-layer round-trip is checked alongside the XML-layer one.

Available pipelines, smallest to fullest:

Pipeline               What it does
canonical              identity (no schema transformation)
arity                  always-list discipline for plural containers (reactions/reaction, ...)
uniform                arity + collapse {Xs: {X: [...]}} to {Xs: [...]} for uniform-inner
wrappers               annotate physicalQuantity wrappers with _kind and collapse them
heterogeneous          collapse heterogeneous containers (function1ds, styles, axes, ...) to a flat list with _kind per item
split_text             split <values> text into a JSON list of tokens
data_columns           parse FUDGE-style <data> header comments into _columns + _rows
ergonomic              arity + uniform + wrappers (the recommended default)
ergonomic_full         ergonomic + heterogeneous
ergonomic_split        ergonomic_full + split_text
ergonomic_split_data   ergonomic_split + data_columns (the fullest ergonomic form)
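
To make the arity and uniform rows concrete, here is a toy sketch of both transformations and their inverses on a hand-written tree. It is not gndson's implementation: the function names are hypothetical, the outer/inner tag names are passed in by hand rather than taken from a schema, and the wrapper is assumed to hold nothing but the inner key.

```python
import copy


def arity_forward(node, outer, inner):
    """Always-list: wrap a lone child in a one-element list."""
    out = copy.deepcopy(node)
    value = out[outer][inner]
    if not isinstance(value, list):
        out[outer][inner] = [value]
    return out


def arity_inverse(node, outer, inner):
    """Undo always-list: a one-element list becomes a scalar again."""
    out = copy.deepcopy(node)
    value = out[outer][inner]
    if len(value) == 1:
        out[outer][inner] = value[0]
    return out


def uniform_forward(node, outer, inner):
    """arity, then collapse {Xs: {X: [...]}} to {Xs: [...]}."""
    out = arity_forward(node, outer, inner)
    out[outer] = out[outer][inner]
    return out


def uniform_inverse(node, outer, inner):
    """Re-wrap the flat list under the inner tag, then undo arity."""
    out = copy.deepcopy(node)
    out[outer] = {inner: out[outer]}
    return arity_inverse(out, outer, inner)


canonical = {"reactions": {"reaction": {"@label": "n + H1"}}}
ergonomic = uniform_forward(canonical, "reactions", "reaction")
# ergonomic == {"reactions": [{"@label": "n + H1"}]}
restored = uniform_inverse(ergonomic, "reactions", "reaction")
# restored == canonical
```

The inverse can turn a one-element list back into a scalar because the canonical encoding picks scalar or list purely by count.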

All pipelines round-trip 145/145 on the bundled GNDS corpus (the split_text-containing pipelines round-trip at the GNDS-spec level — internal whitespace inside <values> bodies is normalised on the inverse).
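
The whitespace caveat for split_text can be illustrated with a toy forward/inverse pair. This is a sketch under assumptions, not gndson's code: the function names are hypothetical, tokens are kept as strings, and the inverse joins them with a single space, which may differ from the normalisation gndson actually applies.

```python
def split_forward(text):
    """Split a <values> body on any run of whitespace."""
    return text.split()


def split_inverse(tokens):
    """Rebuild the body; the original spacing is not recoverable."""
    return " ".join(tokens)


original = "1.0e-5  20.4\n  2.0e7 0.48"
tokens = split_forward(original)
# tokens == ["1.0e-5", "20.4", "2.0e7", "0.48"]
restored = split_inverse(tokens)
# restored == "1.0e-5 20.4 2.0e7 0.48"
```

The token list survives a second trip unchanged, while the byte form of the text does not, which is why such pipelines are checked at the spec level.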

Per-pipeline documentation with worked before/after examples, witness-flow tables, and inverse instructions lives under docs/pipelines/ and is auto-generated by gndson docs --all. CI runs gndson docs --all --check as a gate.

gndson xml-to-json file.xml --pipeline ergonomic              # the recommended default
gndson xml-to-json file.xml --pipeline ergonomic_split_data   # fullest ergonomic form
gndson json-to-xml file.json --pipeline ergonomic_split_data  # inverse: take a JSON
                                                              # produced by the same
                                                              # pipeline back to XML
gndson verify file.xml --pipeline ergonomic_full --strict     # all three checks

python -m gndson <command> ... works identically if you prefer not to install.

Python API

import pathlib

import gndson

# XML -> JSON-shaped dict
data = gndson.parse_xml_file("file.xml")
# or, from bytes already in memory (read_bytes closes the file for us)
data = gndson.parse_xml_bytes(pathlib.Path("file.xml").read_bytes())

# JSON-shaped dict -> XML
xml_text = gndson.to_xml_string(data)
# or write directly
gndson.write_xml_file(data, "out.xml")

The returned data is a plain Python dict / list / str tree; there is no wrapper class to learn. The standard json module reads and writes it directly.
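
As a small illustration of working with such a tree using only the standard library, the sketch below writes a hand-typed tree in the shape of this README's encoding example to JSON text, reads it back, and edits it by ordinary indexing. It does not call gndson.

```python
import json

# Hand-written tree in the shape gndson documents for
# <r a="1"><x>hello</x><x>world</x></r>.
data = {
    "_xml": {"version": "1.0", "encoding": "UTF-8"},
    "r": {"@a": "1", "x": ["hello", "world"]},
}

text = json.dumps(data, indent=2)   # serialise with the stdlib
reloaded = json.loads(text)         # and read it back

# Edit by plain dict / list indexing; the original tree is untouched.
reloaded["r"]["x"][1] = "there"
```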

Encoding rules at a glance

data = gndson.parse_xml_bytes(
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<r a="1"><x>hello</x><x>world</x></r>'
)
# data == {
#   "_xml": {"version": "1.0", "encoding": "UTF-8"},
#   "r": {
#     "@a": "1",                # attributes: prefix '@'
#     "x": ["hello", "world"],  # repeated tag -> list
#   }
# }

Element-encoding rules (see spec.md for the full definition):

XML                                           JSON
<x>hello</x> (text only, no attrs)            bare string "hello"
<x a="1"/> (attrs only)                       {"@a": "1"}
<x>foo</x> alone, or <x>foo</x><x>bar</x>     "x": "foo" (scalar) or "x": ["foo", "bar"] (list), chosen by count
<![CDATA[...]]> text                          normal string + parent has _cdata: ["x"]
<!-- comment -->                              parent has _comments: ["comment"] + _order: [..., "_comment", ...]
<x></x> (explicit empty pair)                 empty string + parent has _nocollapse: ["x"]
All meta keys start with the reserved prefix _, so they cannot collide with GNDS tag names.
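
The first three rows of the table can be mimicked with a toy encoder built on the standard library. This is only an illustration of the rules, not gndson's parser (which is expat-based): the function name is hypothetical, and the sketch ignores CDATA, comments, empty-pair tracking, and elements that carry both attributes and text.

```python
import xml.etree.ElementTree as ET


def encode(elem):
    """Toy encoder for text-only, attrs-only, and repeated-tag cases."""
    children = list(elem)
    if not children and not elem.attrib:
        return elem.text or ""          # text only -> bare string
    node = {"@" + k: v for k, v in elem.attrib.items()}  # attrs: '@' prefix
    for child in children:
        value = encode(child)
        if child.tag not in node:
            node[child.tag] = value     # first occurrence -> scalar
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:                           # second occurrence -> promote to list
            node[child.tag] = [node[child.tag], value]
    return node


root = ET.fromstring('<r a="1"><x>hello</x><x>world</x></r>')
tree = {root.tag: encode(root)}
# tree == {"r": {"@a": "1", "x": ["hello", "world"]}}
```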

Round-trip contract

Translator-equivalence (per spec.md §9): two XML files are equivalent iff every difference between them falls into one of these categories:

  • inter-tag whitespace
  • self-closing-vs-pair form, modulo _nocollapse
  • attribute order, modulo _attrorder
  • attribute quote character
  • minimal entity escaping

Everything else — text content (byte-exact), CDATA-ness, comments, child order, attributes — is faithfully preserved.
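
A rough feel for this contract can be had from a toy comparator built on xml.etree.ElementTree. This is a sketch, not gndson's _compare.py: the function names are hypothetical, the _nocollapse and _attrorder refinements are not modelled, whitespace-only text between child elements is assumed to be inter-tag whitespace, and comments and CDATA-ness are invisible to this parser, so they are not checked.

```python
import xml.etree.ElementTree as ET


def _blank_if_whitespace(text):
    """Treat None or whitespace-only text as empty."""
    text = text or ""
    return text if text.strip() else ""


def canonical_form(elem):
    """Reduce an element to a tuple that ignores the allowed differences."""
    text = elem.text or ""
    if len(elem) > 0:                   # whitespace before the first child
        text = _blank_if_whitespace(text)
    children = tuple(
        (canonical_form(child), _blank_if_whitespace(child.tail))
        for child in elem               # child order is preserved
    )
    attrs = tuple(sorted(elem.attrib.items()))  # attribute order ignored
    return (elem.tag, attrs, text, children)


def roughly_equivalent(xml_a, xml_b):
    """Compare two XML strings under the toy rules above."""
    return canonical_form(ET.fromstring(xml_a)) == canonical_form(ET.fromstring(xml_b))


a = '<r a="1" b="2"><x/><y>hi</y></r>'
b = "<r b='2' a='1'>\n  <x></x>\n  <y>hi</y>\n</r>"
c = '<r a="1" b="2"><x/><y> hi</y></r>'
same = roughly_equivalent(a, b)       # True: only allowed differences
different = roughly_equivalent(a, c)  # False: leaf text differs
```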

Tests

./venv/bin/pytest                                              # unit tests (~85)
./venv/bin/pytest --gnds-corpus /path/to/gnds/xml/files        # also run corpus
./venv/bin/python tests/test_roundtrip_corpus.py /path/to/dir  # corpus, script mode

The corpus driver reports two pass rates: spec-equivalence and byte-form-strict (see "Round-trip contract" above).

Examples

examples/build_minimal_from_json.py hand-authors a one-reaction GNDS file (n + H-1 elastic, MT=2) as a Python dict, translates it to XML with gndson, and (if FUDGE is importable) reads the result back to confirm the cross section value.

examples/edit_via_json.py shows the "edit nuclear data as JSON" workflow: loads a corpus GNDS file, scales a cross section in JSON-land via ordinary dict indexing, writes the modified XML, and uses FUDGE to confirm the change is visible (σ(1 MeV) = 4.25 b → 8.49 b after --factor 2.0).

examples/roundtrip_through_fudge.py proves round-trip identity: for one or more input files it reads the ORIGINAL with FUDGE and the gndson-round-tripped version with FUDGE, then compares both toXML() outputs (via gndson's own faithful comparator) AND the cross-section values evaluated at sample energies. FUDGE cannot tell the original from the round-trip.

The first two scripts skip the FUDGE step gracefully when FUDGE isn't importable; pass --skip-fudge to skip it explicitly. The third requires FUDGE (it is the whole point).
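
The scaling step of the edit-as-JSON workflow might look roughly like the sketch below. It is not the code in examples/edit_via_json.py: the helper name and the tiny node are hypothetical, and the values text is assumed to be a flat, whitespace-separated "x1 y1 x2 y2 ..." sequence. Note that rewritten numbers are re-spelled by Python, so their text form can change even where the value does not.

```python
def scale_xy_values(text, factor):
    """Scale the y entries of a flat 'x1 y1 x2 y2 ...' token string."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of tokens (x/y pairs)")
    for i in range(1, len(tokens), 2):  # odd positions hold the y values
        tokens[i] = repr(float(tokens[i]) * factor)
    return " ".join(tokens)


# Hypothetical fragment of a parsed tree; a real GNDS path is much deeper.
node = {"values": "1.0e-5 20.0 2.0e7 0.5"}
node["values"] = scale_xy_values(node["values"], 2.0)
# node["values"] == "1.0e-5 40.0 2.0e7 1.0"
```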

Specification

See spec.md for the canonical-form definition, the round-trip contract, and the reasoning behind individual design decisions.

Design principles

See framework.md for the broader operations / witnesses / round-trip-oracle framework that gndson is built on. Useful for deciding whether a new feature belongs in the translator, in a schema-augmentation layer above it, or in a user-facing reduction.

Layout

gndson/
  __init__.py     # public API
  __main__.py     # CLI
  parser.py       # XML -> canonical JSON dict (expat-based)
  serializer.py   # canonical JSON dict -> XML
  entities.py     # pluggable XML entity codec
  errors.py       # exception hierarchy
  _compare.py     # faithful XML comparator for round-trip checks
  _meta.py        # reserved-name constants
tests/
  test_features.py            # unit tests per spec rule
  test_cli.py                 # CLI smoke tests
  test_roundtrip_corpus.py    # corpus-driver round-trip test
spec.md           # the format specification
