
yq in pure Python: jq syntax over YAML, TOML, XML and CSV; no jq binary, no C extension

Project description

pureyq


yq, as a pure Python library. Run jq programs over YAML, TOML, XML, CSV and JSON with no yq binary, no jq binary and no C extension required: if Python runs, pureyq runs. That includes Pyodide/WASM, sandboxes, Lambda, and anywhere else pip install is all you get.

pip install pureyq
pureyq -i '.spec.replicas = 3' deploy.yaml          # edit YAML in place
pureyq -o json '.services | keys' compose.yaml      # YAML in, JSON out
pureyq '.project.dependencies' pyproject.toml       # TOML works the same way

from pathlib import Path
import pureyq

manifest_text = Path("deploy.yaml").read_text()     # the YAML document as a string
pureyq.apply(".spec.replicas = 3", manifest_text)   # text -> text, one call
data = pureyq.load(manifest_text)                   # YAML 1.2 -> Python
pureyq.first(".spec.template.spec.containers[].image", data)

The expression language is jq — the real one, not a dialect: the engine is purejq, which passes 96.2% of jq's own test suite. Everything you know from jq works on your YAML: select, map, group_by, paths, assignment operators, reduce, try/catch, string interpolation, regexes.

Why pureyq

  • No binaries, anywhere. kislyuk/yq needs a jq binary on the system at runtime; mikefarah/yq is a Go binary. In sandboxed or pip-only environments (Pyodide, Lambda layers, locked-down CI images, agent sandboxes) neither is an option. pureyq is plain Python wheels all the way down.

  • Embedding in Python. Transforming a manifest in-process with pureyq.apply() takes a fraction of a millisecond (0.16 ms in the benchmarks below), while spawning a yq binary per call costs several milliseconds. For agent and automation loops that edit many small configs, in-process wins by more than an order of magnitude.

  • YAML 1.2 correctness by default. PyYAML-based tools (including kislyuk/yq) speak YAML 1.1, with famous consequences:

    input                 YAML 1.1 loaders read     pureyq (1.2 Core Schema)
    country: NO           false (!)                 "NO"
    version: 010          8 (octal)                 10
    time: 1:30            90 (sexagesimal)          "1:30"
    date: 2026-06-11      a date object             "2026-06-11"

    On output, strings that a YAML 1.1 or 1.2 parser would misread are quoted automatically, so emitted files are also safe for downstream 1.1 parsers. Merge keys (<<:) still work, because real-world configs depend on them.
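The 1.2 rules behind the table above are small enough to sketch. The following is a minimal illustration, not pureyq's code: resolve_core applies the Core Schema tag-resolution regexes from the YAML 1.2 spec to a plain (unquoted) scalar, and needs_quotes is a hypothetical, conservative check for strings that either a 1.1 or a 1.2 loader would turn into something other than a string. It only considers type ambiguity, not YAML syntax characters such as ": " or "#".

```python
import math
import re

# Minimal sketch of YAML 1.2 Core Schema tag resolution for plain (unquoted)
# scalars. Illustrative only; this is not pureyq's resolver.
_NULL = re.compile(r"null|Null|NULL|~|")
_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
_INT10 = re.compile(r"[-+]?[0-9]+")
_INT8 = re.compile(r"0o[0-7]+")
_INT16 = re.compile(r"0x[0-9a-fA-F]+")
_FLOAT = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")
_INF = re.compile(r"[-+]?(?:\.inf|\.Inf|\.INF)")
_NAN = re.compile(r"\.nan|\.NaN|\.NAN")

# Conservative approximation of YAML 1.1 implicit types that a 1.1 loader
# would not read back as strings (assumed list, deliberately over-inclusive).
_YAML11_TRAPS = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO|on|On|ON|off|Off|OFF"
    r"|[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+(?:\.[0-9_]*)?"  # sexagesimal, e.g. 1:30
    r"|[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}.*"                   # dates and timestamps
    r"|[-+]?0[0-7_]+"                                      # 1.1 octal, e.g. 010
)


def resolve_core(plain: str):
    """Return the value a YAML 1.2 Core Schema loader gives a plain scalar."""
    if _NULL.fullmatch(plain):
        return None
    if _BOOL.fullmatch(plain):
        return plain.lower() == "true"
    if _INT10.fullmatch(plain):
        return int(plain, 10)  # "010" -> 10: a leading zero is not octal in 1.2
    if _INT8.fullmatch(plain):
        return int(plain[2:], 8)
    if _INT16.fullmatch(plain):
        return int(plain[2:], 16)
    if _FLOAT.fullmatch(plain):
        return float(plain)
    if _INF.fullmatch(plain):
        return -math.inf if plain.startswith("-") else math.inf
    if _NAN.fullmatch(plain):
        return math.nan
    return plain  # "NO", "1:30" and "2026-06-11" all stay strings


def needs_quotes(s: str) -> bool:
    """True if writing `s` unquoted could be misread by a 1.1 or 1.2 loader.

    Only checks type ambiguity, not YAML syntax characters like ': ' or '#'.
    """
    return resolve_core(s) != s or _YAML11_TRAPS.fullmatch(s) is not None
```

Under these rules country: NO loads as the string "NO", yet needs_quotes("NO") is True, so an emitter following this check would write it quoted for the benefit of 1.1 readers.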

Formats

format     notes
YAML       multi-document streams, merge keys, 1.2 Core Schema
JSON       jq-identical output via purejq's encoder
TOML       100% of the official toml-test suite (704 cases, vendored, run in CI); datetimes load as ISO strings
XML        xmltodict convention: @attr, #text, repeated tags become lists
CSV/TSV    header row + typed cells (leading-zero ZIP codes stay strings)
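The XML row follows the convention popularized by the xmltodict library. As a rough, standard-library-only sketch of that mapping (an assumption about the shape of the output, not pureyq's implementation; it ignores namespaces and mixed-content tail text, and leaves every value as a string):

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_value(elem: ET.Element) -> Any:
    """Convert one element following the xmltodict-style convention (sketch)."""
    node: Dict[str, Any] = {}
    # Attributes become keys prefixed with "@".
    for name, attr_value in elem.attrib.items():
        node["@" + name] = attr_value
    # Child elements become keys; a tag that repeats becomes a list.
    for child in elem:
        value = element_to_value(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text          # text-only element: just the string
        node["#text"] = text     # text next to attributes or children
    return node or None          # empty element: null


def xml_to_dict(xml_text: str) -> Dict[str, Any]:
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}
```

For example, <config version="2"><server>a</server><server>b</server><debug/></config> becomes {"config": {"@version": "2", "server": ["a", "b"], "debug": None}}, which a jq filter such as .config.server[] can then walk like any other document.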

Input format is detected from the file extension (-p to force); output defaults to the input format (-o to convert). pureyq -o json . config.toml and pureyq -o yaml . data.json are complete format converters.
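The two rules in the paragraph above (-p overrides extension detection, -o defaults to the input format) can be sketched in a few lines. The extension table and the fallback to YAML for unknown extensions are assumptions for illustration; this page does not list pureyq's actual mapping.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical extension table; pureyq's actual mapping is not listed here.
EXTENSION_FORMATS = {
    ".yaml": "yaml", ".yml": "yaml",
    ".json": "json",
    ".toml": "toml",
    ".xml": "xml",
    ".csv": "csv",
    ".tsv": "tsv",
}


def choose_formats(path: str, input_fmt: str = "auto",
                   output_fmt: Optional[str] = None) -> Tuple[str, str]:
    """Apply the documented rules: -p overrides detection, -o defaults to the input."""
    if input_fmt == "auto":
        suffix = Path(path).suffix.lower()
        # Assumed fallback for unknown extensions: treat the file as YAML.
        input_fmt = EXTENSION_FORMATS.get(suffix, "yaml")
    return input_fmt, output_fmt or input_fmt
```

In this model, choose_formats("config.toml", output_fmt="json") corresponds to pureyq -o json . config.toml and yields ("toml", "json").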

CLI

pureyq [options] '<jq filter>' [files...]

-p FMT   input format: auto|yaml|json|toml|xml|csv|tsv (default: by extension)
-o FMT   output format (default: same as input)
-i       edit files in place (atomic; preserves permissions)
-n -r -j -c -s -e -f --arg --argjson    the flags you know from jq
--indent N    output indentation
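The -i flag is described as atomic and permission-preserving. A common way to get both in Python is sketched below as an illustration of the technique, not pureyq's code: write to a temporary file in the same directory, copy the original permission bits onto it, then os.replace it over the target. Ownership and extended attributes are not preserved by this sketch, and on Windows os.chmod only affects the read-only flag.

```python
import os
import stat
import tempfile


def write_atomically(path: str, text: str) -> None:
    """Replace `path` with `text` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    mode = stat.S_IMODE(os.stat(path).st_mode)   # permission bits to preserve
    # The temp file must live on the same filesystem as the target for
    # os.replace to be a single atomic rename on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())               # data on disk before the rename
        os.chmod(tmp_path, mode)                 # mkstemp creates files as 0o600
        os.replace(tmp_path, path)               # swap the new file into place
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the rename is the last step, an interrupted edit leaves the original file untouched rather than half-written.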

Multi-document YAML streams behave like jq input streams: the program runs once per document, --slurp collects all documents into one array, and input/inputs consume the remaining documents.

Benchmarks

Measured with tools/bench.py on an M-series MacBook with CPython 3.12, mikefarah yq v4.53.3 (native arm64 binary), and kislyuk/yq 3.4.3 over jq 1.8. Each figure is the median of 7 runs, and every workload's outputs were verified equal across all three tools before timing. Reproduce: python tools/bench.py --verify.
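tools/bench.py itself is not reproduced on this page. A minimal harness following the same methodology (first check that every implementation produces identical output, then report the median of 7 timed runs) might look like this; bench and its arguments are hypothetical names. For the CLI tools each callable would wrap a subprocess.run call, while for pureyq.apply() it would call the library directly.

```python
import statistics
import time
from typing import Callable, Dict


def bench(impls: Dict[str, Callable[[], str]], runs: int = 7) -> Dict[str, float]:
    """Verify all implementations agree, then return each one's median time in ms."""
    # Correctness gate: timing is meaningless if the tools disagree.
    outputs = {name: fn() for name, fn in impls.items()}
    if len(set(outputs.values())) != 1:
        raise ValueError(f"outputs differ between: {sorted(outputs)}")
    medians: Dict[str, float] = {}
    for name, fn in impls.items():
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
        # The median resists outliers such as a cold cache on the first run.
        medians[name] = statistics.median(samples)
    return medians
```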

Embedded in Python — editing a k8s manifest, per call:

method                          per call
pureyq.apply() (in-process)     0.16 ms
spawning the Go yq binary       5.5 ms

In-process is about 34x faster than shelling out (5.5 ms / 0.16 ms). For agent and automation loops that touch many configs, this is the number that matters.

Command line, small file — 40-line k8s manifest, startup included:

pureyq    yq (Go)    kislyuk/yq (jq wrapper)
35 ms     5 ms       43 ms

Command line, big file — 15 MB YAML, 100k objects, end to end:

workload           pureyq    yq (Go)    kislyuk/yq
filter + count     7.2 s     1.0 s      6.4 s
convert to JSON    6.5 s     2.7 s      6.8 s

Where the Go binary wins: big-file throughput, by roughly 2.4x to 7x on the workloads above. If you can install binaries and that is your workload, use mikefarah/yq. pureyq's lane is everywhere binaries can't go, in-process embedding, and doing what the jq-wrapper approach does without needing jq. (One caveat on the Go side: its compact-JSON mode, -I0, is quadratic on large arrays. Converting a 20k-row file takes it 104 s versus 1.2 s for pureyq, so agents asking for compact JSON from big YAML hit a wall that pureyq doesn't have.)

Correctness, measured

  • TOML: the official toml-test suite is vendored in this repo (tests/conformance/toml-test) and runs in CI on every commit: 704/704 of the TOML 1.0 cases pass (209 valid documents match the typed expectations, 495 invalid documents are rejected).
  • jq semantics: inherited from purejq, which vendors jq's own test suite (751/781 passing, every difference documented).
  • YAML 1.2 schema: a directed test set covers the 1.1/1.2 divergences (booleans, octals, sexagesimals, timestamps, .inf/.nan, quoting on output), and the libyaml fast path is asserted to agree with the pure Python fallback on every case.
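The libyaml-versus-fallback guarantee is a differential test: feed the same inputs to both code paths and require identical results, including identical failures. The pattern is sketched below against the standard library's json module, which also pairs an optional C accelerator (json.decoder.c_scanstring) with a pure Python fallback (json.decoder.py_scanstring); both are undocumented internals, and pureyq's own tests are not shown here and may be structured differently.

```python
import json.decoder

# JSON string literals; each scanner starts just after the opening quote.
CASES = [
    '"plain"',
    '"line\\nbreak"',
    '"\\u00e9t\\u00e9"',
    '"\\ud83d\\ude00"',        # surrogate pair -> one code point
    '"unterminated',           # both paths must fail the same way
    '"bad \\x escape"',
]


def outcome(scan, doc: str):
    """Normalize a result or a failure so the two code paths can be compared."""
    try:
        return ("ok", scan(doc, 1))
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return ("error", type(exc).__name__)


def check_json_scanners() -> bool:
    """Differential test of the C fast path against the pure Python fallback.

    Returns False if this interpreter has no C accelerator to compare.
    """
    fast = json.decoder.c_scanstring
    slow = json.decoder.py_scanstring
    if fast is None:
        return False
    for doc in CASES:
        assert outcome(fast, doc) == outcome(slow, doc), f"divergence on {doc!r}"
    return True
```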

Limitations (honest ones)

  • Comments and exact formatting are not preserved through an edit, same as kislyuk/yq. (mikefarah/yq can preserve them because its engine operates on the YAML node tree; a jq engine works on values.) Anchors/aliases are resolved on load and not re-emitted.
  • TOML output requires a single object result (that's what a TOML document is); CSV output requires flat rows.
  • When PyYAML carries its libyaml C extension (standard wheels do), pureyq uses it for parsing speed, with a pure Python fallback that behaves identically, as asserted by tests. "Pure" means no C extension is required, not that none is ever used when available.
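Both output constraints in the list above can be checked before anything is written. A hypothetical pre-flight check, assuming results arrive as Python values (dicts for objects, lists for arrays) and assuming "flat" means every cell is a scalar; pureyq's actual error messages and exact definitions are not described here:

```python
from typing import Any, List, Optional

# JSON-style scalar types that fit in a single CSV cell.
SCALARS = (str, int, float, bool, type(None))


def toml_output_problem(results: List[Any]) -> Optional[str]:
    """Return why `results` cannot be one TOML document, or None if it can."""
    if len(results) != 1:
        return f"expected exactly one result, got {len(results)}"
    if not isinstance(results[0], dict):
        return f"a TOML document is a table, got {type(results[0]).__name__}"
    return None


def csv_output_problem(rows: Any) -> Optional[str]:
    """Return why `rows` cannot be written as flat CSV rows, or None if it can."""
    if not isinstance(rows, list):
        return "CSV output needs an array of rows"
    for i, row in enumerate(rows):
        if isinstance(row, dict):
            cells = list(row.values())
        elif isinstance(row, list):
            cells = row
        else:
            return f"row {i} is a {type(row).__name__}, not an object or array"
        for cell in cells:
            if not isinstance(cell, SCALARS):
                return f"row {i} has a nested {type(cell).__name__} cell"
    return None
```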

License

MIT

Download files

Source Distribution

pureyq-0.1.0.tar.gz (24.1 kB)

Uploaded Source

Built Distribution

pureyq-0.1.0-py3-none-any.whl (18.2 kB)

Uploaded Python 3

File details

Details for the file pureyq-0.1.0.tar.gz.

File metadata

  • Download URL: pureyq-0.1.0.tar.gz
  • Upload date:
  • Size: 24.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pureyq-0.1.0.tar.gz
Algorithm Hash digest
SHA256 45d9e3b837af28fb6418491bb4219f150b87ed7ec670021078fcef3703aeb529
MD5 7ccd637074c7ed78074e12be5c35abde
BLAKE2b-256 02883bfb25ce619c00d247923cfae3bd72e74425752dbca91584d3c2fa4307f7

Provenance

The following attestation bundles were made for pureyq-0.1.0.tar.gz:

Publisher: release.yml on adam2go/pureyq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pureyq-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pureyq-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pureyq-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5401375cac53c095630252840bbe765d2a2a4c38a1c4aa608f29c98103786e98
MD5 db7592deb89bedc6d647614d89025157
BLAKE2b-256 0205d5b5a6b2e795d2cb7d3ae141814816a68fa02903f44bc39c574288962674

Provenance

The following attestation bundles were made for pureyq-0.1.0-py3-none-any.whl:

Publisher: release.yml on adam2go/pureyq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.