
yq in pure Python: jq syntax over YAML, TOML, XML and CSV, with no jq binary and no C extension


pureyq


yq, as a pure Python library. Run jq programs over YAML, TOML, XML, CSV and JSON — no yq binary, no jq binary, no C extension required: if Python runs, pureyq runs. Pyodide/WASM, sandboxes, Lambda, anywhere pip install is all you get.

pip install pureyq
pureyq -i '.spec.replicas = 3' deploy.yaml          # edit YAML in place
pureyq -o json '.services | keys' compose.yaml      # YAML in, JSON out
pureyq '.project.dependencies' pyproject.toml       # TOML works the same way

from pathlib import Path
import pureyq

manifest_text = Path("deploy.yaml").read_text()
updated = pureyq.apply(".spec.replicas = 3", manifest_text)   # text -> text, one call
data = pureyq.load(manifest_text)                             # YAML 1.2 -> Python
image = pureyq.first(".spec.template.spec.containers[].image", data)

The expression language is jq — the real one, not a dialect: the engine is purejq, which passes 96.2% of jq's own test suite. Everything you know from jq works on your YAML: select, map, group_by, paths, assignment operators, reduce, try/catch, string interpolation, regexes.
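In jq, assignment works on values, not on text: `.spec.replicas = 3` finds the path `["spec", "replicas"]` and rebuilds the document through the `setpath` builtin. The sketch below mimics jq's `setpath` semantics in plain Python purely to illustrate that model; it is not purejq's implementation, and `setpath` here is a hypothetical helper.

```python
from typing import Any, List, Union

PathElem = Union[str, int]


def setpath(value: Any, path: List[PathElem], new: Any) -> Any:
    """Hypothetical jq-style setpath: return a copy of `value` with `new` at `path`."""
    if not path:
        return new
    head, rest = path[0], path[1:]
    if isinstance(head, str):
        # Object key: null is promoted to an empty object, as in jq.
        if value is None:
            value = {}
        if not isinstance(value, dict):
            raise TypeError(f"cannot index {type(value).__name__} with {head!r}")
        updated = dict(value)  # shallow copy: untouched branches are shared
        updated[head] = setpath(value.get(head), rest, new)
        return updated
    # Array index: null is promoted to [], short arrays are padded with null.
    if value is None:
        value = []
    if not isinstance(value, list):
        raise TypeError(f"cannot index {type(value).__name__} with {head!r}")
    updated = list(value)
    index = head if head >= 0 else len(updated) + head
    if index < 0:
        raise IndexError("out of bounds negative array index")
    updated.extend([None] * (index + 1 - len(updated)))
    updated[index] = setpath(updated[index], rest, new)
    return updated


manifest = {"spec": {"replicas": 1}}
print(setpath(manifest, ["spec", "replicas"], 3))  # {'spec': {'replicas': 3}}
print(manifest)                                     # {'spec': {'replicas': 1}} (input unchanged)
```

Because every step copies instead of mutating, the original document survives the edit, which is the same value semantics jq programs rely on.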

Why pureyq

  • No binaries, anywhere. kislyuk/yq needs a jq binary on the system at runtime; mikefarah/yq is a Go binary. In sandboxed or pip-only environments (Pyodide, Lambda layers, locked-down CI images, agent sandboxes) neither is an option. pureyq is plain Python wheels all the way down.

  • Embedding in Python. Transforming a manifest in-process with pureyq.apply() takes a fraction of a millisecond; spawning a yq binary per call costs milliseconds. For agent/automation loops that edit many small configs, in-process wins by more than an order of magnitude (about 36x in the benchmarks below).

  • YAML 1.2 correctness by default. PyYAML-based tools (including kislyuk/yq) speak YAML 1.1, with famous consequences:

    input               YAML 1.1 loaders read       pureyq (1.2 Core Schema)
    country: NO         false (!)                   "NO"
    version: 010        8 (octal)                   10
    time: 1:30          90 (sexagesimal)            "1:30"
    date: 2026-06-11    a datetime.date object      "2026-06-11"

    On output, strings that either YAML version would misread are quoted automatically, so emitted files are also safe for downstream 1.1 parsers. Merge keys (<<:) still work even though they are a YAML 1.1 feature, because real-world configs depend on them.
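To make the table concrete, here is a minimal sketch of how a 1.2 Core Schema loader resolves plain (unquoted) scalars, using the regular expressions from the Core Schema chapter of the YAML 1.2 spec, plus a check for the 1.1 patterns that should make an emitter quote a string. `resolve_core` and `needs_quotes` are hypothetical names, not pureyq functions, and the 1.1 pattern list is an assumption that covers only the cases in the table, not the whole 1.1 type repository.

```python
import re

# YAML 1.2 Core Schema resolution for plain scalars, tried in order.
CORE_RULES = [
    (re.compile(r"null|Null|NULL|~|"), lambda s: None),
    (re.compile(r"true|True|TRUE"), lambda s: True),
    (re.compile(r"false|False|FALSE"), lambda s: False),
    (re.compile(r"[-+]?[0-9]+"), int),                       # "010" -> 10, never octal
    (re.compile(r"0o[0-7]+"), lambda s: int(s[2:], 8)),      # 1.2 octal needs the 0o prefix
    (re.compile(r"0x[0-9a-fA-F]+"), lambda s: int(s[2:], 16)),
    (re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"), float),
    (re.compile(r"[-+]?(\.inf|\.Inf|\.INF)"),
     lambda s: float("-inf") if s.startswith("-") else float("inf")),
    (re.compile(r"\.nan|\.NaN|\.NAN"), lambda s: float("nan")),
]

# Assumed, non-exhaustive 1.1 patterns that a 1.2 loader treats as plain strings.
YAML11_SPECIAL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO|on|On|ON|off|Off|OFF"  # extra 1.1 booleans
    r"|[-+]?0[0-7_]+"                                      # 1.1 octal
    r"|[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+"                   # 1.1 sexagesimal
    r"|[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}.*"                   # 1.1 timestamp (deliberately loose)
)


def resolve_core(plain: str):
    """Return the Python value a 1.2 Core Schema loader would build for a plain scalar."""
    for pattern, convert in CORE_RULES:
        if pattern.fullmatch(plain):
            return convert(plain)
    return plain  # anything else is a string


def needs_quotes(text: str) -> bool:
    """True if writing `text` unquoted could make a 1.1 or 1.2 loader read a non-string."""
    return not isinstance(resolve_core(text), str) or bool(YAML11_SPECIAL.fullmatch(text))


for scalar in ["NO", "010", "1:30", "2026-06-11", "nginx"]:
    print(scalar, repr(resolve_core(scalar)), needs_quotes(scalar))
```

Only `nginx` can be emitted bare; the other four are strings to a 1.2 loader but must be quoted so that a 1.1 loader does not turn them into a boolean, an octal, a sexagesimal or a date.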

Formats

format     notes
YAML       multi-document streams, merge keys, 1.2 Core Schema
JSON       jq-identical output via purejq's encoder
TOML       100% of the official toml-test suite (704 cases, vendored, run in CI); datetimes load as ISO strings
XML        xmltodict convention: @attr, #text, repeated tags become lists
CSV/TSV    header row + typed cells (leading-zero ZIP codes stay strings)
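The xmltodict convention can be sketched with the standard library's ElementTree: attributes become `@`-prefixed keys, element text becomes `#text` when it sits next to attributes or children, and a tag that repeats under one parent becomes a list. This is a simplified illustration, not pureyq's XML reader; it ignores namespaces and mixed-content tail text, and `xml_to_dict` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_value(element: ET.Element) -> Any:
    children = list(element)
    text = (element.text or "").strip()
    if not element.attrib and not children:
        return text or None  # <tag>v</tag> -> "v", <tag/> -> None
    node: Dict[str, Any] = {f"@{name}": value for name, value in element.attrib.items()}
    for child in children:
        value = element_to_value(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)  # third and later repeats
        else:
            node[child.tag] = [node[child.tag], value]  # second repeat turns it into a list
    if text:
        node["#text"] = text
    return node


def xml_to_dict(xml_text: str) -> Dict[str, Any]:
    """Hypothetical helper: parse XML into xmltodict-style nested dicts."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}


print(xml_to_dict('<svc name="web"><port>80</port><port>443</port></svc>'))
# {'svc': {'@name': 'web', 'port': ['80', '443']}}
```

One consequence of this convention: a tag that appears once is a scalar and a tag that appears twice is a list, so jq filters over XML input often need to handle both shapes.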

Input format is detected from the file extension (-p to force); output defaults to the input format (-o to convert). Since . is jq's identity filter, pureyq -o json . config.toml and pureyq -o yaml . data.json are complete format converters.
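The -p/-o defaulting rule is small enough to state as code. This is a sketch under assumptions: the extension table below is a guess built from the format names -p accepts, pureyq's real mapping may differ, and raising an error for unknown extensions (rather than falling back to some format) is a choice of this sketch. `resolve_formats` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumed extension table; pureyq's real mapping may differ.
EXTENSION_FORMATS = {
    ".yaml": "yaml", ".yml": "yaml", ".json": "json", ".toml": "toml",
    ".xml": "xml", ".csv": "csv", ".tsv": "tsv",
}


def resolve_formats(path: str, input_fmt: str = "auto",
                    output_fmt: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical: -p auto means 'by extension'; -o defaults to the input format."""
    if input_fmt == "auto":
        suffix = Path(path).suffix.lower()
        if suffix not in EXTENSION_FORMATS:
            raise ValueError(f"cannot infer the format of {path!r}; pass -p")
        input_fmt = EXTENSION_FORMATS[suffix]
    return input_fmt, output_fmt or input_fmt


print(resolve_formats("config.toml", output_fmt="json"))  # ('toml', 'json')
print(resolve_formats("data.json"))                       # ('json', 'json')
```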

CLI

pureyq [options] '<jq filter>' [files...]

-p FMT   input format: auto|yaml|json|toml|xml|csv|tsv (default: by extension)
-o FMT   output format (default: same as input)
-i       edit files in place (atomic; preserves permissions)
-n -r -j -c -s -e -f --arg --argjson    the flags you know from jq
--indent N    output indentation
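A common way to make an in-place edit atomic is to write the new content to a temporary file in the same directory, copy the original's permission bits onto it, and `os.replace` it over the original, so a reader sees either the old file or the new one, never a half-written one. The sketch below shows that technique under the assumption that -i works along these lines; `write_in_place` is a hypothetical helper, and it preserves permission bits only, not ownership or extended attributes.

```python
import os
import shutil
import tempfile


def write_in_place(path: str, new_text: str) -> None:
    """Hypothetical helper: atomically replace `path` with `new_text`, keeping its mode bits."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pureyq-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        shutil.copymode(path, tmp_path)  # mkstemp creates 0600; restore the original mode
        os.replace(tmp_path, path)       # atomic rename over the original
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```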

Multi-document YAML streams behave like jq input streams: each document is one program run, --slurp collects them into an array, and input/inputs consume the rest.
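The stream model can be sketched with plain Python callables standing in for a jq program: without --slurp the program runs once per document, with --slurp it runs once on an array of all documents, and a program that drains the remaining documents (as jq's `inputs` does) ends the stream early. `run_stream` is a hypothetical illustration, not pureyq's driver, and it simplifies jq by giving each run exactly one result.

```python
from typing import Any, Callable, Iterable, Iterator, List

# A stand-in "program": receives the current document (jq's `.`) and an iterator
# over the documents not yet read (what jq's `input`/`inputs` would consume).
Program = Callable[[Any, Iterator[Any]], Any]


def run_stream(program: Program, documents: Iterable[Any], slurp: bool = False) -> List[Any]:
    if slurp:
        # --slurp: a single run over one array holding every document.
        return [program(list(documents), iter(()))]
    remaining = iter(documents)
    results = []
    for doc in remaining:
        # One run per document; if the program drains `remaining`, the loop just ends.
        results.append(program(doc, remaining))
    return results


docs = [{"kind": "Service"}, {"kind": "Deployment"}]
print(run_stream(lambda d, rest: d["kind"], docs))           # ['Service', 'Deployment']
print(run_stream(lambda d, rest: len(d), docs, slurp=True))  # [2]
print(run_stream(lambda d, rest: [d["kind"]] + [r["kind"] for r in rest], docs))
# [['Service', 'Deployment']]
```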

Benchmarks

Measured with tools/bench.py: M-series MacBook, CPython 3.12, mikefarah yq v4.53.3 (native arm64 binary), kislyuk/yq 3.4.3 over jq 1.8 — three independent rounds, each the median of 7 runs, and every workload's outputs verified equal across all three tools before timing. The numbers below are cross-round medians. Reproduce: python tools/bench.py --verify.
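The measurement scheme described above (verify that outputs agree, then take the median of 7 runs per round and the median across rounds) can be sketched as follows. This is not tools/bench.py; `verify_then_time` and `median_of_runs` are hypothetical names, and each "tool" is modeled as a zero-argument callable that returns its output.

```python
import statistics
import time
from typing import Callable, Dict


def median_of_runs(fn: Callable[[], object], runs: int = 7) -> float:
    """Median wall-clock seconds over `runs` calls to fn."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def verify_then_time(tools: Dict[str, Callable[[], str]],
                     rounds: int = 3, runs: int = 7) -> Dict[str, float]:
    """Refuse to time tools that disagree; report the median of per-round medians."""
    outputs = {name: fn() for name, fn in tools.items()}
    if len(set(outputs.values())) != 1:
        raise ValueError(f"outputs differ, refusing to time: {sorted(outputs)}")
    return {
        name: statistics.median(median_of_runs(fn, runs) for _ in range(rounds))
        for name, fn in tools.items()
    }
```

Here each tool's rounds run back to back; interleaving the tools within each round would spread background machine noise more evenly across them.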

Embedded in Python — editing a k8s manifest, per call:

                                 per call
pureyq.apply() (in-process)      0.15 ms
spawning the Go yq binary        5.4 ms

In-process beats shelling out ~36x. For agent/automation loops that touch many configs, this is the number that matters.
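The gap comes from process startup, not from the edit itself. The following stand-in measurement needs no yq binary: it performs the same JSON edit in-process and by spawning a fresh Python interpreter per call. The Python interpreter substitutes for the Go binary here, so absolute numbers will differ from the table above; all names are hypothetical.

```python
import json
import subprocess
import sys
import time

# The same edit, written as a one-off script for the spawned interpreter.
EDIT_SCRIPT = (
    "import json, sys; d = json.load(sys.stdin); "
    "d['spec']['replicas'] = 3; print(json.dumps(d))"
)


def edit_in_process(text: str) -> str:
    data = json.loads(text)
    data["spec"]["replicas"] = 3
    return json.dumps(data)


def edit_by_spawning(text: str) -> str:
    done = subprocess.run([sys.executable, "-c", EDIT_SCRIPT],
                          input=text, capture_output=True, text=True, check=True)
    return done.stdout.rstrip("\n")


def per_call_ms(fn, text: str, calls: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(calls):
        fn(text)
    return (time.perf_counter() - start) * 1000 / calls


manifest = json.dumps({"spec": {"replicas": 1}})
print(f"in-process: {per_call_ms(edit_in_process, manifest):.3f} ms/call")
print(f"spawning:   {per_call_ms(edit_by_spawning, manifest, calls=5):.1f} ms/call")
```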

Command line, small file — 40-line k8s manifest, startup included:

pureyq    yq (Go)    kislyuk/yq (jq wrapper)
33 ms     6 ms       49 ms

Command line, big file — 15 MB YAML, 100k objects, end to end:

workload           pureyq    yq (Go)    kislyuk/yq
filter + count     5.1 s     1.1 s      6.7 s
convert to JSON    5.9 s     2.8 s      7.1 s

On big files pureyq is roughly 17–24% faster than the jq-wrapper approach, and it drops that approach's jq-binary requirement. Where the Go binary wins: command-line use, both startup on small files (6 ms vs 33 ms) and big-file throughput (2–5x). If you can install binaries and that is your workload, use mikefarah/yq. (One caveat on the Go side: its compact-JSON mode -I0 is quadratic on large arrays; converting a 20k-row file takes it 104 s vs pureyq's ~1 s, so agents asking for compact JSON from big YAML hit a wall pureyq doesn't have.)
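The percentages and ratios in this comparison follow directly from the big-file table; the arithmetic, with the table's cross-round medians copied in, is:

```python
# Cross-round medians from the big-file table, in seconds.
BIG_FILE = {
    "filter + count": {"pureyq": 5.1, "yq (Go)": 1.1, "kislyuk/yq": 6.7},
    "convert to JSON": {"pureyq": 5.9, "yq (Go)": 2.8, "kislyuk/yq": 7.1},
}


def compare(times):
    """(fraction of time pureyq saves vs kislyuk/yq, how many times faster Go yq is than pureyq)."""
    saved_vs_wrapper = 1 - times["pureyq"] / times["kislyuk/yq"]
    go_speedup = times["pureyq"] / times["yq (Go)"]
    return saved_vs_wrapper, go_speedup


for workload, times in BIG_FILE.items():
    saved, go = compare(times)
    print(f"{workload}: pureyq takes {saved:.0%} less time than kislyuk/yq; "
          f"Go yq is {go:.1f}x faster than pureyq")
```

This prints 24% and 4.6x for filter + count, and 17% and 2.1x for the JSON conversion.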

Correctness, measured

  • TOML: the official toml-test suite is vendored in this repo (tests/conformance/toml-test) and runs in CI on every commit: 704/704 of the TOML 1.0 cases pass (209 valid documents match the typed expectations, 495 invalid documents are rejected).
  • jq semantics: inherited from purejq, which vendors jq's own test suite (751/781 passing, every difference documented).
  • YAML 1.2 schema: a directed test set covers the 1.1/1.2 divergences (booleans, octals, sexagesimals, timestamps, .inf/.nan, quoting on output), and the libyaml fast path is asserted to agree with the pure Python fallback on every case.

Limitations (honest ones)

  • Comments and exact formatting are not preserved through an edit, same as kislyuk/yq. (mikefarah/yq can preserve them because its engine operates on the YAML node tree; a jq engine works on values.) Anchors/aliases are resolved on load and not re-emitted.
  • TOML output requires a single object result (that's what a TOML document is); CSV output requires flat rows.
  • When PyYAML ships with its libyaml C extension (the standard wheels do), pureyq uses it for parsing speed, with a pure Python fallback that tests assert behaves identically. "Pure" means no C extension is required, not that none is ever used.
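The CSV constraints can be sketched end to end: typed cells on input (the "leading-zero ZIP codes stay strings" rule from the formats table) and a flat-rows check on output. The typing rule below is an assumption chosen to satisfy that note (numbers with a leading zero stay strings); pureyq's real rules may cover more types. All function names are hypothetical.

```python
import csv
import io
import re
from typing import Any, Dict, List

# Assumed typing rule: plain integers/decimals become numbers, but a leading zero
# ("02134") marks an identifier, so such cells stay strings.
INT_RE = re.compile(r"-?(0|[1-9][0-9]*)")
FLOAT_RE = re.compile(r"-?(0|[1-9][0-9]*)\.[0-9]+")
SCALARS = (str, int, float, bool, type(None))


def type_cell(cell: Any) -> Any:
    if not isinstance(cell, str):  # DictReader uses None for missing cells
        return cell
    if INT_RE.fullmatch(cell):
        return int(cell)
    if FLOAT_RE.fullmatch(cell):
        return float(cell)
    return cell


def read_csv_typed(text: str, delimiter: str = ",") -> List[Dict[str, Any]]:
    """Header row gives the keys; every other row becomes an object of typed cells."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{key: type_cell(value) for key, value in row.items()} for row in reader]


def write_csv_rows(rows: Any, delimiter: str = ",") -> str:
    """Emit CSV, refusing anything that is not an array of flat objects."""
    if not isinstance(rows, list) or not all(isinstance(row, dict) for row in rows):
        raise ValueError("CSV output needs an array of objects")
    fieldnames: List[str] = []
    for row in rows:
        for key, value in row.items():
            if not isinstance(value, SCALARS):
                raise ValueError(f"column {key!r} holds a nested {type(value).__name__}")
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = read_csv_typed("name,zip,count\nada,02134,3\n")
print(rows)                  # [{'name': 'ada', 'zip': '02134', 'count': 3}]
print(write_csv_rows(rows))  # name,zip,count / ada,02134,3
```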

License

MIT

Download files

Source Distribution

pureyq-0.1.1.tar.gz (24.7 kB)

Uploaded Source

Built Distribution

pureyq-0.1.1-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file pureyq-0.1.1.tar.gz.

File metadata

  • Download URL: pureyq-0.1.1.tar.gz
  • Size: 24.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pureyq-0.1.1.tar.gz
Algorithm Hash digest
SHA256 00921c2f3464bc97de2782666cf080d3c8e26cbab70f64ca6d4d1ffbde95efc0
MD5 2d845b9be3f83ed3262cc73650d49f4a
BLAKE2b-256 97a63f1c9b16d4438a0b0ab4e64c348664e78f995000f2cac3d0085ba0cac06f

Provenance

The following attestation bundles were made for pureyq-0.1.1.tar.gz:

Publisher: release.yml on adam2go/pureyq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pureyq-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pureyq-0.1.1-py3-none-any.whl
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pureyq-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 57a8deda6c491dc0ab28525b39ddae3a82f15592e811e4f3e209480a3a5911ad
MD5 bcf340d38c31941190ff55be1d97d177
BLAKE2b-256 c8325671aa3e639d68337e4d1743c34dc2e0cc1bc26ae577372673fa914a8744

Provenance

The following attestation bundles were made for pureyq-0.1.1-py3-none-any.whl:

Publisher: release.yml on adam2go/pureyq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
