Skip to main content

A pure Python implementation of jq

Project description

purejq

CI Python Conformance License: MIT

A pure Python implementation of jq, the command-line JSON processor — in the spirit of gojq (Go) and jaq (Rust).

No C extension. No binary. If Python runs, purejq runs — including Pyodide/WASM, AWS Lambda layers you can't compile in, restricted sandboxes, and anywhere pip install is all you get.

$ echo '{"users":[{"name":"alice","age":30},{"name":"bob","age":25}]}' \
    | purejq '.users[] | select(.age > 26) | .name'
"alice"

Why another jq?

C jq jq PyPI package purejq
needs a compiled binary / wheel yes yes (C bindings) no
runs on Pyodide / WASM no no yes
embeds in Python (call functions, pass dicts) no partially yes
arbitrary-precision integers no no yes

The existing jq PyPI package is excellent when you can ship compiled wheels. purejq is for when you can't — or when you want to read and hack the implementation in one afternoon.

Install

pip install purejq            # nothing but Python
pip install 'purejq[speed]'   # + orjson for faster JSON parsing in the CLI

(Not yet published to PyPI — install from source for now: pip install git+https://github.com/adam2go/purejq)

Usage

CLI (drop-in for common jq usage)

purejq '.foo[] | select(.bar > 2)' data.json
cat data.json | purejq -r '.items[].name'     # raw output
purejq -n 'range(3) | . * 2'                  # null input
purejq -c --arg name alice '{user: $name}'    # compact output, variables

Supported flags: -n -r -j -c -s -e -f --arg --argjson.

Python API

import purejq

# one-shot
purejq.first(".a.b", {"a": {"b": 42}})          # 42
purejq.all_outputs(".[] | . * 2", [1, 2, 3])    # [2, 4, 6]

# compile once, run on many inputs (the fast way)
prog = purejq.compile("[.[] | select(.score > 50)] | length")
for batch in batches:
    print(prog.first(batch))

# results are a lazy iterator — infinite streams are fine
prog = purejq.compile("repeat(. * 2)")
it = prog.run(1)
next(it), next(it), next(it)                    # 2, 4, 8

Conformance: measured, not claimed

purejq is tested against jq's own official test suite (vendored in tests/conformance/): 751 of 781 cases pass (96.2%).

Every remaining failure is listed with its reason in expected_failures.txt and falls in one of these known buckets:

  • module system (import / include / modulemeta) — not implemented yet
  • number representation — Python integers are arbitrary-precision, so 13911860366432393 stays exact instead of rounding like a C double. This is the same deliberate difference gojq made; for AI/data pipelines exactness is usually what you want
  • error-message wording in a handful of edge cases (e.g. Python's JSON parser phrases syntax errors differently)

Implemented and conformance-tested: the full expression language (paths, all assignment operators, reduce/foreach, try/catch, label/break, destructuring with ?// alternatives, string interpolation, all @formats), regex builtins (test/match/capture/scan/sub/gsub via Python re), tostream/fromstream, date builtins, SQL-ish builtins, and jq 1.8 additions (pick, abs, toboolean, trim, have_decnum, …).

Performance

Honest framing: a pure Python jq will not beat the C implementation on raw throughput. The design keeps it in usable territory:

  • compile once, run many — programs compile to Python generator closures; evaluation never re-walks the AST
  • fully lazy streamsfirst(f), limit, and infinite generators cost only what they consume
  • C-speed JSON parsing — input parsing uses Python's C-backed json module (or orjson if installed via purejq[speed]), so the parse-heavy part of typical workloads is not written in Python at all
  • PyPy as the escape hatch — purejq is tested on PyPy in CI; interpreter workloads typically run ~10x faster there

Run the benchmark yourself (compares against the system jq if installed):

python3 tools/bench.py 100000

Reference numbers (M-series MacBook, CPython 3.13, 100k objects): field-access streams ~19 ms, map+aggregate ~46 ms, group_by ~560 ms.

Compatibility

CPython 3.9 – 3.14 and PyPy, enforced by the CI matrix on every push. Zero runtime dependencies.

Architecture

source ──lexer──▶ tokens ──parser──▶ AST (tuples)
                                      │ compile (once)
                                      ▼
                    generator closures: f(value, env) → iterator
                                      │
              path mode: g(value, path, env) → (path, value) pairs
                        (powers path(), del(), and all assignments)
  • lexer.py / parser.py — jq grammar, including string interpolation
  • compiler.py — closure compilation, environments, value & path modes
  • ops.py — jq value semantics: total ordering, arithmetic, path read/write
  • builtins.py — Python-native builtins (regex, sort, math, dates, formats)
  • prelude.py — derived builtins defined in jq itself, mirroring jq's builtin.jq

Contributing

See CONTRIBUTING.md. The short version: make a conformance number go up, and python3 tools/update_expected_failures.py is your scoreboard.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

purejq-0.1.0.tar.gz (36.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

purejq-0.1.0-py3-none-any.whl (35.3 kB view details)

Uploaded Python 3

File details

Details for the file purejq-0.1.0.tar.gz.

File metadata

  • Download URL: purejq-0.1.0.tar.gz
  • Upload date:
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for purejq-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6341e14885762c887ba1a5b49576fbf758b2706bf60370aeef40c9e5842f5dc1
MD5 3c1961823adc1e193ebee9ef541e8cb8
BLAKE2b-256 428b6e9598e0451a5945b31e10f6472074a1f42f773bc3d38edb55c740dd2139

See more details on using hashes here.

Provenance

The following attestation bundles were made for purejq-0.1.0.tar.gz:

Publisher: release.yml on adam2go/purejq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file purejq-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: purejq-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 35.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for purejq-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3b00390e067cc9b82ffaa866754732575a8ec3c11fe6d9c62e852c66e545326b
MD5 1dab2047e10d04ee2e5faba14630aae4
BLAKE2b-256 0b1343df518c95b86065016f50d72c93d6227645e0e6bdab41d5111077d8ead0

See more details on using hashes here.

Provenance

The following attestation bundles were made for purejq-0.1.0-py3-none-any.whl:

Publisher: release.yml on adam2go/purejq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page