Skip to main content

A pure Python implementation of jq

Project description

purejq

CI Python Conformance License: MIT

A pure Python implementation of jq, the command-line JSON processor — in the spirit of gojq (Go) and jaq (Rust).

No C extension. No binary. If Python runs, purejq runs — including Pyodide/WASM, AWS Lambda layers you can't compile in, restricted sandboxes, and anywhere pip install is all you get.

$ echo '{"users":[{"name":"alice","age":30},{"name":"bob","age":25}]}' \
    | purejq '.users[] | select(.age > 26) | .name'
"alice"

Why another jq?

C jq jq PyPI package purejq
needs a compiled binary / wheel yes yes (C bindings) no
runs on Pyodide / WASM no no yes
embeds in Python (call functions, pass dicts) no partially yes
arbitrary-precision integers no no yes

The existing jq PyPI package is excellent when you can ship compiled wheels. purejq is for when you can't — or when you want to read and hack the implementation in one afternoon.

Install

pip install purejq            # nothing but Python
pip install 'purejq[speed]'   # + orjson for faster JSON parsing in the CLI

(Not yet published to PyPI — install from source for now: pip install git+https://github.com/adam2go/purejq)

Usage

CLI (drop-in for common jq usage)

purejq '.foo[] | select(.bar > 2)' data.json
cat data.json | purejq -r '.items[].name'     # raw output
purejq -n 'range(3) | . * 2'                  # null input
purejq -c --arg name alice '{user: $name}'    # compact output, variables

Supported flags: -n -r -j -c -s -e -f --arg --argjson.

Python API

import purejq

# one-shot
purejq.first(".a.b", {"a": {"b": 42}})          # 42
purejq.all_outputs(".[] | . * 2", [1, 2, 3])    # [2, 4, 6]

# compile once, run on many inputs (the fast way)
prog = purejq.compile("[.[] | select(.score > 50)] | length")
for batch in batches:
    print(prog.first(batch))

# results are a lazy iterator — infinite streams are fine
prog = purejq.compile("repeat(. * 2)")
it = prog.run(1)
next(it), next(it), next(it)                    # 2, 4, 8

Conformance: measured, not claimed

purejq is tested against jq's own official test suite (vendored in tests/conformance/): 751 of 781 cases pass (96.2%).

Every remaining failure is listed with its reason in expected_failures.txt and falls in one of these known buckets:

  • module system (import / include / modulemeta) — not implemented yet
  • number representation — Python integers are arbitrary-precision, so 13911860366432393 stays exact instead of rounding like a C double. This is the same deliberate difference gojq made; for AI/data pipelines exactness is usually what you want
  • error-message wording in a handful of edge cases (e.g. Python's JSON parser phrases syntax errors differently)

Implemented and conformance-tested: the full expression language (paths, all assignment operators, reduce/foreach, try/catch, label/break, destructuring with ?// alternatives, string interpolation, all @formats), regex builtins (test/match/capture/scan/sub/gsub via Python re), tostream/fromstream, date builtins, SQL-ish builtins, and jq 1.8 additions (pick, abs, toboolean, trim, have_decnum, …).

Performance

Honest framing: a pure Python jq will not beat the C implementation on raw throughput. The design keeps it in usable territory:

  • compile once, run many — programs compile to Python generator closures; evaluation never re-walks the AST
  • fully lazy streamsfirst(f), limit, and infinite generators cost only what they consume
  • C-speed JSON parsing — input parsing uses Python's C-backed json module (or orjson if installed via purejq[speed]), so the parse-heavy part of typical workloads is not written in Python at all
  • PyPy as the escape hatch — purejq is tested on PyPy in CI; interpreter workloads typically run ~10x faster there

Run the benchmark yourself (compares against the system jq if installed):

python3 tools/bench.py 100000

Reference numbers (M-series MacBook, CPython 3.13, 100k objects): field-access streams ~11 ms, map+aggregate ~24 ms, group_by ~120 ms, transform+sort ~220 ms.

Compatibility

CPython 3.9 – 3.14 and PyPy, enforced by the CI matrix on every push. Zero runtime dependencies.

Architecture

source ──lexer──▶ tokens ──parser──▶ AST (tuples)
                                      │ compile (once)
                                      ▼
                    generator closures: f(value, env) → iterator
                                      │
              path mode: g(value, path, env) → (path, value) pairs
                        (powers path(), del(), and all assignments)
  • lexer.py / parser.py — jq grammar, including string interpolation
  • compiler.py — closure compilation, environments, value & path modes
  • ops.py — jq value semantics: total ordering, arithmetic, path read/write
  • builtins.py — Python-native builtins (regex, sort, math, dates, formats)
  • prelude.py — derived builtins defined in jq itself, mirroring jq's builtin.jq

Contributing

See CONTRIBUTING.md. The short version: make a conformance number go up, and python3 tools/update_expected_failures.py is your scoreboard.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

purejq-0.1.1.tar.gz (37.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

purejq-0.1.1-py3-none-any.whl (36.2 kB view details)

Uploaded Python 3

File details

Details for the file purejq-0.1.1.tar.gz.

File metadata

  • Download URL: purejq-0.1.1.tar.gz
  • Upload date:
  • Size: 37.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for purejq-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c7c2a53c80ed7366d1f18f6b187a0907817b1a124883cdcbbf93904eb47fe2cb
MD5 9cb04c62aff40af969b3eb50884f1ab4
BLAKE2b-256 4da92dd270e1cb3291ec269c49efd4e60572180f102faa530068a218ddb76ad6

See more details on using hashes here.

Provenance

The following attestation bundles were made for purejq-0.1.1.tar.gz:

Publisher: release.yml on adam2go/purejq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file purejq-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: purejq-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 36.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for purejq-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b785f5b9f38f7a059157bd39fddf8f2688902e418d5d2e9c7036e57c83170353
MD5 8d021f9cdaa178f9be7478c1b00b84dd
BLAKE2b-256 b92e3cb56c75de5c36e1b398d30fd7e614d2afa063f27ed89648ceb501c300a1

See more details on using hashes here.

Provenance

The following attestation bundles were made for purejq-0.1.1-py3-none-any.whl:

Publisher: release.yml on adam2go/purejq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page