
Data processing library built on top of Ibis and DataFusion to write multi-engine data workflows.


A compute manifest and composable tools for ML.

Documentation • Website


The Problem

You write a feature pipeline. It works on your laptop with DuckDB. Deploying it to Snowflake means a rewrite. Intermediate results should be cached, so you add infrastructure and a result-naming scheme. A requirement to track pipeline changes comes in, so you add a metadata store. Congrats, you're going to production! Time to add a serving layer ...

Six months later: five tools that don't talk to each other, and a pipeline only one person understands.

How the pain shows up:

  • Glue code everywhere: each engine is a silo, so moving between them means rewriting, not composing.
  • Runtime-only feedback: imperative Python where you can only tell whether something will fail by running the job.
  • Unnecessary recomputation: no shared understanding of what changed, so everything runs from scratch.
  • Opaque lineage: feature logic, metadata, and lineage all live in different systems, so debugging means archaeology.
  • "Works on my machine": environments drift, so reproducing results means reverse-engineering someone else's setup and interrogating your own.
  • Stateful orchestrators: retry logic, task states, failure recovery; another system to manage, another thing that breaks.

Feature stores, model registries, orchestrators: vertical silos that don't serve agentic processes, which need context and skills, not categories.

Xorq


Manifest = Context. Every ML computation becomes a structured, input-addressed YAML manifest.

Exprs = Tools. A catalog to discover them, and a build system to execute them deterministically anywhere, with user-directed caching.

Templates = Skills. Ready-made skills to get started, e.g. scikit-learn pipelines, feature stores, semantic layers.

$ pip install xorq[examples]
$ xorq init -t penguins

The Expression

Write declarative Ibis expressions that can be run like a tool. Xorq extends Ibis with caching, multi-engine execution, and UDFs.

import ibis
import xorq.api as xo
from xorq.common.utils.ibis_utils import from_ibis
from xorq.caching import ParquetCache

penguins = ibis.examples.penguins.fetch()

penguins_agg = (
    penguins
    .filter(ibis._.species.notnull())
    .group_by("species")
    .agg(avg_bill_length=ibis._.bill_length_mm.mean())
)

expr = (
    from_ibis(penguins_agg)
    .cache(ParquetCache.from_kwargs())
)

Declare .cache() on any node. Xorq handles the rest. No cache keys to generate or manage, no invalidation logic to write.

Compose across engines

One expression, many engines. Part of your pipeline runs on DuckDB, part on Xorq's embedded DataFusion engine, and UDFs run via Arrow Flight. Xorq handles data transit between engines with low overhead. Bye bye, glue code.

expr = from_ibis(penguins).into_backend(xo.sqlite.connect())
expr.ls.backends
(<xorq.backends.sqlite.Backend at 0x7926a815caa0>,
 <xorq.backends.duckdb.Backend at 0x7926b409faa0>)

Expressions are tools, Arrow is the pipe

Unix gave us small programs that compose via stdout. Xorq gives you expressions that compose via Arrow.

In [6]: expr.to_pyarrow_batches()
Out[6]: <pyarrow.lib.RecordBatchReader at 0x15dc3f570>

The Manifest

Build an expression, get a manifest.

$ xorq build expr.py
builds/28ecab08754e
$ tree builds/28ecab08754e
builds/28ecab08754e
├── database_tables
│   └── f2ac274df56894cb1505bfe8cb03940e.parquet
├── expr.yaml
├── metadata.json
└── profiles.yaml

No external metadata store. No separate lineage tool. The build directory is the versioned, cached, portable artifact.

# Input-addressed, composable, portable
# Abridged expr.yaml
nodes:
  '@read_31f0a5be3771':
    op: Read
    name: penguins
    source: builds/28ecab08754e/.../f2ac274df56894cb1505bfe8cb03940e.parquet

  '@filter_23e7692b7128':
    op: Filter
    parent: '@read_31f0a5be3771'
    predicates:
      - NotNull(species)

  '@remotetable_9a92039564d4':
    op: RemoteTable
    remote_expr:
      op: Aggregate
      parent: '@filter_23e7692b7128'
      by: [species]
      metrics:
        avg_bill_length: Mean(bill_length_mm)

  '@cachednode_e7b5fd7cd0a9':
    op: CachedNode
    parent: '@remotetable_9a92039564d4'
    cache:
      type: ParquetCache
      path: parquet
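Because the manifest is plain YAML, tools (or agents) can walk the node graph directly. A minimal sketch, with plain dicts standing in for the parsed `expr.yaml` above:

```python
# Plain dicts standing in for the parsed expr.yaml (abridged).
nodes = {
    "@read_31f0a5be3771": {"op": "Read", "parent": None},
    "@filter_23e7692b7128": {"op": "Filter", "parent": "@read_31f0a5be3771"},
    "@remotetable_9a92039564d4": {"op": "RemoteTable", "parent": "@filter_23e7692b7128"},
    "@cachednode_e7b5fd7cd0a9": {"op": "CachedNode", "parent": "@remotetable_9a92039564d4"},
}

def ancestry(node_id):
    # Follow parent links from a node back to its source.
    while node_id is not None:
        yield nodes[node_id]["op"]
        node_id = nodes[node_id]["parent"]

chain = list(ancestry("@cachednode_e7b5fd7cd0a9"))
# chain == ["CachedNode", "RemoteTable", "Filter", "Read"]
```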

Reproducible builds

The manifest is roundtrippable and machine-writeable. Git-diff your pipelines. Code-review your features. Track Python dependencies. Rebuild from the YAML alone.

$ xorq uv-build expr.py
builds/28ecab08754e/

$ ls builds/28ecab08754e/*.tar.gz
builds/28ecab08754e/sdist.tar.gz  builds/28ecab08754e/my-pipeline-0.1.0.tar.gz

The build captures everything: the expression graph, dependencies, and in-memory tables. Share the build (it includes the sdist) and you get identical results. No "works on my machine."

Only recompute what changed

The manifest is input-addressed: same inputs = same hash. Change an input, get a new hash.

expr.ls.get_cache_paths()
(PosixPath('/home/user/.cache/xorq/parquet/letsql_cache-7c3df7ccce5ed4b64c02fbf8af462e70.parquet'),)

The hash is the cache key. No invalidation logic to debug. If the expression is the same, the hash is the same, and the cache is valid. Change an input, get a new hash, trigger recomputation.

Traditional caching asks "has this expired?" Input-addressed caching asks "is this the same computation?" The second question has a deterministic answer.
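This is not xorq's actual hashing scheme, but the principle fits in a few lines: canonicalize the expression, hash it, and use the hash as the cache key.

```python
import hashlib
import json

def expr_key(node):
    # Canonicalize the expression tree, then hash it: same inputs, same key.
    canonical = json.dumps(node, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# A toy expression tree (illustrative, not xorq's internal representation).
agg = {"op": "Aggregate", "by": ["species"],
       "metrics": {"avg_bill_length": "Mean(bill_length_mm)"},
       "parent": {"op": "Read", "source": "penguins.parquet"}}

k1 = expr_key(agg)
k2 = expr_key(json.loads(json.dumps(agg)))   # structurally identical copy
changed = expr_key({**agg, "by": ["island"]})  # change an input

assert k1 == k2       # same computation, same key: cache hit
assert changed != k1  # new input, new key: recomputation
```

No expiry clocks, no invalidation callbacks: equality of keys is equality of computations.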


The Tools

The manifest provides context. The tools provide skills: catalog, introspect, serve, execute.

Catalog

# Add to catalog
$ xorq catalog add builds/28ecab08754e/ --alias penguins-agg
Added build 28ecab08754e as entry a498016e-5bea-4036-aec0-a6393d1b7c0f revision r1

# List entries
$ xorq catalog ls
Aliases:
penguins-agg    a498016e-5bea-4036-aec0-a6393d1b7c0f    r1
Entries:
a498016e-5bea-4036-aec0-a6393d1b7c0f    r1      28ecab08754e

Run

$ xorq run builds/28ecab08754e -o out.parquet

Serve

Serve expressions anywhere via Arrow Flight:

$ xorq serve-unbound builds/28ecab08754e/ \
  --to_unbind_hash 31f0a5be37713fe2c1a2d8ad8fdea69f \
  --host localhost --port 9002

import xorq.api as xo

backend = xo.flight.connect(host="localhost", port=9002)
f = backend.get_exchange("default")

data = {
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "bill_length_mm": [39.1, 47.5, 49.0],
    "bill_depth_mm": [18.7, 14.2, 18.5],
    "flipper_length_mm": [181, 217, 195],
    "body_mass_g": [3750, 5500, 4200],
    "sex": ["male", "female", "male"],
    "year": [2007, 2008, 2009],
}

xo.memtable(data).pipe(f).execute()
     species  avg_bill_length
0     Adelie             39.1
1  Chinstrap             49.0
2     Gentoo             47.5

Debug with confidence

No more archaeology. Lineage is encoded in the manifest—not scattered across tools—and queryable from the CLI.

$ xorq lineage penguins-agg

Lineage for column 'avg_bill_length':
Field:avg_bill_length #1
└── Cache xorq_cached_node_name_placeholder #2
    └── RemoteTable:236af67d399a4caaf17e0bf5e1ac4c0f #3
        └── Aggregate #4
            ├── Filter #5
            │   ├── Read #6
            │   └── NotNull #7
            │       └── Field:species #8
            │           └── see #6
            ├── Field:species #9
            │   └── see #5
            └── Mean #10
                └── Field:bill_length_mm #11
                    └── see #5

Workflows, without state

No task states. Just retry on failure.

Xorq executes expressions as Arrow RecordBatch streams. There's no DAG of tasks to checkpoint, just data flowing through operators. If something fails, rerun from the manifest. Cached nodes resolve instantly; the rest recomputes.
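The stateless model can be pictured with plain generators: operators are functions over a stream of batches, and recovery is just calling the pipeline again (an illustrative sketch; the names and dict-based "batches" are stand-ins, not xorq internals):

```python
def read_batches():
    # Source operator: yields batches (dicts stand in for Arrow RecordBatches).
    yield {"species": ["Adelie", None], "bill_length_mm": [39.1, 47.5]}

def filter_not_null(batches, column):
    # Streaming operator: transforms each batch as it flows through.
    for b in batches:
        keep = [i for i, v in enumerate(b[column]) if v is not None]
        yield {k: [v[i] for i in keep] for k, v in b.items()}

# Rerunning after a failure is just re-invoking the pipeline: there is no
# task state to recover, only the same stream recomputed (or served from cache).
out = list(filter_not_null(read_batches(), "species"))
```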

Scikit-learn Integration

Xorq translates scikit-learn Pipeline objects to deferred expressions:

from xorq.expr.ml.pipeline_lib import Pipeline

sklearn_pipeline = ...
xorq_pipeline = Pipeline.from_instance(sklearn_pipeline)
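The elided `sklearn_pipeline` can be any ordinary scikit-learn Pipeline; for instance (the estimator choices here are illustrative, not from the xorq docs):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Any ordinary scikit-learn pipeline; the steps are arbitrary examples.
sklearn_pipeline = make_pipeline(StandardScaler(), LogisticRegression())
```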

Templates

Ready-to-start code as skills:

$ xorq init -t <template>
  • penguins: minimal example with caching, aggregation, and multi-engine execution
  • sklearn: classification pipeline with train/predict separation

Skills for humans

Templates are quick-start components: ready-made expressions you can compose with your own sources.

Coming Soon

  • feast — Feature store integration
  • boring-semantic-layer — Metrics and dimensions catalog
  • dbt — dbt model composition
  • Feature Selection

The Horizontal Stack

Write in Python. Catalog as YAML. Compose anywhere via Ibis. Portable compute engine built on DataFusion. Universal UDFs via Arrow Flight.


Lineage, caching, and versioning travel with the manifest; cataloged, not locked in a vendor's database.

Integrations: Ibis • scikit-learn • Feast (WIP) • dbt (upcoming)


Learn More


Pre-1.0. Expect breaking changes with migration guides.
