Skip to main content

Reliability gateway for schema-stable, secret-safe, pagination-complete agent JSON.

Project description

Sift

Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON.

Python 3.11+ PyPI License: MIT

Sift is a drop-in reliability layer for MCP and CLI tool output. It persists full payloads as artifacts, returns either inline payload (full) or compact references (schema_ref), and lets agents query what they need with Python code over stored data.

Benchmark summary: on 103 factual questions across 12 real JSON datasets, Sift improved accuracy from 33.0% to 99.0% while cutting input tokens by 95.4% (10,757,230 -> 489,655). Full details: benchmarks/README.md.

How it works

                           ┌─────────────────────┐
  MCP tool call ──────────▶│                     │──────────▶ Upstream MCP server
  CLI command   ──────────▶│        Sift         │──────────▶ Shell/API command
                           │                     │
                           │   ┌─────────────┐   │
                           │   │  Artifacts  │   │
                           │   │  (SQLite)   │   │
                           │   └─────────────┘   │
                           └─────────────────────┘
                                     │
                                     ▼
                         Small output -> `full` inline
                         Large output -> `schema_ref`
                         Agent queries artifacts with code

Flow:

  1. Execute upstream tool/command and capture JSON.
  2. Persist full output as an artifact in SQLite and deterministically map schema/root hints.
  3. Return full (small) or schema_ref (large/paginated).
  4. Continue pages explicitly until pagination.retrieval_status == COMPLETE.
  5. Run focused Python queries on one artifact or the full pagination chain.

Main MCP pain points

These are recurring across MCP client issue trackers and protocol usage in production:

  • Large tool definitions and large tool results consume context quickly.
  • Upstream API pagination often sits outside MCP list-cursor flows, so agents can stop early and answer on partial data.
  • Tool output shape differs across servers, which makes follow-up parsing brittle.
  • Tool output is untrusted input and can contain sensitive values that should not re-enter model context.
  • Raw outputs scroll away in chat history, so provenance and reproducibility degrade across multi-step runs.

Background and references: docs/why.md.

What Sift adds (without changing upstream servers)

  • Artifact-backed outputs: keep full data out of prompt context while preserving it losslessly.
  • Schema-aware references: schema_ref returns query guidance for stable follow-up analysis.
  • Exact structured retrieval: run Python against stored artifacts instead of relying on prompt-sized payloads.
  • Exact structured retrieval via artifact(action="query", query_kind="code", ...) (MCP) or sift-gateway code (CLI).
  • Explicit pagination contract: continue with artifact(action="next_page") or run --continue-from.
  • Completion signaling: do not stop until pagination.retrieval_status == COMPLETE.
  • Pagination-chain analysis: query one artifact or all related pages (scope="all_related"; CLI default).
  • Outbound secret redaction enabled by default before output returns to the model.

MCP vs CLI positioning

  • MCP: Sift is a reliability gateway for mirrored tool calls and artifact-based follow-up queries.
  • CLI/OpenClaw: same artifact contract for command output (sift-gateway run + sift-gateway code).
  • CLI pitfall: ad-hoc extraction can silently scope analysis to partial data (for example, inspecting only one row).
  • CLI note: for one-off local extraction, plain jq can be enough. Sift is for repeatable, pagination-complete, policy-controlled workflows.

60-second quickstart

MCP clients

pipx install sift-gateway
sift-gateway init --from claude

Restart your MCP client, then use mirrored tools normally.

Supported --from shortcuts: claude, claude-code, cursor, vscode, windsurf, zed, auto, or an explicit config path.

CLI flow

# 1) Capture JSON output as an artifact
sift-gateway run --json -- kubectl get pods -A -o json

# 2) Query artifact data with Python
sift-gateway code --json <artifact_id> '$' --code "def run(data, schema, params): return {'rows': len(data)}"

Use $ when rows are at root. If nested, use metadata.usage.root_path from run --json (or metadata.queryable_roots in MCP schema_ref).

Pagination continuation

sift-gateway run --json --continue-from <artifact_id> -- <next-command-with-next-params-applied>

Do not claim completion until pagination.retrieval_status == COMPLETE.

Python codegen over all pages

For complex questions, generate Python once and run it over the entire pagination chain:

sift-gateway code --json --scope all_related <artifact_id> '$' --file ./analysis.py

CLI default is --scope all_related. Use --scope single for anchor-only analysis.

Benchmarks

Tier 1 result (claude-sonnet-4-6):

Condition Accuracy Input Tokens
Baseline (context-stuffed) 34/103 (33.0%) 10,757,230
Sift 102/103 (99.0%) 489,655

That is +66.0 points accuracy with 95.4% fewer input tokens on the same question set.

Methodology, scripts, and Tier 2 autonomous-agent results: benchmarks/README.md.

Documentation library

Start here: docs/README.md

Getting started

Core contracts

Operations and security

Patterns and deep dives

Security

See SECURITY.md for threat model and hardening guidance.

Development

git clone https://github.com/lourencomaciel/sift-gateway.git
cd sift-gateway
uv sync --extra dev
uv run python -m pytest tests/unit/ -q

Full contributor workflow: CONTRIBUTING.md

License

MIT - see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sift_gateway-0.4.2.tar.gz (296.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sift_gateway-0.4.2-py3-none-any.whl (378.6 kB view details)

Uploaded Python 3

File details

Details for the file sift_gateway-0.4.2.tar.gz.

File metadata

  • Download URL: sift_gateway-0.4.2.tar.gz
  • Upload date:
  • Size: 296.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sift_gateway-0.4.2.tar.gz
Algorithm Hash digest
SHA256 f4eec5ff76fdf8f806c43e8756dd9d88087d68d1b9ed47cd3fcde73a17d969d3
MD5 fde795d285088d731507360a07fbb5e2
BLAKE2b-256 d1c100f2ba81883caa6a05a3095b9e525013b33120e65045a288b13c099c6be8

See more details on using hashes here.

Provenance

The following attestation bundles were made for sift_gateway-0.4.2.tar.gz:

Publisher: release.yml on lourencomaciel/sift-gateway

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sift_gateway-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: sift_gateway-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 378.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sift_gateway-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 9733093726c73e8d95b85b496cb26be33f62f050bedd605269581661de6c7190
MD5 c1ac38f0e52f88ce095d3081c643d693
BLAKE2b-256 ed3747d5b12034db7a2c72004f38dc2a0b4994e29f4fd0b8f70d043e2266f09a

See more details on using hashes here.

Provenance

The following attestation bundles were made for sift_gateway-0.4.2-py3-none-any.whl:

Publisher: release.yml on lourencomaciel/sift-gateway

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page