Sift

Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON.

Python 3.11+ | PyPI | License: MIT

Sift is a drop-in reliability layer for MCP and CLI tool output. It persists full payloads as artifacts, returns either the inline payload (full) or a compact reference (schema_ref), and lets agents query exactly what they need by running Python code over the stored data.

Benchmark summary: on 103 factual questions across 12 real JSON datasets, Sift improved accuracy from 33.0% to 99.0% while cutting input tokens by 95.4% (10,757,230 -> 489,655). Full details: benchmarks/README.md.

How it works

                           ┌─────────────────────┐
  MCP tool call ──────────▶│                     │──────────▶ Upstream MCP server
  CLI command   ──────────▶│        Sift         │──────────▶ Shell/API command
                           │                     │
                           │   ┌─────────────┐   │
                           │   │  Artifacts  │   │
                           │   │  (SQLite)   │   │
                           │   └─────────────┘   │
                           └─────────────────────┘
                                     │
                                     ▼
                         Small output -> `full` inline
                         Large output -> `schema_ref`
                         Agent queries artifacts with code

Flow:

  1. Execute upstream tool/command and capture JSON.
  2. Persist full output as an artifact in SQLite and deterministically map schema/root hints.
  3. Return full (small) or schema_ref (large/paginated).
  4. Continue pages explicitly until pagination.retrieval_status == COMPLETE.
  5. Run focused Python queries on one artifact or the full pagination chain.
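
The first three steps of the flow can be pictured with a small sketch. This is not Sift's implementation: the table layout, the INLINE_LIMIT_BYTES threshold, the persist_and_respond helper, and every response key other than the full and schema_ref mode names are assumptions made for illustration.

```python
import json
import sqlite3
import uuid

# Hypothetical size threshold; Sift's real cutoff and policy are not stated here.
INLINE_LIMIT_BYTES = 2_000


def persist_and_respond(conn, payload):
    """Store the full payload, then answer inline or by compact reference."""
    body = json.dumps(payload, sort_keys=True)
    artifact_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO artifacts (id, body) VALUES (?, ?)", (artifact_id, body)
    )
    conn.commit()
    if len(body.encode("utf-8")) <= INLINE_LIMIT_BYTES:
        return {"mode": "full", "artifact_id": artifact_id, "payload": payload}
    # Large output: return only a description of the top-level shape.
    if isinstance(payload, dict):
        shape = {key: type(value).__name__ for key, value in payload.items()}
    else:
        shape = {"$": type(payload).__name__}
    return {"mode": "schema_ref", "artifact_id": artifact_id, "shape": shape}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

small = persist_and_respond(conn, {"ok": True})
large = persist_and_respond(conn, {"items": [{"n": i} for i in range(1000)]})
print(small["mode"], large["mode"])
```

Both calls store the complete payload. Only the response differs, which is what keeps large results out of the prompt without losing them.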

Main MCP pain points

These recur across MCP client issue trackers and in production use of the protocol:

  • Large tool definitions and large tool results consume context quickly.
  • Upstream API pagination often sits outside MCP list-cursor flows, so agents can stop early and answer on partial data.
  • Tool output shape differs across servers, which makes follow-up parsing brittle.
  • Tool output is untrusted input and can contain sensitive values that should not re-enter model context.
  • Raw outputs scroll away in chat history, so provenance and reproducibility degrade across multi-step runs.

Background and references: docs/why.md.

What Sift adds (without changing upstream servers)

  • Artifact-backed outputs: keep full data out of prompt context while preserving it losslessly.
  • Schema-aware references: schema_ref returns query guidance for stable follow-up analysis.
  • Exact structured retrieval: run Python against stored artifacts via artifact(action="query", query_kind="code", ...) (MCP) or sift-gateway code (CLI), instead of relying on prompt-sized payloads.
  • Explicit pagination contract: continue with artifact(action="next_page") or run --continue-from.
  • Completion signaling: do not stop until pagination.retrieval_status == COMPLETE.
  • Pagination-chain analysis: query one artifact or all related pages (scope="all_related"; CLI default).
  • Outbound secret redaction enabled by default before output returns to the model.
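
To make the idea of outbound redaction concrete, here is a minimal sketch that masks values stored under sensitive-looking keys before a payload is handed back. The key pattern, the placeholder string, and the redact function are hypothetical. Sift's actual detection rules are not described in this document.

```python
import re

# Hypothetical key-name pattern; not Sift's actual redaction rules.
SENSITIVE_KEY = re.compile(r"(token|secret|password|api[_-]?key)", re.IGNORECASE)
REDACTED = "[REDACTED]"


def redact(value):
    """Return a copy of value with values under sensitive-looking keys masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SENSITIVE_KEY.search(str(key)) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


record = {
    "user": "ana",
    "api_key": "abc123",
    "nested": [{"Password": "hunter2", "id": 7}],
}
print(redact(record))
```

The function builds a new structure rather than mutating its input, so in this sketch the original record stays unredacted while only the copy returned to the model is masked.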

MCP vs CLI positioning

  • MCP: Sift is a reliability gateway for mirrored tool calls and artifact-based follow-up queries.
  • CLI/OpenClaw: same artifact contract for command output (sift-gateway run + sift-gateway code).
  • CLI pitfall: ad-hoc extraction can silently scope analysis to partial data (for example, inspecting only one row).
  • CLI note: for one-off local extraction, plain jq can be enough. Sift is for repeatable, pagination-complete, policy-controlled workflows.

60-second quickstart

MCP clients

pipx install sift-gateway
sift-gateway init --from claude

Restart your MCP client, then use mirrored tools normally.

Supported --from shortcuts: claude, claude-code, cursor, vscode, windsurf, zed, auto, or an explicit config path.

CLI flow

# 1) Capture JSON output as an artifact
sift-gateway run --json -- kubectl get pods -A -o json

# 2) Query artifact data with Python
sift-gateway code --json <artifact_id> '$' --code "def run(data, schema, params): return {'rows': len(data)}"

Use $ when rows are at root. If nested, use metadata.usage.root_path from run --json (or metadata.queryable_roots in MCP schema_ref).
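
The root-path rule above can be written as a small helper. Only the metadata.usage.root_path field comes from the text. The pick_root_path name, the sample dictionaries, and the "$.items" value are illustrative assumptions about what the run --json output might contain.

```python
def pick_root_path(run_output):
    """Return metadata.usage.root_path when present, otherwise fall back to '$'."""
    usage = (run_output.get("metadata") or {}).get("usage") or {}
    return usage.get("root_path") or "$"


# Hypothetical parsed outputs of `sift-gateway run --json`.
nested = {"metadata": {"usage": {"root_path": "$.items"}}}
flat = {"metadata": {}}

print(pick_root_path(nested))
print(pick_root_path(flat))
```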

Pagination continuation

sift-gateway run --json --continue-from <artifact_id> -- <next-command-with-next-params-applied>

Do not claim completion until pagination.retrieval_status == COMPLETE.
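
A driver loop for this contract might look like the sketch below. The fetch_next callable is a hypothetical stand-in for whatever issues the next continuation (artifact(action="next_page") over MCP, or run --continue-from on the CLI). The "PARTIAL" status value and the artifact_id key are made up for the demo, and treating a response without a pagination block as complete is also an assumption.

```python
def retrieval_complete(response):
    """True when the response reports pagination.retrieval_status == COMPLETE."""
    pagination = response.get("pagination")
    if pagination is None:
        # Assumption: output with no pagination block has nothing left to fetch.
        return True
    return pagination.get("retrieval_status") == "COMPLETE"


def collect_until_complete(first, fetch_next, max_pages=100):
    """Follow continuations until COMPLETE, guarding against endless loops."""
    responses = [first]
    while not retrieval_complete(responses[-1]):
        if len(responses) >= max_pages:
            raise RuntimeError("pagination did not complete within max_pages")
        responses.append(fetch_next(responses[-1]))
    return responses


# Fake continuation source for the demo.
pages = iter([
    {"artifact_id": "a2", "pagination": {"retrieval_status": "PARTIAL"}},
    {"artifact_id": "a3", "pagination": {"retrieval_status": "COMPLETE"}},
])
first = {"artifact_id": "a1", "pagination": {"retrieval_status": "PARTIAL"}}
chain = collect_until_complete(first, lambda previous: next(pages))
print([page["artifact_id"] for page in chain])
```

The max_pages guard is a design choice for the sketch: it turns a continuation that never reports COMPLETE into an explicit error instead of a silent hang.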

Python codegen over all pages

For complex questions, generate Python once and run it over the entire pagination chain:

sift-gateway code --json --scope all_related <artifact_id> '$' --file ./analysis.py

CLI default is --scope all_related. Use --scope single for anchor-only analysis.
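
As an illustration, an analysis.py for the kubectl example might look like the following. The run(data, schema, params) signature is taken from the quickstart. The assumptions are that data arrives as a list of pod-like dictionaries carrying metadata.namespace, and that schema and params can be ignored for this question.

```python
from collections import Counter


def run(data, schema, params):
    """Count rows per namespace across whatever rows are passed in."""
    rows = data if isinstance(data, list) else [data]
    counts = Counter(
        row.get("metadata", {}).get("namespace", "<none>")
        for row in rows
        if isinstance(row, dict)
    )
    return {"rows": len(rows), "by_namespace": dict(counts)}


if __name__ == "__main__":
    # Local smoke test with made-up rows; the gateway would supply real data.
    sample = [
        {"metadata": {"name": "api-1", "namespace": "prod"}},
        {"metadata": {"name": "api-2", "namespace": "prod"}},
        {"metadata": {"name": "job-1", "namespace": "batch"}},
    ]
    print(run(sample, schema=None, params={}))
```

Because the function only aggregates over the rows it is given, the same file applies unchanged whether the scope is a single artifact or the whole pagination chain.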

Benchmarks

Tier 1 result (claude-sonnet-4-6):

Condition                     Accuracy          Input tokens
Baseline (context-stuffed)    34/103 (33.0%)    10,757,230
Sift                          102/103 (99.0%)   489,655

That is a gain of 66.0 percentage points in accuracy with 95.4% fewer input tokens on the same question set.
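
The headline figures can be re-derived from the raw counts reported above. The snippet is only an arithmetic check of the published numbers and does not rerun any benchmark.

```python
baseline_correct, sift_correct, questions = 34, 102, 103
baseline_tokens, sift_tokens = 10_757_230, 489_655

baseline_acc = 100 * baseline_correct / questions
sift_acc = 100 * sift_correct / questions
token_reduction = 100 * (1 - sift_tokens / baseline_tokens)

print(f"accuracy: {baseline_acc:.1f}% -> {sift_acc:.1f}%")
print(f"gain: +{sift_acc - baseline_acc:.1f} points")
print(f"input tokens: {token_reduction:.1f}% fewer")
```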

Methodology, scripts, and Tier 2 autonomous-agent results: benchmarks/README.md.

Documentation library

Start here: docs/README.md

The library is organized into four sections: getting started, core contracts, operations and security, and patterns and deep dives.

Security

See SECURITY.md for threat model and hardening guidance.

Development

git clone https://github.com/lourencomaciel/sift-gateway.git
cd sift-gateway
uv sync --extra dev
uv run python -m pytest tests/unit/ -q

Full contributor workflow: CONTRIBUTING.md

License

MIT - see LICENSE.
