Reliability gateway for schema-stable, secret-safe, pagination-complete agent JSON.

Sift

Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON.

Python 3.11+ | PyPI | License: MIT

Sift is a drop-in reliability layer for MCP and CLI tool output. It persists full payloads as artifacts, returns either the inline payload (full) or a compact reference (schema_ref), and lets agents query exactly what they need by running Python code over the stored data.

Benchmark summary: on 103 factual questions across 12 real JSON datasets, Sift improved accuracy from 33.0% to 99.0% while cutting input tokens by 95.4% (10,757,230 -> 489,655). Full details: benchmarks/README.md.

How it works

                           ┌─────────────────────┐
  MCP tool call ──────────▶│                     │──────────▶ Upstream MCP server
  CLI command   ──────────▶│        Sift         │──────────▶ Shell/API command
                           │                     │
                           │   ┌─────────────┐   │
                           │   │  Artifacts  │   │
                           │   │  (SQLite)   │   │
                           │   └─────────────┘   │
                           └─────────────────────┘
                                     │
                                     ▼
                         Small output -> `full` inline
                         Large output -> `schema_ref`
                         Agent queries artifacts with code

Flow:

  1. Execute upstream tool/command and capture JSON.
  2. Persist full output as an artifact in SQLite and deterministically map schema/root hints.
  3. Return full (small) or schema_ref (large/paginated).
  4. Continue pages explicitly until pagination.retrieval_status == COMPLETE.
  5. Run focused Python queries on one artifact or the full pagination chain.
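
The persist-then-decide part of this flow (steps 2 and 3) can be sketched in a few lines. This is a minimal illustration, not Sift's implementation: the table layout, the `INLINE_LIMIT_BYTES` threshold, and the `mode`/`artifact_id`/`payload` response keys are assumptions made for the example.

```python
import json
import sqlite3
import uuid

# Hypothetical cutoff; the real threshold Sift uses is not stated in this README.
INLINE_LIMIT_BYTES = 2_000


def open_store() -> sqlite3.Connection:
    """Create an in-memory artifact table (a stand-in for the SQLite store)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    return conn


def capture(conn: sqlite3.Connection, payload: object) -> dict:
    """Persist the full payload, then return it inline or as a compact reference."""
    raw = json.dumps(payload)
    artifact_id = uuid.uuid4().hex
    conn.execute("INSERT INTO artifacts (id, payload) VALUES (?, ?)", (artifact_id, raw))
    conn.commit()
    if len(raw.encode("utf-8")) <= INLINE_LIMIT_BYTES:
        # Small output: hand the payload straight back.
        return {"mode": "full", "artifact_id": artifact_id, "payload": payload}
    # Large output: return only a reference; the data stays in the store.
    return {"mode": "schema_ref", "artifact_id": artifact_id}


def load(conn: sqlite3.Connection, artifact_id: str) -> object:
    """Read a stored payload back for a follow-up query."""
    row = conn.execute(
        "SELECT payload FROM artifacts WHERE id = ?", (artifact_id,)
    ).fetchone()
    return json.loads(row[0])
```

The point of the design is that the full payload is always written first, so choosing the compact reference never loses data.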

Main MCP pain points

These problems recur across MCP client issue trackers and in production use of the protocol:

  • Large tool definitions and large tool results consume context quickly.
  • Upstream API pagination often sits outside MCP list-cursor flows, so agents can stop early and answer on partial data.
  • Tool output shape differs across servers, which makes follow-up parsing brittle.
  • Tool output is untrusted input and can contain sensitive values that should not re-enter model context.
  • Raw outputs scroll away in chat history, so provenance and reproducibility degrade across multi-step runs.

Background and references: docs/why.md.

What Sift adds (without changing upstream servers)

  • Artifact-backed outputs: keep full data out of prompt context while preserving it losslessly.
  • Schema-aware references: schema_ref returns query guidance for stable follow-up analysis.
  • Exact structured retrieval: run Python against stored artifacts instead of relying on prompt-sized payloads, via artifact(action="query", query_kind="code", ...) (MCP) or sift-gateway code (CLI).
  • Explicit pagination contract: continue with artifact(action="next_page") (MCP) or sift-gateway run --continue-from (CLI).
  • Completion signaling: do not stop until pagination.retrieval_status == COMPLETE.
  • Pagination-chain analysis: query one artifact or all related pages (scope="all_related"; CLI default).
  • Outbound secret redaction enabled by default before output returns to the model.
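
To make the redaction bullet concrete, here is a generic sketch of outbound redaction over a JSON-like value. It is an illustration only: the two patterns, the `[REDACTED]` placeholder, and the `redact` helper are assumptions for this example, and Sift's actual detectors and configuration are not described in this README.

```python
import re

# Illustrative patterns only; not Sift's detector list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                # AWS access key ID shape
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]{20,}"),  # bearer-style tokens
]
PLACEHOLDER = "[REDACTED]"


def redact(value):
    """Recursively replace secret-looking substrings in strings, lists, and dicts."""
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub(PLACEHOLDER, value)
        return value
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    # Numbers, booleans, and None pass through unchanged.
    return value
```

Applying such a pass on the way out, rather than on the way in, keeps the stored artifact lossless while the text returned to the model is scrubbed.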

MCP vs CLI positioning

  • MCP: Sift is a reliability gateway for mirrored tool calls and artifact-based follow-up queries.
  • CLI/OpenClaw: same artifact contract for command output (sift-gateway run + sift-gateway code).
  • CLI pitfall: ad-hoc extraction can silently scope analysis to partial data (for example, inspecting only one row).
  • CLI note: for one-off local extraction, plain jq can be enough. Sift is for repeatable, pagination-complete, policy-controlled workflows.

60-second quickstart

MCP clients

pipx install sift-gateway
sift-gateway init --from claude

Restart your MCP client, then use mirrored tools normally.

Supported --from shortcuts: claude, claude-code, cursor, vscode, windsurf, zed, auto, or an explicit config path.

CLI flow

# 1) Capture JSON output as an artifact
sift-gateway run --json -- kubectl get pods -A -o json

# 2) Query artifact data with Python
sift-gateway code --json <artifact_id> '$' --code "def run(data, schema, params): return {'rows': len(data)}"

Use $ when rows are at root. If nested, use metadata.usage.root_path from run --json (or metadata.queryable_roots in MCP schema_ref).
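
The one-line `--code` example above can grow into a slightly richer function. The sketch below keeps the `run(data, schema, params)` signature from that example and assumes that `data` arrives as a list of row dicts at the selected root path, with Kubernetes-style `metadata.namespace` fields; both assumptions depend on your payload.

```python
def run(data, schema, params):
    """Count rows per namespace; `schema` and `params` are unused in this sketch."""
    by_namespace = {}
    for row in data:
        # Rows without a namespace are grouped under a visible placeholder.
        namespace = row.get("metadata", {}).get("namespace", "<unknown>")
        by_namespace[namespace] = by_namespace.get(namespace, 0) + 1
    return {"rows": len(data), "by_namespace": by_namespace}
```

Returning a small summary dict rather than the rows themselves keeps the answer compact when it goes back into model context.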

Pagination continuation

sift-gateway run --json --continue-from <artifact_id> -- <next-command-with-next-params-applied>

Do not claim completion until pagination.retrieval_status == COMPLETE.
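
The completion rule can be expressed as a loop that only returns once the status is COMPLETE. This is a sketch under stated assumptions: `fetch_page` is a hypothetical callable standing in for whatever issues the next `run --continue-from` call, the `artifact_id` response key is assumed, and `COMPLETE` is assumed to be a plain string value. Only the `pagination.retrieval_status` field path comes from this document.

```python
from typing import Callable, Optional


def collect_all_pages(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 100,
) -> list[str]:
    """Fetch pages until the response reports retrieval_status == COMPLETE.

    fetch_page (hypothetical) receives the previous artifact ID, or None on the
    first call, and returns a dict with 'artifact_id' and 'pagination' keys.
    """
    artifact_ids: list[str] = []
    previous: Optional[str] = None
    for _ in range(max_pages):
        response = fetch_page(previous)
        previous = response["artifact_id"]
        artifact_ids.append(previous)
        if response["pagination"]["retrieval_status"] == "COMPLETE":
            return artifact_ids
    # Fail loudly instead of answering on partial data.
    raise RuntimeError(f"pagination still incomplete after {max_pages} pages")
```

The `max_pages` guard is a design choice for the sketch: it turns a runaway chain into an explicit error rather than a silent partial answer.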

Python codegen over all pages

For complex questions, generate Python once and run it over the entire pagination chain:

sift-gateway code --json --scope all_related <artifact_id> '$' --file ./analysis.py

CLI default is --scope all_related. Use --scope single for anchor-only analysis.
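
A file passed with `--file` might look like the sketch below. It assumes the file exposes the same `run(data, schema, params)` entry point as the `--code` example, that `data` holds the rows from every page in the chain as dicts, and that `params` may carry a `key` entry; that key name is invented for this illustration.

```python
# analysis.py (illustrative)

def run(data, schema, params):
    """Summarize rows across the pagination chain by a key field."""
    key = (params or {}).get("key", "id")  # hypothetical parameter name
    seen = set()
    rows_missing_key = 0
    for row in data:
        if key not in row:
            rows_missing_key += 1
            continue
        # Assumes key values are hashable scalars such as strings or ints.
        seen.add(row[key])
    return {
        "total_rows": len(data),
        "unique_keys": len(seen),
        "rows_missing_key": rows_missing_key,
    }
```

Comparing `total_rows` with `unique_keys` is a quick way to spot rows that were repeated across pages before drawing conclusions from the combined data.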

Benchmarks

Tier 1 result (claude-sonnet-4-6):

Condition                    Accuracy          Input tokens
Baseline (context-stuffed)   34/103 (33.0%)    10,757,230
Sift                         102/103 (99.0%)   489,655

That is a gain of 66.0 percentage points in accuracy with 95.4% fewer input tokens on the same question set.
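
The headline figures follow directly from the raw counts reported above. The short check below only reproduces that arithmetic; the variable names are chosen for this example.

```python
# Raw counts from the Tier 1 result.
baseline_correct, sift_correct, questions = 34, 102, 103
baseline_tokens, sift_tokens = 10_757_230, 489_655

baseline_accuracy = 100 * baseline_correct / questions
sift_accuracy = 100 * sift_correct / questions
accuracy_gain = sift_accuracy - baseline_accuracy
token_reduction = 100 * (1 - sift_tokens / baseline_tokens)

print(f"baseline accuracy: {baseline_accuracy:.1f}%")  # 33.0%
print(f"sift accuracy:     {sift_accuracy:.1f}%")      # 99.0%
print(f"accuracy gain:     {accuracy_gain:+.1f} pts")  # +66.0 pts
print(f"token reduction:   {token_reduction:.1f}%")    # 95.4%
```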

Methodology, scripts, and Tier 2 autonomous-agent results: benchmarks/README.md.

Documentation library

Start here: docs/README.md

Getting started

Core contracts

Operations and security

Patterns and deep dives

Security

See SECURITY.md for threat model and hardening guidance.

Development

git clone https://github.com/lourencomaciel/sift-gateway.git
cd sift-gateway
uv sync --extra dev
uv run python -m pytest tests/unit/ -q

Full contributor workflow: CONTRIBUTING.md

License

MIT - see LICENSE.

