
Reliability gateway for schema-stable, secret-safe, pagination-complete agent JSON.


Sift

Reliability gateway for AI tool output: schema-stable, secret-safe, pagination-complete JSON.

Python 3.11+ | PyPI | License: MIT

Sift is a drop-in reliability layer for MCP and CLI tool output. It persists full payloads as artifacts, returns either the inline payload (full) or a compact reference (schema_ref), and lets agents query exactly what they need by running Python code over the stored data.

Benchmark summary: on 103 factual questions across 12 real JSON datasets, Sift improved accuracy from 33.0% to 99.0% while cutting input tokens by 95.4% (10,757,230 -> 489,655). Full details: benchmarks/README.md.

How it works

                           ┌─────────────────────┐
  MCP tool call ──────────▶│                     │──────────▶ Upstream MCP server
  CLI command   ──────────▶│        Sift         │──────────▶ Shell/API command
                           │                     │
                           │   ┌─────────────┐   │
                           │   │  Artifacts  │   │
                           │   │  (SQLite)   │   │
                           │   └─────────────┘   │
                           └─────────────────────┘
                                     │
                                     ▼
                         Small output -> `full` inline
                         Large output -> `schema_ref`
                         Agent queries artifacts with code

Flow:

  1. Execute upstream tool/command and capture JSON.
  2. Persist full output as an artifact in SQLite and deterministically map schema/root hints.
  3. Return full (small) or schema_ref (large/paginated).
  4. Continue pages explicitly until pagination.retrieval_status == COMPLETE.
  5. Run focused Python queries on one artifact or the full pagination chain.
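
Steps 2 and 3 of the flow can be pictured with a small Python sketch. It is illustrative only and is not Sift's implementation: the table layout, the `INLINE_LIMIT_BYTES` cutoff, the hash-based artifact ID, and the `handle_output` name are all assumptions made for this example.

```python
import hashlib
import json
import sqlite3

# Hypothetical cutoff for this sketch; Sift's real threshold and policy are not stated here.
INLINE_LIMIT_BYTES = 2_000


def open_store() -> sqlite3.Connection:
    """Create an in-memory artifact table (a real store would live on disk)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def handle_output(conn: sqlite3.Connection, payload: object) -> dict:
    """Persist the full payload, then answer inline or by compact reference."""
    body = json.dumps(payload, sort_keys=True)
    artifact_id = hashlib.sha256(body.encode()).hexdigest()[:12]
    conn.execute("INSERT OR REPLACE INTO artifacts VALUES (?, ?)", (artifact_id, body))
    if len(body.encode()) <= INLINE_LIMIT_BYTES:
        # Small output: hand the payload back as-is.
        return {"mode": "full", "artifact_id": artifact_id, "payload": payload}
    # Large output: return only a deterministic description of the shape.
    rows = payload if isinstance(payload, list) else [payload]
    fields = sorted({key for row in rows if isinstance(row, dict) for key in row})
    return {"mode": "schema_ref", "artifact_id": artifact_id, "fields": fields, "rows": len(rows)}
```

In both branches the full payload is stored before anything is returned, so a later query can always read the complete data rather than whatever fit in the response.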

Main MCP pain points

These problems recur across MCP client issue trackers and production protocol usage:

  • Large tool definitions and large tool results consume context quickly.
  • Upstream API pagination often sits outside MCP list-cursor flows, so agents can stop early and answer on partial data.
  • Tool output shape differs across servers, which makes follow-up parsing brittle.
  • Tool output is untrusted input and can contain sensitive values that should not re-enter model context.
  • Raw outputs scroll away in chat history, so provenance and reproducibility degrade across multi-step runs.

Background and references: docs/why.md.

What Sift adds (without changing upstream servers)

  • Artifact-backed outputs: keep full data out of prompt context while preserving it losslessly.
  • Tool inspection helper: keep mirrored tools/list descriptions compact and pull full docs with gateway.inspect_tool.
  • Schema-aware references: schema_ref returns query guidance for stable follow-up analysis.
  • Exact structured retrieval: run Python against stored artifacts instead of relying on prompt-sized payloads, via artifact(action="query", query_kind="code", ...) (MCP) or sift-gateway code (CLI).
  • Explicit pagination contract: continue with artifact(action="next_page") (MCP) or sift-gateway run --continue-from (CLI).
  • Completion signaling: do not stop until pagination.retrieval_status == COMPLETE.
  • Pagination-chain analysis: query one artifact or all related pages (scope="all_related"; CLI default).
  • Outbound secret redaction: enabled by default and applied before output returns to the model.

MCP vs CLI positioning

  • MCP: Sift is a reliability gateway for mirrored tool calls and artifact-based follow-up queries.
  • CLI/OpenClaw: same artifact contract for command output (sift-gateway run + sift-gateway code).
  • CLI pitfall: ad-hoc extraction can silently scope analysis to partial data (for example, inspecting only one row).
  • CLI note: for one-off local extraction, plain jq can be enough. Sift is for repeatable, pagination-complete, policy-controlled workflows.

60-second quickstart

MCP clients

pipx install sift-gateway
sift-gateway init --from claude

Restart your MCP client, then use mirrored tools normally.

Supported --from shortcuts: claude, claude-code, cursor, vscode, windsurf, zed, auto, or an explicit config path.

CLI flow

# 1) Capture JSON output as an artifact
sift-gateway run --json -- kubectl get pods -A -o json

# 2) Query artifact data with Python
sift-gateway code --json <artifact_id> '$' --code "def run(data, schema, params): return {'rows': len(data)}"

Use $ as the root path when the rows are at the root of the payload. If they are nested, use the value of metadata.usage.root_path from the run --json output (or metadata.queryable_roots in an MCP schema_ref).
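
When scripting the two CLI steps, the root path can be picked from the parsed run --json output. The sketch below is a hedged example: `pick_root_path` and `build_code_command` are hypothetical helper names, and it assumes that `metadata` sits at the top level of the parsed output and that the artifact ID is already known to the caller.

```python
def pick_root_path(run_output: dict) -> str:
    """Return the reported root path, falling back to '$' (rows at root)."""
    usage = run_output.get("metadata", {}).get("usage", {})
    return usage.get("root_path") or "$"


def build_code_command(artifact_id: str, run_output: dict, code: str) -> list[str]:
    """Assemble the argument list for `sift-gateway code`, mirroring the CLI example."""
    return [
        "sift-gateway", "code", "--json",
        artifact_id,
        pick_root_path(run_output),
        "--code", code,
    ]
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with `$` and with the quotes inside the code string.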

Pagination continuation

sift-gateway run --json --continue-from <artifact_id> -- <next-command-with-next-params-applied>

Do not claim completion until pagination.retrieval_status == COMPLETE.

Python codegen over all pages

For complex questions, generate Python once and run it over the entire pagination chain:

sift-gateway code --json --scope all_related <artifact_id> '$' --file ./analysis.py

CLI default is --scope all_related. Use --scope single for anchor-only analysis.
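
As an illustration, an analysis.py file could look like the sketch below. It keeps the `run(data, schema, params)` signature from the quickstart and assumes that `data` is a list of dict rows and that `params` may be empty; the `field` parameter and its `kind` default are hypothetical choices for this example.

```python
from collections import Counter


def run(data, schema, params):
    """Count rows per value of one field (default: 'kind')."""
    field = (params or {}).get("field", "kind")
    counts = Counter(
        str(row.get(field, "<missing>")) for row in data if isinstance(row, dict)
    )
    return {"rows": len(data), "by_field": dict(counts.most_common())}
```

Returning a small summary dict rather than the rows themselves keeps the result compact however many pages the chain contains.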

Benchmarks

Tier 1 result (claude-sonnet-4-6):

Condition                    Accuracy          Input tokens
Baseline (context-stuffed)   34/103 (33.0%)    10,757,230
Sift                         102/103 (99.0%)   489,655

That is +66.0 points accuracy with 95.4% fewer input tokens on the same question set.
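
The headline figures follow directly from the reported counts. The short Python check below recomputes them; the variable names are chosen for this example only.

```python
baseline_correct, sift_correct, questions = 34, 102, 103
baseline_tokens, sift_tokens = 10_757_230, 489_655

baseline_acc = 100 * baseline_correct / questions   # accuracy in percent
sift_acc = 100 * sift_correct / questions
gain_points = sift_acc - baseline_acc                # percentage points
token_reduction = 100 * (1 - sift_tokens / baseline_tokens)

print(f"accuracy: {baseline_acc:.1f}% -> {sift_acc:.1f}% (+{gain_points:.1f} points)")
print(f"input tokens: {token_reduction:.1f}% fewer")
```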

Methodology, scripts, and Tier 2 autonomous-agent results: benchmarks/README.md.

Documentation library

Start here: docs/README.md

Sections: Getting started, Core contracts, Operations and security, Patterns and deep dives.

Security

See SECURITY.md for threat model and hardening guidance.

Development

git clone https://github.com/lourencomaciel/sift-gateway.git
cd sift-gateway
uv sync --extra dev
uv run python -m pytest tests/unit/ -q

Full contributor workflow: CONTRIBUTING.md

License

MIT - see LICENSE.
