
Haystack integration for Skim — clean web reader for AI agents. Pays $0.002/call in USDC over x402. No signup, no API keys.


skim-haystack

Give your Haystack pipelines the ability to read any URL as clean Markdown: no ads, no nav, no boilerplate. Each call is paid for automatically. No signup, no API key.


skim-haystack is the official Haystack integration for Skim — the canonical x402 clean reader API. It provides one component, SkimReader, that fetches any web page and returns it as a Haystack Document (clean Markdown in content, structured metadata in meta). Each call costs $0.002 in USDC on Base, paid automatically by your local wallet over HTTP 402.


Install

pip install skim-haystack

This pulls in the x402 client with EVM support, so there's nothing else to install.


Quickstart (60 seconds)

1. Fund a Base wallet with $1 of USDC

A dollar funds roughly 500 reads. Full step-by-step (with screenshots, for non-crypto-native devs): https://skim402.com/wallet.
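
The "roughly 500 reads" figure is the balance divided by the per-call price. A minimal sketch of that arithmetic, assuming the price stays at $0.002 and ignoring any fees for funding the wallet (the helper name is made up for illustration):

```python
from decimal import Decimal

PRICE_PER_READ_USD = Decimal("0.002")  # per-call price quoted in this README


def reads_for_budget(budget_usd: str) -> int:
    """Return how many whole reads a USDC balance covers at the quoted price."""
    return int(Decimal(budget_usd) // PRICE_PER_READ_USD)


print(reads_for_budget("1"))     # 500
print(reads_for_budget("0.25"))  # 125
```

Decimal is used instead of float so that small prices divide exactly.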

Use a fresh wallet, not your personal one. This wallet's private key signs payment authorizations on your machine — treat it like a hot wallet for paying $0.002 tolls, not a savings account.

2. Point the component at your wallet

export SKIM_WALLET_PRIVATE_KEY=0xYOUR_BASE_WALLET_PRIVATE_KEY
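
Before constructing the component, it can help to confirm that the variable is set and has the shape of a key. The check below is a sketch and not part of skim-haystack: it assumes an EVM private key is 32 bytes (64 hex characters) with an optional 0x prefix, and both function names are hypothetical.

```python
import os
import re

# 64 hex characters with an optional 0x prefix (a 32-byte EVM private key).
_KEY_PATTERN = re.compile(r"(0x)?[0-9a-fA-F]{64}")


def looks_like_private_key(value: str) -> bool:
    """Return True if value has the shape of a hex-encoded 32-byte key."""
    return _KEY_PATTERN.fullmatch(value.strip()) is not None


def check_wallet_env(name: str = "SKIM_WALLET_PRIVATE_KEY") -> None:
    """Raise a readable error early instead of failing on the first paid call."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set")
    if not looks_like_private_key(value):
        # Never echo the value itself: it is a secret.
        raise RuntimeError(f"{name} is set but is not a 64-character hex key")
```

Calling check_wallet_env() at startup only validates the format; it cannot tell whether the wallet holds any USDC.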

3. Use it

from skim_haystack import SkimReader

reader = SkimReader()  # reads SKIM_WALLET_PRIVATE_KEY from the environment

result = reader.run(urls="https://en.wikipedia.org/wiki/HTTP_402")
print(result["documents"][0].content)

The component signs an EIP-3009 USDC authorization for $0.002, Skim returns clean Markdown, and you get back a Document with the article body in content and metadata in meta. The payment shows up in your wallet's transaction history on BaseScan.


Use it in a pipeline

SkimReader is a standard Haystack component, so it drops straight into a Pipeline. Here it fetches a page and feeds the cleaned Markdown into a prompt. OpenAIGenerator reads its key from the OPENAI_API_KEY environment variable, so set that as well:

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from skim_haystack import SkimReader

pipe = Pipeline()
pipe.add_component("reader", SkimReader())
pipe.add_component("prompt", PromptBuilder(
    template="Summarize this article in 5 bullets:\n\n{{ documents[0].content }}"
))
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

pipe.connect("reader.documents", "prompt.documents")
pipe.connect("prompt.prompt", "llm.prompt")

result = pipe.run({"reader": {"urls": "https://en.wikipedia.org/wiki/HTTP_402"}})
print(result["llm"]["replies"][0])

The wallet pays per read, and your pipeline gets clean Markdown instead of raw HTML.


Output shape

SkimReader.run(...) returns {"documents": [Document, ...]} — one Document per URL:

  • Document.content — the cleaned article body in Markdown.
  • Document.meta — always includes source (the URL), plus page metadata (title, byline, publishedAt, lang, excerpt, ...) unless include_metadata=False.
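
Given that shape, downstream code can treat content as the text and meta as a plain dictionary. The helper below is an illustrative sketch: the function name is hypothetical, and it relies only on the source and title keys listed above, falling back to the URL because title is absent when include_metadata=False.

```python
from types import SimpleNamespace


def describe(doc) -> str:
    """Build a one-line label for a document from its meta and content."""
    meta = doc.meta or {}
    # "source" is always present; "title" only when metadata is included.
    label = meta.get("title") or meta.get("source", "unknown source")
    return f"{label} ({len(doc.content)} characters of Markdown)"


# Stand-in object with the same two attributes as a Haystack Document,
# used here so the sketch runs without a wallet or network access.
sample = SimpleNamespace(
    content="# HTTP 402\n\nPayment Required ...",
    meta={"source": "https://en.wikipedia.org/wiki/HTTP_402", "title": "HTTP 402"},
)
print(describe(sample))
```

With a real reader, the same function can be applied to each item of reader.run(...)["documents"].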

Configuration

SkimReader takes the following parameters. All of them have defaults, but the wallet key must be available, either through the default environment variable or passed explicitly:

| Parameter | Default | Notes |
| --- | --- | --- |
| private_key | Secret.from_env_var("SKIM_WALLET_PRIVATE_KEY") | A Haystack Secret with the Base wallet's hex private key. With or without 0x. Pass Secret.from_token("0x...") for an explicit key. Use a dedicated wallet, never your personal one. |
| base_url | https://skim402.com | Override the API base URL. For self-hosting or local development. |
| max_price_usd | 0.01 | Hard cap on per-call price in USD. The wallet refuses to sign for anything above this. Skim is $0.002/call. |
| include_metadata | True | Populate each Document's meta with page metadata. |
| timeout | 60 | Per-request timeout in seconds. |

from haystack.utils import Secret
from skim_haystack import SkimReader

reader = SkimReader(
    private_key=Secret.from_token("0x..."),  # or rely on the env var
    max_price_usd=0.005,
    include_metadata=False,
)

The component supports pipeline serialization (to_dict/from_dict). When the key comes from an environment variable (the default, or any Secret.from_env_var(...)), it is stored as a reference to that variable name — never the raw value. An inline Secret.from_token("0x...") is intentionally runtime-only: Haystack refuses to serialize token-backed secrets, so it will never be written to disk.


How it actually works

your pipeline ──► SkimReader ──► POST https://skim402.com/api/v1/read
                     ▲                       │
                     │                       ▼
                     │              402 Payment Required
                     │                  (x402 challenge)
                     │                       │
                     ▼                       │
      x402 signs EIP-3009 USDC ◄─────────────┘
      transfer authorization (locally)
                     │
                     ▼
           retry POST with X-PAYMENT header
                     │
                     ▼
      Skim verifies + settles via Coinbase CDP facilitator
                     │
                     ▼
           200 OK + clean Markdown

Your private key never leaves your machine — it only signs authorizations locally.
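
The round trip in the diagram can be simulated without a wallet or network. The sketch below is not the x402 client's code: the fake server, the challenge fields, the placeholder signature, and all function names are invented for illustration, and encoding the header as base64 JSON is an assumption. Only the control flow (request, 402, sign locally, retry with X-PAYMENT, 200) mirrors the diagram.

```python
import base64
import json


def fake_server(headers: dict) -> tuple[int, dict]:
    """Stand-in for the read endpoint: demand payment unless X-PAYMENT is present."""
    if "X-PAYMENT" not in headers:
        # Challenge describing what the server wants to be paid (made-up fields).
        return 402, {"price_usd": "0.002", "asset": "USDC", "network": "base"}
    return 200, {"markdown": "# Clean article body"}


def fake_sign(challenge: dict) -> str:
    """Placeholder for local signing; a real client signs with the wallet key."""
    payload = {"authorized": challenge, "signature": "0xPLACEHOLDER"}
    return base64.b64encode(json.dumps(payload).encode()).decode()


def read_with_payment(server=fake_server) -> tuple[int, dict, int]:
    """Send the request, pay on 402, retry once. Returns status, body, attempts."""
    headers: dict = {}
    status, body = server(headers)
    attempts = 1
    if status == 402:
        # Only the signed authorization goes into the header, never the key.
        headers["X-PAYMENT"] = fake_sign(body)
        status, body = server(headers)
        attempts += 1
    return status, body, attempts


print(read_with_payment())
```

The single retry is deliberate: if the second attempt still fails, the sketch returns that status rather than signing again, which keeps a misbehaving server from triggering repeated payments.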


Security

  • Dedicated wallet, always. Fund it with only as much USDC as you're willing to spend in a runaway loop. The max_price_usd cap catches accidental price escalations.
  • No outbound telemetry from this package. skim-haystack only talks to skim402.com (or whatever you set as base_url). No analytics, no error reporting, no phone-home.
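
The max_price_usd cap comes down to comparing the price a server asks for against a ceiling before anything is signed. A minimal sketch of that comparison, assuming the requested amount arrives as an integer string in USDC's smallest unit (USDC uses 6 decimal places) and treating 1 USDC as 1 USD. The function name and input format are assumptions, not the package's internals:

```python
from decimal import Decimal

USDC_DECIMALS = 6  # 1 USDC = 1_000_000 atomic units


def within_cap(amount_atomic: str, max_price_usd: str = "0.01") -> bool:
    """Return True if the requested amount is at or below the cap."""
    requested_usd = Decimal(amount_atomic) / (Decimal(10) ** USDC_DECIMALS)
    return requested_usd <= Decimal(max_price_usd)


print(within_cap("2000"))   # True: 0.002 USDC is under the 0.01 default cap
print(within_cap("50000"))  # False: 0.05 USDC exceeds it
```

The cap limits the cost of a single call only. The total spend of a runaway loop is still bounded by the wallet balance, which is why the dedicated-wallet advice above matters.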

Try it without a pipeline

Skeptical? Test the upstream endpoint directly — it'll return a 402 challenge so you can see the protocol in action:

curl -i -X POST https://skim402.com/api/v1/read \
  -H 'content-type: application/json' \
  -d '{"url":"https://en.wikipedia.org/wiki/HTTP_402"}'

You'll get back HTTP/1.1 402 Payment Required with the x402 challenge in the response body.
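
The same probe can be written in Python with only the standard library. This sketch mirrors the curl command above. The function names are hypothetical, and because urllib raises HTTPError for any 4xx status, the 402 is read from the exception rather than from a normal response.

```python
import json
import urllib.error
import urllib.request

READ_ENDPOINT = "https://skim402.com/api/v1/read"


def build_probe_request(url: str) -> urllib.request.Request:
    """Build the unpaid POST that should trigger a 402 challenge."""
    return urllib.request.Request(
        READ_ENDPOINT,
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe(url: str, timeout: float = 60.0) -> tuple[int, str]:
    """Send the request and return (status code, response body text)."""
    try:
        with urllib.request.urlopen(build_probe_request(url), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 402 lands here; the challenge is in the error's body.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling probe("https://en.wikipedia.org/wiki/HTTP_402") needs network access and, per the description above, should come back with status 402 and the challenge as the body text.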


License

MIT
