LlamaIndex reader for Skim — clean web reader for AI agents. Pays $0.002/call in USDC over x402. No signup, no API keys.

llama-index-readers-skim

Give your LlamaIndex pipeline the ability to read any URL — clean Markdown, no ads, no nav, no boilerplate. Pays itself per call. No signup, no API key.

llama-index-readers-skim is the official LlamaIndex reader for Skim — the canonical x402 clean reader API. It exposes one reader, SkimReader, that turns any web page into a LlamaIndex Document of agent-ready Markdown plus structured metadata (title, byline, published date, language, excerpt). Each call costs $0.002 in USDC on Base, paid automatically by your local wallet over HTTP 402.


Install

pip install llama-index-readers-skim

This pulls in the x402 client with EVM support, so there's nothing else to install.


Quickstart (60 seconds)

1. Fund a Base wallet with $1 of USDC

A dollar funds roughly 500 reads at $0.002 each. A full step-by-step guide (with screenshots, for non-crypto-native devs) is at https://skim402.com/wallet.

Use a fresh wallet, not your personal one. This wallet's private key signs payment authorizations on your machine — treat it like a hot wallet for paying $0.002 tolls, not a savings account.

2. Point the reader at your wallet

export SKIM_WALLET_PRIVATE_KEY=0xYOUR_BASE_WALLET_PRIVATE_KEY

3. Use it

from llama_index.readers.skim import SkimReader

reader = SkimReader()  # reads SKIM_WALLET_PRIVATE_KEY from the environment

documents = reader.load_data(urls=["https://en.wikipedia.org/wiki/HTTP_402"])
print(documents[0].text)
print(documents[0].metadata)

The reader signs an EIP-3009 USDC authorization for $0.002, Skim returns clean Markdown, and you get back a Document with the article body as text and the page metadata in metadata. The payment shows up in your wallet's transaction history on BaseScan.
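A quick pre-flight check can catch a missing or malformed key before the first paid call. The sketch below assumes the key is a 32-byte hex string (64 hex characters), with or without the 0x prefix, as the private_key parameter accepts. The helper name normalize_private_key is hypothetical and not part of this package.

```python
import os
import re

# Hypothetical helper, not part of llama-index-readers-skim.
_HEX_KEY = re.compile(r"(?:0x)?[0-9a-fA-F]{64}")


def normalize_private_key(value):
    """Return the key as a 0x-prefixed lowercase hex string, or raise ValueError."""
    if not value:
        raise ValueError("SKIM_WALLET_PRIVATE_KEY is not set")
    value = value.strip()
    if not _HEX_KEY.fullmatch(value):
        # The message deliberately never echoes the key itself.
        raise ValueError("expected 64 hex characters, with or without a 0x prefix")
    body = value[2:] if value.startswith("0x") else value
    return "0x" + body.lower()


if __name__ == "__main__":
    try:
        normalize_private_key(os.environ.get("SKIM_WALLET_PRIVATE_KEY"))
        print("key looks well-formed")
    except ValueError as exc:
        print(f"key check failed: {exc}")
```

This only checks the shape of the key. It does not confirm that the wallet holds USDC on Base.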


Build an index from web pages

SkimReader returns standard LlamaIndex Document objects, so it drops straight into any ingestion pipeline:

from llama_index.core import VectorStoreIndex
from llama_index.readers.skim import SkimReader

reader = SkimReader()
documents = reader.load_data(
    urls=[
        "https://example.com/article-one",
        "https://example.com/article-two",
    ]
)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What do these articles have in common?"))

Each URL costs one $0.002 read, paid automatically as the documents load.
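Because every URL in the list is billed as one read, it can be worth estimating the cost of a batch and removing duplicate URLs before loading. A minimal sketch: plan_batch is a hypothetical helper, the price is the $0.002 figure quoted above, and since this README does not say whether SkimReader deduplicates URLs itself, the sketch assumes it does not.

```python
from decimal import Decimal

PRICE_PER_READ_USD = Decimal("0.002")  # per-call price quoted in this README


def plan_batch(urls):
    """Drop duplicate URLs (keeping first-seen order) and estimate the batch cost."""
    unique = list(dict.fromkeys(urls))
    return unique, PRICE_PER_READ_USD * len(unique)


urls = [
    "https://example.com/article-one",
    "https://example.com/article-two",
    "https://example.com/article-one",  # duplicate, assumed to cost a second read
]
unique_urls, cost = plan_batch(urls)
print(f"{len(unique_urls)} reads, about ${cost} USDC")
```

The deduplicated list can then be passed to reader.load_data(urls=unique_urls).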


Output shape

load_data returns a list of Document objects. Each Document has:

  • text — the cleaned article body in Markdown.
  • metadata — a dict with the source URL plus the page metadata Skim extracted:
{
    "source": "https://example.com/article",
    "title": "Example article",
    "byline": "Jane Doe",
    "publishedAt": "2025-01-15",
    "lang": "en",
    "excerpt": "A short summary...",
}

Empty and None metadata values are dropped. Set include_metadata=False to keep only the source URL.
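The dropping rule described above can be pictured as a simple filter. This illustrates the documented behavior and is not the package's actual code: build_metadata and the raw dict are hypothetical, and "empty" is assumed here to mean an empty string.

```python
def build_metadata(source_url, page_metadata, include_metadata=True):
    """Illustrative only: merge the source URL with non-empty page metadata."""
    metadata = {"source": source_url}
    if include_metadata:
        for key, value in page_metadata.items():
            # Skip None and empty strings so they never reach the Document.
            if value is None or value == "":
                continue
            metadata[key] = value
    return metadata


raw = {"title": "Example article", "byline": None, "lang": "en", "excerpt": ""}
print(build_metadata("https://example.com/article", raw))
print(build_metadata("https://example.com/article", raw, include_metadata=False))
```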


Configuration

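SkimReader takes the following parameters. All are optional, but a wallet key must be available, either as private_key or through the SKIM_WALLET_PRIVATE_KEY environment variable: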

| Parameter | Default | Notes |
| --- | --- | --- |
| private_key | $SKIM_WALLET_PRIVATE_KEY | Hex private key for the Base wallet that pays for reads. With or without 0x. Use a dedicated wallet, never your personal one. |
| base_url | https://skim402.com | Override the API base URL. For self-hosting or local development. |
| max_price_usd | 0.01 | Hard cap on per-call price in USD. The wallet refuses to sign for anything above this. Skim is $0.002/call. |
| include_metadata | True | Populate each Document's metadata with the page metadata Skim returns. |
| timeout | 60 | Per-request timeout in seconds. |

reader = SkimReader(
    private_key="0x...",       # or rely on the env var
    max_price_usd=0.005,
    include_metadata=False,
)
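To make the max_price_usd cap concrete: USDC uses 6 decimal places, so $0.002 is 2,000 atomic units and a $0.01 cap is 10,000. The sketch below shows the kind of comparison such a cap implies. The names usd_to_atomic and within_cap are hypothetical, and how the package or the x402 client performs the check internally is not described in this README.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC amounts are expressed in millionths of a dollar


def usd_to_atomic(usd):
    """Convert a USD amount to USDC atomic units, e.g. 0.002 -> 2000."""
    return int(Decimal(str(usd)) * 10**USDC_DECIMALS)


def within_cap(requested_atomic, max_price_usd):
    """True when the requested amount does not exceed the configured cap."""
    return requested_atomic <= usd_to_atomic(max_price_usd)


print(usd_to_atomic(0.002), within_cap(2000, 0.01), within_cap(20000, 0.01))
```

Working in integer atomic units avoids floating-point rounding when comparing small prices.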

How it actually works

your pipeline ──► SkimReader ──► POST https://skim402.com/api/v1/read
                     ▲                       │
                     │                       ▼
                     │              402 Payment Required
                     │                  (x402 challenge)
                     │                       │
                     ▼                       │
      x402 signs EIP-3009 USDC ◄─────────────┘
      transfer authorization (locally)
                     │
                     ▼
           retry POST with X-PAYMENT header
                     │
                     ▼
      Skim verifies + settles via Coinbase CDP facilitator
                     │
                     ▼
           200 OK + clean Markdown

Your private key never leaves your machine — it only signs authorizations locally.
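The round trip in the diagram can be sketched as a toy simulation. Nothing here touches the network or a wallet: fake_skim_server and fake_sign are hypothetical stand-ins for the Skim API and for the x402 client's local EIP-3009 signing, and the challenge fields are invented for illustration. Only the 402 status, the retry, and the X-PAYMENT header name come from the description above.

```python
def fake_skim_server(headers, body):
    """Stand-in for POST /api/v1/read: demand payment, then serve Markdown."""
    if "X-PAYMENT" not in headers:
        # First attempt: no payment attached, so answer with a challenge.
        return 402, {"price_usd": "0.002"}
    return 200, {"markdown": f"# Clean text of {body['url']}"}


def fake_sign(challenge):
    """Stand-in for local signing; a real client signs with the wallet key."""
    return f"signed-authorization-for-{challenge['price_usd']}"


def read(url):
    """Request, receive the 402 challenge, sign locally, retry with payment."""
    body = {"url": url}
    status, payload = fake_skim_server({}, body)
    attempts = 1
    if status == 402:
        headers = {"X-PAYMENT": fake_sign(payload)}
        status, payload = fake_skim_server(headers, body)
        attempts += 1
    return status, payload, attempts


status, payload, attempts = read("https://en.wikipedia.org/wiki/HTTP_402")
print(status, attempts, payload["markdown"])
```

The signing step sits entirely on the client side of the exchange, which is why only the signed authorization, never the key, is sent to the server.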


Security

  • Dedicated wallet, always. Fund it with only as much USDC as you're willing to spend in a runaway loop. The max_price_usd cap catches accidental price escalations.
  • No outbound telemetry from this package. llama-index-readers-skim only talks to skim402.com (or whatever you set as base_url). No analytics, no error reporting, no phone-home.
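Alongside a lightly funded wallet, a local counter can stop a runaway loop before the balance is gone. A minimal sketch, assuming every URL passed to load_data is billed $0.002: SpendGuard and BudgetExceeded are hypothetical names, and the wrapped reader is any object that exposes load_data(urls=...).

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a batch would push spending past the local budget."""


class SpendGuard:
    """Hypothetical wrapper that refuses reads once a USD budget is used up."""

    def __init__(self, reader, budget_usd, price_usd="0.002"):
        self.reader = reader
        self.budget = Decimal(str(budget_usd))
        self.price = Decimal(str(price_usd))
        self.spent = Decimal("0")

    def load_data(self, urls):
        cost = self.price * len(urls)
        if self.spent + cost > self.budget:
            remaining = self.budget - self.spent
            raise BudgetExceeded(f"batch costs ${cost}, only ${remaining} left")
        # Count the cost up front, so a failed call errs toward stopping early.
        self.spent += cost
        return self.reader.load_data(urls=urls)
```

Wrapping the reader as SpendGuard(SkimReader(), budget_usd=1) would cap a process at about 500 reads. The counter lives in memory only, so it resets when the process restarts.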

Try it without a pipeline

Skeptical? Test the upstream endpoint directly — it'll return a 402 challenge so you can see the protocol in action:

curl -i -X POST https://skim402.com/api/v1/read \
  -H 'content-type: application/json' \
  -d '{"url":"https://en.wikipedia.org/wiki/HTTP_402"}'

You'll get back HTTP/1.1 402 Payment Required with the x402 challenge in the response body.


License

MIT

