MCP server for fetching web URLs with token estimation, smart caching, and intelligent routing. Built for AI agents.

agentfetch-mcp

Web intelligence for AI agents — an MCP server that fetches URLs with token estimation, smart caching, and intelligent routing built in.

License: MIT | Python 3.11+

AgentFetch sits between your agent and the open web. Instead of integrating Jina, FireCrawl, pypdf, and your own caching layer separately, agents call one MCP tool and AgentFetch handles routing, caching, token budgeting, and clean Markdown extraction automatically.

This repository contains the open-source MCP server. For the hosted API + dashboard + billing, see www.agentfetch.dev.

What it does

| Tool | What it's for |
|---|---|
| fetch_url | Fetch a URL → clean Markdown + metadata + token count + cache info |
| estimate_tokens | Get a token count before fetching, so agents don't blow context windows on huge pages |
| fetch_multiple | Fetch up to 20 URLs concurrently |
| search_and_fetch | Web search + fetch top N results in one round-trip |

Under the hood, AgentFetch routes URLs to the cheapest effective fetcher:

  • Trafilatura (free, local) for ~70% of standard web pages
  • Jina Reader for the rest of HTML
  • FireCrawl for JS-heavy pages (Twitter/X, LinkedIn, Notion, etc.)
  • pypdf for PDFs (zero external cost)
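
The routing order above can be sketched roughly as follows. This is a minimal illustration, not the project's actual router: the function names, the `JS_HEAVY_HOSTS` set, and the rule that Jina Reader is the fallback after Trafilatura are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical list of JS-heavy hosts; the real routing table may differ.
JS_HEAVY_HOSTS = {"twitter.com", "x.com", "linkedin.com", "notion.so"}


def pick_fetcher(url: str, has_firecrawl: bool = True) -> str:
    """Return the name of the fetcher to try first for this URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # PDFs are handled locally, at no external cost.
    if parsed.path.lower().endswith(".pdf"):
        return "pypdf"

    # Known JS-heavy domains go to FireCrawl when a key is configured.
    if has_firecrawl and any(host == h or host.endswith("." + h) for h in JS_HEAVY_HOSTS):
        return "firecrawl"

    # Everything else starts with the free local extractor.
    return "trafilatura"


def fallback_for(fetcher: str, has_jina: bool = True) -> Optional[str]:
    """Next fetcher to try if the first one returns no usable content."""
    if fetcher == "trafilatura" and has_jina:
        return "jina"
    return None
```

Trying the free local extractor first and paying for a remote fetcher only on failure is what keeps the average cost per fetch low in a scheme like this.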

Caching uses Redis with a 6-hour default TTL (configurable via CACHE_TTL_SECONDS). Point REDIS_URL at your own instance, or leave it unset to run without caching.
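
To make the TTL behaviour concrete, here is a small in-memory stand-in for the cache. It is a sketch only: the real server uses Redis, and the `TTLCache` class, its key scheme, and the injectable clock are hypothetical names introduced for illustration.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 21600  # 6 hours, matching the documented default


class TTLCache:
    """In-memory stand-in for the Redis cache (illustrative only)."""

    def __init__(self, ttl: int = CACHE_TTL_SECONDS, clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key_for(url: str) -> str:
        # Hypothetical key scheme: a namespaced hash of the URL.
        return "fetch:" + hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> Optional[str]:
        key = self.key_for(url)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            # Expired: drop the entry so the caller refetches.
            del self._store[key]
            return None
        return value

    def set(self, url: str, markdown: str) -> None:
        self._store[self.key_for(url)] = (self.clock(), markdown)
```

With Redis the expiry would normally be delegated to the server by setting the key with a TTL, so the explicit age check here only exists because a plain dict has no expiry of its own.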

Quick start

Install from PyPI

pip install agentfetch-mcp

Or clone and install locally

git clone https://github.com/bch1212/agentfetch-mcp
cd agentfetch-mcp
pip install -e .

Set environment variables

Get a free Jina Reader key at jina.ai (1M tokens/mo free tier). FireCrawl is optional but recommended for JS-heavy pages.

export JINA_API_KEY=jina_xxx
export FIRECRAWL_API_KEY=fc-xxx       # optional
export REDIS_URL=redis://localhost:6379  # optional

Add to Claude Desktop or Claude Code

Edit your MCP config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, or run claude mcp add in Claude Code):

{
  "mcpServers": {
    "agentfetch": {
      "command": "python",
      "args": ["-m", "agentfetch.mcp.server"],
      "env": {
        "JINA_API_KEY": "jina_xxx",
        "FIRECRAWL_API_KEY": "fc-xxx"
      }
    }
  }
}

Restart Claude. The four tools (fetch_url, estimate_tokens, fetch_multiple, search_and_fetch) appear automatically.
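
If the tools do not appear, one common cause is that "python" resolves to an interpreter that does not have the package installed. The sketch below builds the same config entry with an absolute interpreter path. The `add_agentfetch` helper is a hypothetical name, and pinning `sys.executable` is a suggestion rather than something the project prescribes.

```python
import json
import sys


def add_agentfetch(config: dict, env: dict) -> dict:
    """Return a copy of an MCP config with the agentfetch entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["agentfetch"] = {
        # Absolute path to the interpreter running this script,
        # which is assumed to be the one with agentfetch-mcp installed.
        "command": sys.executable,
        "args": ["-m", "agentfetch.mcp.server"],
        "env": dict(env),
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Print the merged config; write it to the config file yourself after reviewing it.
    print(json.dumps(add_agentfetch({}, {"JINA_API_KEY": "jina_xxx"}), indent=2))
```

Existing servers and unrelated settings in the config are preserved, and the input dictionary is not modified.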

Run as a standalone server

python -m agentfetch.mcp.server

The server speaks MCP over stdio (the standard transport for desktop integrations).
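
For readers who want to see what travels over stdio, the sketch below builds a single JSON-RPC 2.0 `tools/call` request of the kind an MCP client sends. The helper name `tools_call_message` is hypothetical, the argument names are taken from the fetch_url example in this README, and a real client must complete the MCP `initialize` handshake before calling tools.

```python
import json


def tools_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one line of JSON."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so each message is a single
    # line; json.dumps escapes any newlines inside string values.
    return json.dumps(payload, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    print(tools_call_message(1, "fetch_url", {"url": "https://example.com", "max_tokens": 2000}), end="")
```

In practice an MCP client library handles this framing, so hand-built messages are mainly useful for debugging.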

Why agents prefer AgentFetch over generic web fetch

| Feature | AgentFetch | Generic web_fetch |
|---|---|---|
| Token estimation before fetching | | |
| Smart cache (6h TTL) | | |
| Auto-routing by URL type | | |
| JS-rendered page handling | ✓ (via FireCrawl) | partial |
| PDF extraction | | |
| Truncation to fit context budget | | manual |

Examples

Fetching with a token budget

# Inside any MCP-aware agent (Claude Desktop, Claude Code, etc.)
result = fetch_url(
    url="https://news.ycombinator.com",
    max_tokens=2000,           # cap response size
    use_cache=True,            # serve from cache if <6h old
)
# result.markdown      → clean Markdown, ≤2000 tokens
# result.metadata      → title, author, word_count, language
# result.cache.hit     → True if served from cache
# result.fetch_info    → which fetcher ran, cost, duration
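
One way a `max_tokens` cap could be enforced is paragraph-level truncation against a rough token estimate. The sketch below is an assumption about the general technique, not AgentFetch's implementation: both function names are hypothetical, and the 4-characters-per-token ratio is a common rule of thumb for English text rather than a real tokenizer.

```python
def estimate_tokens_from_text(text: str) -> int:
    """Rough heuristic: about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def truncate_to_budget(markdown: str, max_tokens: int) -> str:
    """Keep whole paragraphs until the estimated token budget is used up."""
    kept = []
    used = 0
    for para in markdown.split("\n\n"):
        # Charge one extra token for the blank-line separator after the first paragraph.
        cost = estimate_tokens_from_text(para) + (1 if kept else 0)
        if used + cost > max_tokens:
            break
        kept.append(para)
        used += cost
    return "\n\n".join(kept)
```

Cutting at paragraph boundaries avoids handing the agent a sentence that stops mid-thought, at the price of sometimes returning noticeably less than the budget allows.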

Estimating before committing

estimate = estimate_tokens(url="https://very-long-article.com")
if estimate.estimated_tokens and estimate.estimated_tokens < 5000:
    result = fetch_url(url="https://very-long-article.com")
else:
    # too big — skip or summarize via search_and_fetch with max_tokens_each
    pass
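
The `estimate.estimated_tokens and ...` check above allows for the estimate being missing. A plausible reason is sketched here: if an estimate is derived from response headers without downloading the body, some servers simply do not provide enough information. The function name, the header-based approach, and the bytes-per-token ratios are all assumptions for illustration, and the ratios are placeholder guesses, not measured values.

```python
from typing import Optional

# Placeholder ratios (bytes of response body per token of extracted text).
# These are guesses for illustration, not measured values.
BYTES_PER_TOKEN = {"text/html": 12.0, "text/plain": 4.0}


def estimate_from_headers(content_type: Optional[str], content_length: Optional[str]) -> Optional[int]:
    """Estimate tokens from HEAD-style headers, or None when they are not usable."""
    if not content_type or not content_length:
        return None
    # Drop parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    ratio = BYTES_PER_TOKEN.get(media_type)
    length = content_length.strip()
    if ratio is None or not length.isdigit():
        return None
    return int(int(length) / ratio)
```

Returning None rather than 0 for an unknown size lets the caller distinguish "empty page" from "could not tell", which is why the example above tests for a truthy estimate before comparing it to the budget.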

Parallel fetching

results = fetch_multiple(
    urls=["https://docs.python.org/3/", "https://fastapi.tiangolo.com/", ...],
    max_tokens_each=1500,
)
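
Concurrent fetching with a hard cap on the number of URLs can be sketched with asyncio as below. This illustrates the general pattern only: `fetch_many`, the `fetch_one` callback, and the default concurrency of 5 are hypothetical, while the limit of 20 URLs comes from the tool description above.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

MAX_URLS = 20  # documented limit for fetch_multiple


async def fetch_many(
    urls: List[str],
    fetch_one: Callable[[str], Awaitable[str]],
    concurrency: int = 5,
) -> Dict[str, object]:
    """Fetch URLs concurrently; a failure on one URL does not cancel the others."""
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per call, got {len(urls)}")

    # Limit how many fetches are in flight at the same time.
    gate = asyncio.Semaphore(concurrency)

    async def guarded(url: str) -> object:
        async with gate:
            try:
                return await fetch_one(url)
            except Exception as exc:  # report per-URL errors instead of raising
                return exc

    results = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(zip(urls, results))
```

Returning the exception object for a failed URL lets the caller keep the successful results and decide per URL whether to retry.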

Configuration

| Env var | Required | Default | Notes |
|---|---|---|---|
| JINA_API_KEY | Recommended | | Free tier covers ~1M tokens/mo. Without it, only Trafilatura works (still useful for ~70% of pages). |
| FIRECRAWL_API_KEY | Optional | | Needed for JS-heavy domains (Twitter, LinkedIn, Notion). 500 free credits on signup. |
| REDIS_URL | Optional | | Without Redis, fetches run uncached. |
| CACHE_TTL_SECONDS | Optional | 21600 (6h) | Cache TTL for fetch results. |
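
The variables above could be read into one settings object along these lines. This is a sketch under assumptions: the `Settings` dataclass and `load_settings` function are hypothetical names, and treating an empty string as unset is a design choice of the example, not documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    jina_api_key: Optional[str]
    firecrawl_api_key: Optional[str]
    redis_url: Optional[str]
    cache_ttl_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, treating empty strings as unset."""

    def opt(name: str) -> Optional[str]:
        return env.get(name) or None

    return Settings(
        jina_api_key=opt("JINA_API_KEY"),
        firecrawl_api_key=opt("FIRECRAWL_API_KEY"),
        redis_url=opt("REDIS_URL"),
        # Fall back to the documented 6-hour default.
        cache_ttl_seconds=int(env.get("CACHE_TTL_SECONDS") or 21600),
    )
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.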

Development

git clone https://github.com/bch1212/agentfetch-mcp
cd agentfetch-mcp
pip install -e ".[dev]"
pytest tests/

Hosted version

If you'd rather not manage API keys, Redis, or routing yourself, the hosted version at www.agentfetch.dev gives you:

  • Pay-per-call pricing from $0.001/fetch
  • 500 free fetches on signup, no credit card
  • Managed Redis cache, automatic failover between fetchers
  • Dashboard with usage tracking + invoices

The hosted API is a drop-in REST equivalent — same response shapes, same routing logic. You can run the OSS MCP locally and the hosted API in parallel, or migrate between them at any time.

License

MIT — see LICENSE.

The MCP server in this repo is open source. The hosted product, billing, and ops infrastructure live in a separate (private) repo.

Contributing

PRs welcome. If you're adding a new fetcher (such as Bright Data or ScrapingBee), please match the FetchResult interface in agentfetch/core/fetchers/__init__.py and add its cost to the routing logic.
