
Shared async HTTP scaffolding, response envelopes, corpus storage, and MCP server plumbing for data-fetching research toolkits.


mcp-data-core

Batteries-included async HTTP scaffolding and MCP server plumbing for Python data-fetching libraries.

If you're writing a Python library that pulls structured data from an API — patents, court filings, FDA records, financial filings, anything — you end up rebuilding the same eight things: an httpx client with retry, an HTTP cache, a tenacity policy, an OAuth helper, response envelopes, a typed exception hierarchy, per-app logging, and (if you ship an MCP server) tool registration, auth, and signed downloads. mcp-data-core is those eight things, packaged.

Quick Start

uv add mcp-data-core           # core scaffolding
uv add "mcp-data-core[mcp]"    # + FastMCP server helpers

import asyncio

from mcp_data_core import BaseAsyncClient


class MyApiClient(BaseAsyncClient):
    DEFAULT_BASE_URL = "https://api.example.com"
    CACHE_NAME = "my_api"

    async def get_thing(self, id: str) -> dict:
        return await self._request_json("GET", f"/things/{id}")


async def main() -> None:
    # `async with` must run inside a coroutine, so wrap the calls in main().
    async with MyApiClient() as client:
        result = await client.get_thing("42")
        stats = await client.cache_stats()
        print(f"Cache hit rate: {stats.hit_rate:.1f}%")


asyncio.run(main())

That's the full surface. Retry, caching, error mapping, and connection pooling are already wired up.
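To make "error mapping" concrete, here is a minimal sketch of how an HTTP status code might be turned into a typed exception. The class names echo the ones listed under Features, but these are local stand-ins defined for illustration; the function name `exception_for_status` and the exact status-to-class mapping are assumptions, not the package's implementation.

```python
from __future__ import annotations


class ApiError(Exception):
    """Stand-in base class for HTTP-level failures (illustrative only)."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


class AuthenticationError(ApiError):
    """Stand-in for credential failures."""


class NotFoundError(ApiError):
    """Stand-in for missing resources."""


class RateLimitError(ApiError):
    """Stand-in for throttling responses."""


class ServerError(ApiError):
    """Stand-in for upstream 5xx failures."""


def exception_for_status(status_code: int, message: str = "") -> ApiError | None:
    """Return a typed exception for an error status, or None below 400.

    The mapping here is an assumption made for this sketch.
    """
    if status_code < 400:
        return None
    if status_code in (401, 403):
        return AuthenticationError(status_code, message)
    if status_code == 404:
        return NotFoundError(status_code, message)
    if status_code == 429:
        return RateLimitError(status_code, message)
    if status_code >= 500:
        return ServerError(status_code, message)
    return ApiError(status_code, message)
```

A caller can then catch `NotFoundError` or `RateLimitError` specifically instead of inspecting status codes at every call site.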

Features

| Feature | What you get |
| --- | --- |
| BaseAsyncClient | httpx.AsyncClient subclass with retry, caching, error mapping, and cache-management methods. Override DEFAULT_BASE_URL + CACHE_NAME; the rest is inherited. |
| HTTP caching | hishel-backed cache with a custom SQLite/WAL storage layer. Respects HTTP cache headers by default, with TTL override. Inspection (cache_stats), eviction (cache_clear_expired), and pattern-based invalidation (cache_invalidate) built in. |
| Retry policy | tenacity-based exponential-jitter retry (4 attempts default). Retryable status set covers 408, 429, 500-504. Honors Retry-After headers. |
| OAuth2 client credentials | OAuth2ClientCredentialsAuth: drop-in httpx.Auth that handles token refresh, retries on 401, and works behind the cache layer. |
| Response envelopes | ResponseEnvelope, ListEnvelope, Provenance. Cursor-based pagination helpers (encode_cursor / decode_cursor). Every response carries source provenance so downstream consumers can cite. |
| Typed exceptions | McpDataCoreError base + ApiError, RateLimitError, NotFoundError, AuthenticationError, ServerError, ConfigurationError, ValidationError, ParseError. Log-first error formatting: str(err) appends the log path so agents can inspect without keeping stacktraces in context. |
| Per-app file logging | logging.configure("my_app") attaches a file handler under the my_app logger tree, writing to ~/.cache/my_app/my_app.log. Idempotent; each consumer library logs to its own file. |
| Bundled corpora | corpus_db (SQLite/FTS5 reader) and corpus_compression (zstd) for libraries that ship statutes, manuals, or other reference text alongside their API client. |
| MCP server scaffolding (opt-in) | FastMCP server factory, bearer-token auth, domain gating middleware, conditional tool registration, signed HMAC download URLs with on-disk cache, and OAuth 2.1 + PKCE + DCR helpers. |
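The retry row above packs several rules into one line: a retryable status set, an attempt limit, jittered exponential backoff, and Retry-After handling. The stdlib-only sketch below shows one way those rules can fit together. It uses the "full jitter" variant of exponential backoff, which may differ from what tenacity computes, and every name and default here (`SKETCH_RETRYABLE_STATUSES`, `should_retry`, `backoff_delay`, the 0.5 s base, the 30 s cap) is an assumption for illustration rather than the package's code.

```python
from __future__ import annotations

import random

# 408, 429, and 500-504, read literally from the feature table.
SKETCH_RETRYABLE_STATUSES = frozenset({408, 429, 500, 501, 502, 503, 504})


def should_retry(status_code: int, attempt: int, max_attempts: int = 4) -> bool:
    """True if the status is retryable and attempts remain (attempt is 1-based)."""
    return status_code in SKETCH_RETRYABLE_STATUSES and attempt < max_attempts


def backoff_delay(
    attempt: int,
    retry_after: str | None = None,
    *,
    base: float = 0.5,
    cap: float = 30.0,
    rng: random.Random | None = None,
) -> float:
    """Seconds to wait before the next attempt.

    A numeric Retry-After value wins (clamped to [0, cap]); otherwise the
    delay is drawn uniformly from [0, min(cap, base * 2**(attempt - 1))].
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; use jitter.
    rng = rng or random.Random()
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return rng.uniform(0.0, ceiling)
```

Randomizing the delay keeps many clients that failed at the same moment from retrying in lockstep, which is the usual reason to prefer jitter over a fixed schedule.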

Real-world usage

A trimmed-down version of how patent-client-agents wires up a USPTO connector:

import os
from mcp_data_core import (
    BaseAsyncClient,
    ConfigurationError,
    ListEnvelope,
    make_provenance,
)

BASE_URL = "https://api.uspto.gov"


class UsptoOdpClient(BaseAsyncClient):
    DEFAULT_BASE_URL = BASE_URL
    CACHE_NAME = "uspto_odp"

    def __init__(self, *, api_key: str | None = None, **kwargs) -> None:
        api_key = api_key or os.environ.get("USPTO_ODP_API_KEY")
        if not api_key:
            raise ConfigurationError("USPTO_ODP_API_KEY required")
        super().__init__(headers={"X-API-KEY": api_key}, **kwargs)

    async def search_applications(
        self, query: str, *, limit: int = 25
    ) -> ListEnvelope[dict]:
        payload = await self._request_json(
            "POST",
            "/api/v1/patent/applications/search",
            json={"q": query, "pagination": {"limit": limit}},
        )
        return ListEnvelope(
            summary=f"{payload['count']} applications matching {query!r}",
            items=payload["patentFileWrapperDataBag"],
            provenance=make_provenance(
                source_url=f"{BASE_URL}/api/v1/patent/applications/search",
                source_name="USPTO Open Data Portal",
            ),
        )

No retry loop. No cache invalidation. No exception remapping. No connection lifecycle. The library author writes the API-shaped methods; mcp-data-core handles everything else.
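The connector above fetches a single page. For multi-page listings, the Features section mentions cursor helpers (encode_cursor / decode_cursor) without showing their signatures. As a rough illustration of the general technique, the sketch below packs pagination state into an opaque URL-safe token. The names `pack_cursor` and `unpack_cursor`, and the JSON-plus-base64 encoding, are hypothetical and chosen for this example; the package's own helpers may work differently.

```python
import base64
import json


def pack_cursor(state: dict) -> str:
    """Encode pagination state as an opaque, URL-safe token (no padding)."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def unpack_cursor(token: str) -> dict:
    """Decode a token produced by pack_cursor; raise ValueError if malformed."""
    padded = token + "=" * (-len(token) % 4)  # restore stripped padding
    try:
        state = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError as exc:  # covers base64, UTF-8, and JSON decode errors
        raise ValueError("malformed cursor") from exc
    if not isinstance(state, dict):
        raise ValueError("malformed cursor")
    return state
```

A token like this is opaque but not tamper-proof: a client can decode and alter it, so the server should still validate the decoded state before using it.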

What's inside

mcp_data_core/
├── base_client.py        # BaseAsyncClient
├── cache.py              # CacheManager, build_cached_http_client, SQLite/WAL storage
├── resilience.py         # default_retryer, with_retry, RETRYABLE_STATUS_CODES
├── oauth2.py             # OAuth2ClientCredentialsAuth
├── envelope.py           # ResponseEnvelope, ListEnvelope, Provenance, cursor helpers
├── exceptions.py         # McpDataCoreError + 8 subclasses
├── logging.py            # configure() — per-app file logging
├── filenames.py          # Download filename conventions
├── corpus_db.py          # SQLite/FTS5 corpus reader
├── corpus_compression.py # zstd helpers
└── mcp/                  # Optional — installed via [mcp] extra
    ├── server_factory.py # FastMCP app factory
    ├── auth.py           # OAuth 2.1 + bearer-token helpers
    ├── middleware.py     # Domain gate, friendly errors, logging
    ├── conditional.py    # Conditional tool registration
    ├── downloads.py      # Signed HMAC download URLs + on-disk cache
    └── annotations.py    # Tool annotations (READ_ONLY, DESTRUCTIVE)
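The layout lists downloads.py as providing signed HMAC download URLs. For readers unfamiliar with the pattern, here is a minimal stdlib sketch of expiring, HMAC-signed URLs. The function names, the `expires` and `sig` query parameters, and the signed message format are all assumptions for illustration and are not taken from the package.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def sign_download_url(
    base_url: str, path: str, secret: bytes, *, ttl: int = 300, now: float | None = None
) -> str:
    """Return base_url + path with an expiry timestamp and HMAC-SHA256 signature."""
    expires = int((time.time() if now is None else now) + ttl)
    message = f"{path}\n{expires}".encode("utf-8")
    sig = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{base_url}{path}?{urlencode({'expires': expires, 'sig': sig})}"


def verify_download_url(url: str, secret: bytes, *, now: float | None = None) -> bool:
    """True only if the signature matches the path and the URL has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    message = f"{parts.path}\n{expires}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
```

Because the expiry is part of the signed message, a client cannot extend a link's lifetime by editing the `expires` parameter.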

Provenance

Extracted from patent-client-agents 0.20.0 (May 2026), where it had matured as the shared infrastructure across multiple law and patent connectors. Split out as a standalone package so that non-IP toolkits (regulatory such as FDA, financial, scientific) can use the same scaffolding without pulling in the IP-specific connector surface.

Used by

Compatibility

  • Python 3.11, 3.12, 3.13
  • macOS, Linux. Windows untested.
  • httpx 0.27+, pydantic 2.7+, tenacity 8.4+

Development

git clone https://github.com/parkerhancock/mcp-data-core
cd mcp-data-core
uv sync --all-extras --dev
uv run pytest                                 # 166 tests
uv run ruff check src tests
uv run ruff format src tests

Tests are pure-Python — no network, no fixtures, no live APIs. They exercise the cache, retry policy, OAuth refresh, MCP middleware, signed download URLs, and corpus reader against an in-memory transport.

License

MIT
