Shared async HTTP scaffolding, response envelopes, corpus storage, and MCP server plumbing for data-fetching research toolkits.
mcp-data-core
Batteries-included async HTTP scaffolding and MCP server plumbing for Python data-fetching libraries.
If you're writing a Python library that pulls structured data from an API — patents, court filings, FDA records, financial filings, anything — you end up rebuilding the same eight things: an httpx client with retry, an HTTP cache, a tenacity policy, an OAuth helper, response envelopes, a typed exception hierarchy, per-app logging, and (if you ship an MCP server) tool registration, auth, and signed downloads. mcp-data-core is those eight things, packaged.
Quick Start
uv add mcp-data-core # core scaffolding
uv add "mcp-data-core[mcp]" # + FastMCP server helpers
import asyncio

from mcp_data_core import BaseAsyncClient

class MyApiClient(BaseAsyncClient):
    DEFAULT_BASE_URL = "https://api.example.com"
    CACHE_NAME = "my_api"

    async def get_thing(self, id: str) -> dict:
        return await self._request_json("GET", f"/things/{id}")

async def main() -> None:
    async with MyApiClient() as client:
        result = await client.get_thing("42")
        stats = await client.cache_stats()
        print(f"Cache hit rate: {stats.hit_rate:.1f}%")

asyncio.run(main())
That's the full surface. Retry, caching, error mapping, and connection pooling are already wired up.
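To make "error mapping" concrete, here is a minimal sketch of how HTTP status codes might be translated into typed exceptions. The class names mirror those listed in the feature table, but the classes are redefined locally for illustration. The constructor signature, the `error_for_status` helper, and the exact status-to-class mapping (for example, treating 403 as an authentication failure) are assumptions, not the library's confirmed behavior.

```python
from __future__ import annotations


class ApiError(Exception):
    """Local stand-in for a typed API error that carries the HTTP status."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


class AuthenticationError(ApiError):
    """Credentials missing or rejected."""


class NotFoundError(ApiError):
    """Resource does not exist."""


class RateLimitError(ApiError):
    """Caller is being throttled."""


class ServerError(ApiError):
    """Upstream failure (5xx)."""


def error_for_status(status_code: int, message: str = "") -> ApiError | None:
    """Return a typed error for a failing status code, or None on success.

    Hypothetical helper: the mapping below is an assumption for illustration.
    """
    if status_code < 400:
        return None
    if status_code in (401, 403):  # assumption: both treated as auth failures
        return AuthenticationError(status_code, message)
    if status_code == 404:
        return NotFoundError(status_code, message)
    if status_code == 429:
        return RateLimitError(status_code, message)
    if status_code >= 500:
        return ServerError(status_code, message)
    return ApiError(status_code, message)  # any other 4xx
```

Callers can then catch a narrow class such as `NotFoundError` without inspecting status codes themselves, or catch `ApiError` to handle every HTTP failure in one place.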
Features
| Feature | What you get |
|---|---|
| BaseAsyncClient | httpx.AsyncClient subclass with retry, caching, error mapping, and cache-management methods. Override DEFAULT_BASE_URL + CACHE_NAME; the rest is inherited. |
| HTTP caching | hishel-backed cache with a custom SQLite/WAL storage layer. Respects HTTP cache headers by default, with TTL override. Inspection (cache_stats), eviction (cache_clear_expired), and pattern-based invalidation (cache_invalidate) built in. |
| Retry policy | tenacity-based exponential-jitter retry (4 attempts default). Retryable status set covers 408, 429, 500-504. Honors Retry-After headers. |
| OAuth2 client credentials | OAuth2ClientCredentialsAuth — drop-in httpx.Auth that handles token refresh, retries on 401, and works behind the cache layer. |
| Response envelopes | ResponseEnvelope, ListEnvelope, Provenance. Cursor-based pagination helpers (encode_cursor / decode_cursor). Every response carries source provenance so downstream consumers can cite. |
| Typed exceptions | McpDataCoreError base + ApiError, RateLimitError, NotFoundError, AuthenticationError, ServerError, ConfigurationError, ValidationError, ParseError. Log-first error formatting: str(err) appends the log path so agents can inspect without keeping stacktraces in context. |
| Per-app file logging | logging.configure("my_app") attaches a file handler under the my_app logger tree, writing to ~/.cache/my_app/my_app.log. Idempotent; each consumer library logs to its own file. |
| Bundled corpora | corpus_db (SQLite/FTS5 reader) and corpus_compression (zstd) for libraries that ship statutes, manuals, or other reference text alongside their API client. |
| MCP server scaffolding (opt-in) | FastMCP server factory, bearer-token auth, domain gating middleware, conditional tool registration, signed HMAC download URLs with on-disk cache, and OAuth 2.1 + PKCE + DCR helpers. |
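The retry row above packs several rules into one line: a retryable status set, a capped attempt count, exponential backoff with jitter, and Retry-After handling. The sketch below spells those rules out in plain standard-library Python. It is not the library's tenacity-based implementation. The function names `should_retry` and `retry_delay` and the `initial`, `maximum`, and `jitter` defaults are assumptions chosen for illustration.

```python
from __future__ import annotations

import random

# Status codes named in the feature table: 408, 429, and 500 through 504.
RETRYABLE_STATUS_CODES = frozenset({408, 429, *range(500, 505)})


def should_retry(status_code: int, attempt: int, max_attempts: int = 4) -> bool:
    """True if the status is retryable and attempts remain (attempt is 1-based)."""
    return status_code in RETRYABLE_STATUS_CODES and attempt < max_attempts


def retry_delay(
    attempt: int,
    retry_after: str | None = None,
    *,
    initial: float = 1.0,   # assumed base delay in seconds
    maximum: float = 30.0,  # assumed cap on the computed backoff
    jitter: float = 1.0,    # assumed upper bound of the random jitter
) -> float:
    """Seconds to wait before the next attempt.

    A numeric Retry-After value takes precedence. Otherwise the delay is
    exponential backoff plus uniform random jitter, capped at `maximum`.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    backoff = initial * (2 ** (attempt - 1))
    return min(maximum, backoff + random.uniform(0.0, jitter))
```

The jitter spreads out clients that failed at the same moment, so they do not all retry in lockstep against a server that is already struggling.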
Real-world usage
A trimmed-down version of how patent-client-agents wires up a USPTO connector:
import os

from mcp_data_core import (
    BaseAsyncClient,
    ConfigurationError,
    ListEnvelope,
    make_provenance,
)

BASE_URL = "https://api.uspto.gov"

class UsptoOdpClient(BaseAsyncClient):
    DEFAULT_BASE_URL = BASE_URL
    CACHE_NAME = "uspto_odp"

    def __init__(self, *, api_key: str | None = None, **kwargs) -> None:
        api_key = api_key or os.environ.get("USPTO_ODP_API_KEY")
        if not api_key:
            raise ConfigurationError("USPTO_ODP_API_KEY required")
        super().__init__(headers={"X-API-KEY": api_key}, **kwargs)

    async def search_applications(
        self, query: str, *, limit: int = 25
    ) -> ListEnvelope[dict]:
        payload = await self._request_json(
            "POST",
            "/api/v1/patent/applications/search",
            json={"q": query, "pagination": {"limit": limit}},
        )
        return ListEnvelope(
            summary=f"{payload['count']} applications matching {query!r}",
            items=payload["patentFileWrapperDataBag"],
            provenance=make_provenance(
                source_url=f"{BASE_URL}/api/v1/patent/applications/search",
                source_name="USPTO Open Data Portal",
            ),
        )
No retry loop. No cache invalidation. No exception remapping. No connection lifecycle. The library author writes the API-shaped methods; mcp-data-core handles everything else.
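The search method above returns a single page. For paging through results, the feature table mentions cursor helpers (encode_cursor / decode_cursor). Their wire format is not documented here, so the following is only a sketch of one common approach: serialize the paging state as JSON and wrap it in URL-safe base64 so the cursor is an opaque string. The names `pack_cursor` and `unpack_cursor` are hypothetical stand-ins, not the library's functions.

```python
import base64
import json
from typing import Any


def pack_cursor(state: dict[str, Any]) -> str:
    """Encode paging state as an opaque, URL-safe string (hypothetical format)."""
    raw = json.dumps(state, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # Strip base64 padding so the token needs no escaping in query strings.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def unpack_cursor(cursor: str) -> dict[str, Any]:
    """Reverse pack_cursor, restoring the padding that was stripped."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

A cursor built this way is opaque but not tamper-proof. A server that must trust the contents would also need to sign or encrypt it.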
What's inside
mcp_data_core/
├── base_client.py # BaseAsyncClient
├── cache.py # CacheManager, build_cached_http_client, SQLite/WAL storage
├── resilience.py # default_retryer, with_retry, RETRYABLE_STATUS_CODES
├── oauth2.py # OAuth2ClientCredentialsAuth
├── envelope.py # ResponseEnvelope, ListEnvelope, Provenance, cursor helpers
├── exceptions.py # McpDataCoreError + 8 subclasses
├── logging.py # configure() — per-app file logging
├── filenames.py # Download filename conventions
├── corpus_db.py # SQLite/FTS5 corpus reader
├── corpus_compression.py # zstd helpers
└── mcp/ # Optional — installed via [mcp] extra
    ├── server_factory.py # FastMCP app factory
    ├── auth.py # OAuth 2.1 + bearer-token helpers
    ├── middleware.py # Domain gate, friendly errors, logging
    ├── conditional.py # Conditional tool registration
    ├── downloads.py # Signed HMAC download URLs + on-disk cache
    └── annotations.py # Tool annotations (READ_ONLY, DESTRUCTIVE)
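As an illustration of what "signed HMAC download URLs" in downloads.py generally means, here is a minimal standard-library sketch: the server signs the path and an expiry timestamp with a secret key, then recomputes and compares the signature when the URL is presented. The function names, the `expires` and `sig` query parameters, and the message layout are all assumptions for illustration. The module's real scheme may differ.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry (hypothetical message layout)."""
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_download_url(
    base_url: str, path: str, secret: bytes, *, ttl: int = 300, now: float | None = None
) -> str:
    """Build a URL valid for `ttl` seconds. Assumes base_url has no path part."""
    expires = int((time.time() if now is None else now) + ttl)
    query = urlencode({"expires": expires, "sig": _signature(secret, path, expires)})
    return f"{base_url}{path}?{query}"


def verify_download_url(url: str, secret: bytes, *, now: float | None = None) -> bool:
    """Return True only if the signature matches and the URL has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    expected = _signature(secret, parts.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because the expiry is part of the signed message, a client cannot extend a link's lifetime by editing the `expires` parameter.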
Provenance
Extracted from patent-client-agents 0.20.0 (May 2026) where it had matured as the shared infrastructure across multiple law and patent connectors. Split out as a standalone package so non-IP toolkits — regulatory (FDA), financial, scientific — can use the same scaffolding without pulling the IP-specific connector surface.
Used by
- patent-client-agents — IP-registry connectors (USPTO, EPO, JPO, EUIPO, IP Australia, …)
Compatibility
- Python 3.11, 3.12, 3.13
- macOS, Linux. Windows untested.
- httpx 0.27+, pydantic 2.7+, tenacity 8.4+
Development
git clone https://github.com/parkerhancock/mcp-data-core
cd mcp-data-core
uv sync --all-extras --dev
uv run pytest # 166 tests
uv run ruff check src tests
uv run ruff format src tests
Tests are pure-Python — no network, no fixtures, no live APIs. They exercise the cache, retry policy, OAuth refresh, MCP middleware, signed download URLs, and corpus reader against an in-memory transport.
License
MIT