**Unified ingestion, caching, and audit layer for Money Ex Machina.**
mxm-dataio
Overview
mxm-dataio is Money Ex Machina’s lightweight ingestion and audit backbone.
It records every external interaction (Session → Request → Response),
persists exact payload bytes, and stores structured metadata in SQLite.
It is designed for deterministic reproducibility, offline caching,
and transparent provenance across all MXM data sources.
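As a rough illustration of the Session → Request → Response chain, the sketch below models it with three SQLite tables using only the standard library. The table names, column names, and sample values are assumptions made for this example; they are not mxm-dataio's actual schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the library's real tables.
SCHEMA = """
CREATE TABLE sessions  (id TEXT PRIMARY KEY, source TEXT NOT NULL, started_at TEXT NOT NULL);
CREATE TABLE requests  (id TEXT PRIMARY KEY,
                        session_id TEXT NOT NULL REFERENCES sessions(id),
                        kind TEXT NOT NULL, params_json TEXT NOT NULL);
CREATE TABLE responses (id TEXT PRIMARY KEY,
                        request_id TEXT NOT NULL REFERENCES requests(id),
                        status INTEGER, checksum TEXT NOT NULL, path TEXT NOT NULL);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_store()
with conn:  # commits all three inserts together, or rolls back on error
    conn.execute("INSERT INTO sessions VALUES (?, ?, ?)",
                 ("s1", "http", "2025-10-27T10:45:00Z"))
    conn.execute("INSERT INTO requests VALUES (?, ?, ?, ?)",
                 ("r1", "s1", "demo", '{"q": "mxm"}'))
    # The checksum and path below are placeholder values.
    conn.execute("INSERT INTO responses VALUES (?, ?, ?, ?, ?)",
                 ("resp-1", "r1", 200, "sha256:placeholder", "blobs/s1/placeholder.bin"))

# Walk the chain back from a response to its request and session.
rows = conn.execute(
    """SELECT s.source, r.kind, p.status
       FROM responses p
       JOIN requests r ON p.request_id = r.id
       JOIN sessions s ON r.session_id = s.id"""
).fetchall()
```

Keeping the audit trail relational like this is what makes it queryable with nothing more than the `sqlite3` command-line shell.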
Architecture at a glance
mxm-dataio/
├── DataIoSession → runtime context (one logical run)
├── Request / Response → atomic data transactions
├── adapters/ → pluggable fetch/send implementations
└── store/ → SQLite-backed metadata and byte storage
Each interaction is represented as:
Session ─┬─> Request ──> Response
         └─> Request ──> Response
Raw bytes and parsed metadata are stored under:
<root>/responses/<session>/<hash>.json
<root>/blobs/<session>/<hash>.bin
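The layout above can be sketched in a few lines. This is a minimal illustration, assuming that `<hash>` is the SHA-256 hex digest of the payload bytes; the README does not state which hash is used, and the function name `store_response` is hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_response(root: Path, session: str, payload: bytes, metadata: dict) -> tuple[Path, Path]:
    """Write payload bytes and JSON metadata under <root>/blobs and <root>/responses."""
    digest = hashlib.sha256(payload).hexdigest()  # assumption: <hash> = SHA-256 of payload
    blob_path = root / "blobs" / session / f"{digest}.bin"
    meta_path = root / "responses" / session / f"{digest}.json"
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    blob_path.write_bytes(payload)  # exact bytes, no re-encoding
    meta_path.write_text(json.dumps(metadata, sort_keys=True, indent=2), encoding="utf-8")
    return blob_path, meta_path


with tempfile.TemporaryDirectory() as tmp:
    blob, meta = store_response(Path(tmp), "session-1", b'{"ok": true}', {"status": 200})
    stored_bytes = blob.read_bytes()
    stored_meta = json.loads(meta.read_text(encoding="utf-8"))
    layout = (blob.parent.parent.name, meta.parent.parent.name)
```

Because the file name derives from the content, writing the same payload twice in one session lands on the same path.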
Core model
| Concept | Role |
|---|---|
| Session | Groups a set of related requests; ensures atomic persistence. |
| Request | Deterministic identity of an operation (method + URL + params + headers). |
| Response | Archived payload, metadata, and audit fields. |
| Adapter | Tiny class implementing fetch() or send() returning an AdapterResult. |
| Registry | Runtime mapping from adapter name → adapter instance. |
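One way to obtain the deterministic request identity described in the table is to hash a canonical serialization of the inputs. The sketch below is an assumption about how this could work, not the library's implementation; the function name `request_id` and the normalization rules (upper-cased method, lower-cased header names, sorted keys) are illustrative choices.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Mapping


def request_id(
    method: str,
    url: str,
    params: Mapping[str, Any] | None = None,
    headers: Mapping[str, str] | None = None,
) -> str:
    """Hash method + URL + params + headers into a stable hex identifier."""
    canonical = json.dumps(
        {
            "method": method.upper(),
            "url": url,
            "params": dict(params or {}),
            # Header names are case-insensitive, so normalize them.
            "headers": {k.lower(): v for k, v in (headers or {}).items()},
        },
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = request_id("get", "https://example.com/data", {"q": "mxm", "page": 1}, {"Accept": "application/json"})
b = request_id("GET", "https://example.com/data", {"page": 1, "q": "mxm"}, {"accept": "application/json"})
c = request_id("GET", "https://example.com/data", {"page": 2, "q": "mxm"})
```

Here `a` and `b` are equal despite differing key order and letter case, while `c` differs because a parameter value changed.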
Runtime API
DataIoSession
The main entry point for ingestion or submission tasks.
from mxm.config import load_config
from mxm.dataio.adapters import HttpFetcher
from mxm.dataio.api import DataIoSession
from mxm.dataio.config.config import dataio_view
from mxm.dataio.registry import register_adapter

cfg = load_config(package="mxm-dataio", env="dev", profile="default")
dio_cfg = dataio_view(cfg)

# Register an adapter under a source name
register_adapter("http", HttpFetcher())  # implements Fetcher

# Use the session with that source name
with DataIoSession(source="http", cfg=dio_cfg) as io:
    req = io.request(kind="demo", params={"q": "mxm"})
    resp = io.fetch(req)
    print(resp.status, resp.checksum, resp.path)
AdapterResult objects contain both the raw payload and normalized metadata:
from mxm.types import HeadersLike, JSONObj

class AdapterResult:
    data: bytes
    content_type: str | None
    transport_status: int | None
    url: str | None
    elapsed_ms: int | None
    headers: HeadersLike | None
    adapter_meta: JSONObj | None
Configuration
mxm-dataio reads its settings from the dataio subtree
of the global MXM config. Downstream packages obtain read-only
views via mxm_config.make_view.
Configuration contract for mxm-dataio
mxm-dataio is a library and does not define a full application by itself.
However, it ships a reference mxm-config seed tree under:
src/mxm/dataio/_data/seed/dataio/
This tree contains the standard 5-level structure expected by mxm-config:
- default.yaml
- machine.yaml
- environment.yaml
- profile.yaml
- local.yaml
Downstream applications are expected to define their own app_id and config
trees, but can copy or adapt this dataio seed as the canonical contract for
how mxm-dataio expects to be configured (paths, cache roots, etc.).
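To make the layering concrete, here is a minimal sketch of how five configuration levels could be resolved into one mapping. It assumes that later layers override earlier ones in the order listed above, which the README implies but does not state. Plain dictionaries stand in for the parsed YAML files, and the key names (`paths`, `cache`, `ttl_seconds`) are hypothetical.

```python
from typing import Any

# Assumed precedence: each layer overrides the ones before it.
LAYER_ORDER = ["default", "machine", "environment", "profile", "local"]


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where nested mappings are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Fold the layers together in precedence order; missing layers are skipped."""
    result: dict[str, Any] = {}
    for name in LAYER_ORDER:
        result = deep_merge(result, layers.get(name, {}))
    return result


layers = {
    "default": {"dataio": {"paths": {"root": "/var/lib/mxm"}, "cache": {"ttl_seconds": 86400}}},
    "environment": {"dataio": {"cache": {"ttl_seconds": 3600}}},
    "local": {"dataio": {"paths": {"root": "/tmp/mxm-dev"}}},
}
resolved = resolve(layers)
```

In this example the `local` layer wins for the root path and the `environment` layer wins for the TTL, while untouched keys fall through from `default`.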
Adapters
Adapters provide I/O logic while mxm-dataio handles persistence.
Example (simplified):
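from typing import Any

import requests

from mxm.dataio.adapters import BaseFetcher
from mxm.dataio.types import AdapterResult


class HttpFetcher(BaseFetcher):
    def fetch(self, url: str, **params: Any) -> AdapterResult:
        # A timeout keeps a stalled server from blocking the session.
        r = requests.get(url, params=params, timeout=30)
        # Field names follow the AdapterResult definition shown under Runtime API.
        return AdapterResult(
            data=r.content,
            content_type=r.headers.get("content-type"),
            transport_status=r.status_code,
            url=r.url,
            elapsed_ms=int(r.elapsed.total_seconds() * 1000),
            headers=dict(r.headers),
            adapter_meta=None,
        )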
Adapters can be registered dynamically:
from mxm.dataio.registry import register_adapter
register_adapter("http", HttpFetcher())
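Conceptually, a registry of this kind is a name-to-instance mapping. The stand-alone sketch below re-implements the idea for illustration only; it is not the code in mxm.dataio.registry. The `Fetcher` protocol is simplified to return bytes, and both `get_adapter` and the rule that duplicate names are rejected are assumptions.

```python
from typing import Protocol


class Fetcher(Protocol):
    """Simplified stand-in for an adapter interface (returns raw bytes)."""

    def fetch(self, url: str) -> bytes: ...


_REGISTRY: dict[str, Fetcher] = {}


def register_adapter(name: str, adapter: Fetcher) -> None:
    """Bind an adapter instance to a source name; duplicates are rejected here."""
    if name in _REGISTRY:
        raise ValueError(f"adapter already registered: {name!r}")
    _REGISTRY[name] = adapter


def get_adapter(name: str) -> Fetcher:
    """Look up an adapter by name (hypothetical helper)."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise LookupError(f"no adapter registered under {name!r}") from None


class StaticFetcher:
    """Test double that returns a fixed payload without touching the network."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def fetch(self, url: str) -> bytes:
        return self._payload


register_adapter("static", StaticFetcher(b"hello"))
payload = get_adapter("static").fetch("https://example.com")
```

A network-free adapter such as `StaticFetcher` is also a convenient way to keep tests hermetic.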
Caching and Volatility
Recent versions introduce a policy-driven caching system supporting:
- Volatile sources (e.g. JustETF) that change daily
- Eternal sources (e.g. FCA FIRDS) that never mutate after release
- Fine-grained control via cache_mode, ttl_seconds, and as_of_bucket
Each request/response pair now carries explicit provenance metadata.
CacheMode semantics
| Mode | Behavior |
|---|---|
| default | Use cached data if available and not expired; otherwise refetch |
| only_if_cached | Never hit network; raise on cache miss |
| bypass | Always refetch and persist new response |
| revalidate | Future ETag support; currently same as default |
| never | Fetch but never persist (ephemeral or side-effect requests) |
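The table can be read as a small decision function. The sketch below is one possible encoding of it, not the library's code. Two details are assumptions because the table leaves them open: a `ttl_seconds` of `None` means entries never expire, and `only_if_cached` serves any cached entry regardless of age. The names `decide` and `CacheMiss` are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class CacheMode(str, Enum):
    DEFAULT = "default"
    ONLY_IF_CACHED = "only_if_cached"
    BYPASS = "bypass"
    REVALIDATE = "revalidate"
    NEVER = "never"


class CacheMiss(LookupError):
    """Raised when only_if_cached finds nothing in the cache."""


def decide(mode: CacheMode, cached_age_s: float | None, ttl_seconds: float | None) -> tuple[str, bool]:
    """Return (action, persist), where action is 'use_cache' or 'fetch'.

    cached_age_s is None when no cached entry exists.
    """
    if mode is CacheMode.ONLY_IF_CACHED:
        if cached_age_s is None:
            raise CacheMiss("only_if_cached: no cached entry")
        return ("use_cache", False)
    if mode is CacheMode.BYPASS:
        return ("fetch", True)
    if mode is CacheMode.NEVER:
        return ("fetch", False)
    # DEFAULT, and REVALIDATE which currently behaves the same.
    fresh = cached_age_s is not None and (ttl_seconds is None or cached_age_s <= ttl_seconds)
    return ("use_cache", False) if fresh else ("fetch", True)
```

For example, `decide(CacheMode.DEFAULT, 120, 60)` yields `("fetch", True)` because the cached entry is older than its TTL.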
Provenance fields
| Field | Type | Meaning |
|---|---|---|
| cache_mode | Enum[str] | Policy governing cache use |
| ttl_seconds | float \| None | Time-to-live in seconds; older entries are refetched |
| as_of_bucket | str \| None | Logical “time partition” (e.g. "2025-10-27") |
| cache_tag | str \| None | Optional sub-partition (e.g. language "en") |
Example embedded in saved JSON:
"_provenance": {
  "response_id": "resp-123",
  "checksum": "sha256:…",
  "fetched_at": "2025-10-27T10:45:12Z",
  "cache_mode": "default",
  "ttl_seconds": 86400,
  "as_of_bucket": "2025-10-27",
  "cache_tag": "en"
}
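A record shaped like the example above could be assembled as follows. This is a sketch under assumptions: the checksum is taken to be the SHA-256 of the raw payload with a `sha256:` prefix (matching the example's format), and the helper name `with_provenance` is hypothetical.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone


def with_provenance(
    body: dict,
    payload: bytes,
    response_id: str,
    *,
    fetched_at: datetime,
    cache_mode: str = "default",
    ttl_seconds: float | None = None,
    as_of_bucket: str | None = None,
    cache_tag: str | None = None,
) -> dict:
    """Return a copy of body with a _provenance block attached."""
    return {
        **body,
        "_provenance": {
            "response_id": response_id,
            # Assumption: checksum covers the exact payload bytes.
            "checksum": "sha256:" + hashlib.sha256(payload).hexdigest(),
            # Normalize to UTC and render with a trailing Z, as in the example.
            "fetched_at": fetched_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "cache_mode": cache_mode,
            "ttl_seconds": ttl_seconds,
            "as_of_bucket": as_of_bucket,
            "cache_tag": cache_tag,
        },
    }


payload = b'{"isin": "IE00B4L5Y983"}'
record = with_provenance(
    {"status": 200},
    payload,
    "resp-123",
    fetched_at=datetime(2025, 10, 27, 10, 45, 12, tzinfo=timezone.utc),
    ttl_seconds=86400,
    as_of_bucket="2025-10-27",
    cache_tag="en",
)
```

Passing a timezone-aware `fetched_at` avoids ambiguity; a naive datetime would be interpreted as local time by `astimezone`.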
Example usage
from mxm.dataio.api import DataIoSession

with DataIoSession(
    source="justetf",
    cfg=cfg,
    cache_mode="default",
    ttl=86400,
    as_of_bucket="2025-10-27",
) as s:
    req = s.request(kind="http", params={"u": "A"})
    resp = s.fetch(req)
    print(resp.checksum, resp.as_of_bucket)
This enables reproducible daily snapshots for volatile sources while preserving eternal datasets indefinitely.
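One way to picture how buckets produce daily snapshots is to fold the bucket and tag into the cache key. The sketch below assumes a key built by hashing the request identity together with `as_of_bucket` and `cache_tag`; the helper names and the key construction are illustrative, not taken from mxm-dataio.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone


def daily_bucket(moment: datetime) -> str:
    """Map a timezone-aware datetime to a UTC calendar-day bucket."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d")


def cache_key(request_id: str, as_of_bucket: str | None, cache_tag: str | None = None) -> str:
    """Combine request identity, bucket, and tag into one lookup key."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    parts = [request_id, as_of_bucket or "", cache_tag or ""]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


morning = datetime(2025, 10, 27, 8, 0, tzinfo=timezone.utc)
evening = datetime(2025, 10, 27, 23, 59, tzinfo=timezone.utc)
next_day = datetime(2025, 10, 28, 0, 1, tzinfo=timezone.utc)

# Volatile source: one snapshot per day.
key_morning = cache_key("req-A", daily_bucket(morning))
key_evening = cache_key("req-A", daily_bucket(evening))
key_next = cache_key("req-A", daily_bucket(next_day))

# Eternal source: no bucket, so the key never changes.
key_eternal = cache_key("req-B", None)
```

Requests made on the same UTC day share a key and therefore a cached snapshot, while the next day's request gets a fresh one.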
Quick examples
Fetch and cache a resource
session = DataIoSession(cfg=dio_cfg)
result = session.fetch("https://example.com/data.json", fetcher="http")
print(result.status_code)
The payload and metadata are stored automatically in SQLite + filesystem.
Subsequent identical requests are served from cache unless force_refresh=True.
Send data to an API
result = session.send("https://api.example.com/upload", data=b"...", sender="http")
print(result.status_code)
Design principles
- Deterministic: identical inputs yield identical request IDs.
- Auditable: all payloads and headers persisted for replay.
- Minimal dependencies: pure Python, no ORM or framework assumptions.
- Composable: adapters plug into any MXM package via registry.
- Readable data: SQLite + JSON + raw bytes, human-inspectable.
Testing & quality
All tests are pure-Python and hermetic—no network calls.
Configuration YAMLs are loaded directly from the repo using a temporary
MXM_CONFIG_HOME fixture. The project is validated with:
pytest -q
pyright --strict
ruff check .
black --check .
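The temporary MXM_CONFIG_HOME idea can be sketched with the standard library alone. The context manager below is an illustration, not the project's actual fixture; the name `temporary_config_home`, the file layout, and the YAML content are assumptions. In a pytest suite the same logic would typically live in a fixture.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def temporary_config_home(seed_files: dict[str, str]) -> Iterator[Path]:
    """Point MXM_CONFIG_HOME at a throwaway directory seeded with the given files."""
    previous = os.environ.get("MXM_CONFIG_HOME")
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        for relative, text in seed_files.items():
            target = home / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
        os.environ["MXM_CONFIG_HOME"] = str(home)
        try:
            yield home
        finally:
            # Restore the caller's environment exactly as it was.
            if previous is None:
                os.environ.pop("MXM_CONFIG_HOME", None)
            else:
                os.environ["MXM_CONFIG_HOME"] = previous


before = os.environ.get("MXM_CONFIG_HOME")
# Hypothetical seed content for illustration.
with temporary_config_home({"dataio/default.yaml": "paths:\n  root: /tmp/mxm\n"}) as home:
    inside = os.environ["MXM_CONFIG_HOME"]
    seeded = (home / "dataio" / "default.yaml").exists()
    home_str = str(home)
after = os.environ.get("MXM_CONFIG_HOME")
```

Restoring the previous value in `finally` keeps one test's configuration from leaking into the next.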
Roadmap
- Async adapters (aiohttp, websockets).
- Multi-backend persistence (S3, DuckDB).
- Delta auditing and content hashing improvements.
- CLI for session inspection and cache management.
License
MIT License. See LICENSE.