
mxm-dataio


Unified ingestion, caching, and audit layer for Money Ex Machina.

Overview

mxm-dataio is Money Ex Machina’s lightweight ingestion and audit backbone.
It records every external interaction (Session → Request → Response),
persists exact payload bytes, and stores structured metadata in SQLite.

It is designed for deterministic reproducibility, offline caching,
and transparent provenance across all MXM data sources.

Architecture at a glance

mxm-dataio/
├── DataIoSession      → runtime context (one logical run)
├── Request / Response → atomic data transactions
├── adapters/          → pluggable fetch/send implementations
└── store/             → SQLite-backed metadata and byte storage

Each interaction is represented as:

Session ─┬─> Request ──> Response
          └─> Request ──> Response

Raw bytes and parsed metadata are stored under:

<root>/responses/<session>/<hash>.json
<root>/blobs/<session>/<hash>.bin
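A minimal sketch of how such content-addressed storage can work (the `store_response` helper and its signature are illustrative, not the library's actual API; only the path layout comes from the README):

```python
import hashlib
import json
from pathlib import Path

def store_response(root: Path, session: str, payload: bytes, meta: dict) -> Path:
    """Persist raw bytes and metadata under content-addressed paths (illustrative)."""
    digest = hashlib.sha256(payload).hexdigest()
    blob_path = root / "blobs" / session / f"{digest}.bin"
    meta_path = root / "responses" / session / f"{digest}.json"
    blob_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    blob_path.write_bytes(payload)
    meta_path.write_text(json.dumps({**meta, "checksum": f"sha256:{digest}"}))
    return blob_path
```

Because the file name is the payload's hash, identical payloads in a session collapse to one blob, and the checksum in the JSON metadata can always be re-verified against the bytes on disk.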

Core model

  • Session: groups a set of related requests; ensures atomic persistence.
  • Request: the deterministic identity of an operation (method + URL + params + headers).
  • Response: the archived payload, metadata, and audit fields.
  • Adapter: a small class implementing fetch() or send() and returning an AdapterResult.
  • Registry: the runtime mapping from adapter name to adapter instance.
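One way to derive a deterministic request identity from method, URL, params, and headers is to hash their canonical JSON form. This is a sketch of the idea, not the library's actual hashing scheme:

```python
import hashlib
import json

def request_id(method: str, url: str, params: dict, headers: dict) -> str:
    """Stable identifier: hash of the canonical JSON form of the request."""
    canonical = json.dumps(
        {"method": method, "url": url, "params": params, "headers": headers},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Sorting keys makes the identifier independent of dict insertion order, so identical inputs always yield the same ID, which is what enables cache hits on repeated requests.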

Runtime API

DataIoSession

The main entry point for ingestion or submission tasks.

from mxm.dataio.api import DataIoSession
from mxm.dataio.adapters import HttpFetcher
from mxm.dataio.registry import register_adapter
from mxm.config import load_config
from mxm.dataio.config.config import dataio_view

cfg = load_config(package="mxm-dataio", env="dev", profile="default")
dio_cfg = dataio_view(cfg)

# Register an adapter under a source name
register_adapter("http", HttpFetcher())  # implements Fetcher

# Use the session with that source name
with DataIoSession(source="http", cfg=dio_cfg) as io:
    req = io.request(kind="demo", params={"q": "mxm"})
    resp = io.fetch(req)
    print(resp.status, resp.checksum, resp.path)

AdapterResult objects contain both the raw payload and normalized metadata:

from mxm.types import HeadersLike, JSONObj

class AdapterResult:
    data: bytes
    content_type: str | None
    transport_status: int | None
    url: str | None
    elapsed_ms: int | None
    headers: HeadersLike | None
    adapter_meta: JSONObj | None

Configuration

mxm-dataio reads its settings from the dataio subtree of the global MXM config. Downstream packages obtain read-only views via mxm_config.make_view.

Configuration contract for mxm-dataio

mxm-dataio is a library and does not define a full application by itself. However, it ships a reference mxm-config seed tree under:

  • src/mxm/dataio/_data/seed/dataio/

This tree contains the standard 5-level structure expected by mxm-config:

  • default.yaml
  • machine.yaml
  • environment.yaml
  • profile.yaml
  • local.yaml

Downstream applications are expected to define their own app_id and config trees, but can copy or adapt this dataio seed as the canonical contract for how mxm-dataio expects to be configured (paths, cache roots, etc.).
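The five layers merge in order, later layers overriding earlier ones. A minimal sketch of that precedence (the keys shown are hypothetical, and whether mxm-config merges shallowly or deeply is not specified here):

```python
def merge_layers(*layers: dict) -> dict:
    """Shallow-merge config layers; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged

# default < machine < environment < profile < local
default = {"cache_root": "/var/mxm/cache", "ttl_seconds": 86400}
local = {"cache_root": "/tmp/mxm-dev"}
cfg = merge_layers(default, {}, {}, {}, local)
```

Here local.yaml wins for `cache_root` while `ttl_seconds` falls through from default.yaml, which is the point of the layered seed tree: applications override only what differs per machine, environment, profile, or developer checkout.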

Adapters

Adapters provide I/O logic while mxm-dataio handles persistence.

Example (simplified):

from mxm.dataio.adapters import BaseFetcher
from mxm.dataio.types import AdapterResult
import requests

class HttpFetcher(BaseFetcher):
    def fetch(self, url: str, **params: str) -> AdapterResult:
        r = requests.get(url, params=params)
        return AdapterResult(
            data=r.content,
            content_type=r.headers.get("content-type"),
            transport_status=r.status_code,
            url=r.url,
            headers=dict(r.headers),
        )

Adapters can be registered dynamically:

from mxm.dataio.registry import register_adapter
register_adapter("http", HttpFetcher())
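A registry of this shape can be as simple as a module-level dict mapping names to instances. This is a sketch of the pattern, not the actual mxm.dataio.registry internals (EchoFetcher is a made-up stand-in adapter):

```python
_ADAPTERS: dict[str, object] = {}

def register_adapter(name: str, adapter: object) -> None:
    """Map a source name to an adapter instance."""
    _ADAPTERS[name] = adapter

def get_adapter(name: str) -> object:
    """Look up a previously registered adapter; fail loudly on unknown names."""
    try:
        return _ADAPTERS[name]
    except KeyError:
        raise KeyError(f"no adapter registered under {name!r}") from None

class EchoFetcher:
    """Trivial stand-in adapter: returns the URL itself as the payload."""
    def fetch(self, url: str, **params: str) -> bytes:
        return url.encode()

register_adapter("echo", EchoFetcher())
```

Keeping registration at runtime (rather than, say, entry points) is what lets any MXM package plug in its own adapters without mxm-dataio knowing about them in advance.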

Caching and Volatility

Recent versions introduce a policy-driven caching system supporting:

  • Volatile sources (e.g. JustETF) that change daily
  • Eternal sources (e.g. FCA FIRDS) that never mutate after release
  • Fine-grained control via cache_mode, ttl_seconds, and as_of_bucket

Each request/response pair now carries explicit provenance metadata.

CacheMode semantics

  • default: use cached data if available and not expired; otherwise refetch.
  • only_if_cached: never hit the network; raise on a cache miss.
  • bypass: always refetch and persist the new response.
  • revalidate: reserved for future ETag support; currently behaves like default.
  • never: fetch but never persist (ephemeral or side-effect requests).

Provenance fields

  • cache_mode (str enum): policy governing cache use.
  • ttl_seconds (float | None): time-to-live in seconds; older entries are refetched.
  • as_of_bucket (str | None): logical “time partition” (e.g. "2025-10-27").
  • cache_tag (str | None): optional sub-partition (e.g. language "en").

Example embedded in saved JSON:

"_provenance": {
  "response_id": "resp-123",
  "checksum": "sha256:…",
  "fetched_at": "2025-10-27T10:45:12Z",
  "cache_mode": "default",
  "ttl_seconds": 86400,
  "as_of_bucket": "2025-10-27",
  "cache_tag": "en"
}

Example usage

from mxm.dataio.api import DataIoSession

with DataIoSession(
    source="justetf",
    cfg=cfg,
    cache_mode="default",
    ttl=86400,
    as_of_bucket="2025-10-27",
) as s:
    req = s.request(kind="http", params={"u": "A"})
    resp = s.fetch(req)
    print(resp.checksum, resp.as_of_bucket)

This enables reproducible daily snapshots for volatile sources while preserving eternal datasets indefinitely.

Quick examples

Fetch and cache a resource

session = DataIoSession(cfg=dio_cfg)
result = session.fetch("https://example.com/data.json", fetcher="http")
print(result.status_code)

The payload and metadata are stored automatically in SQLite + filesystem. Subsequent identical requests are served from cache unless force_refresh=True.

Send data to an API

result = session.send("https://api.example.com/upload", data=b"...", sender="http")
print(result.status_code)

Design principles

  • Deterministic: identical inputs yield identical request IDs.
  • Auditable: all payloads and headers persisted for replay.
  • Minimal dependencies: pure Python, no ORM or framework assumptions.
  • Composable: adapters plug into any MXM package via registry.
  • Readable data: SQLite + JSON + raw bytes, human-inspectable.

Testing & quality

All tests are pure-Python and hermetic—no network calls.
Configuration YAMLs are loaded directly from the repo using a temporary
MXM_CONFIG_HOME fixture. The project is validated with:

pytest -q
pyright --strict
ruff check .
black --check .

Roadmap

  • Async adapters (aiohttp, websockets).
  • Multi-backend persistence (S3, DuckDB).
  • Delta auditing and content hashing improvements.
  • CLI for session inspection and cache management.

License

MIT License. See LICENSE.


