Deterministic ingestion and normalization for MetaSPN

metaspn-io

metaspn-io is the ingestion and normalization layer for MetaSPN. It converts raw external records into canonical signal envelopes with deterministic IDs and ordering.

Quick Demo

python -m metaspn_io io ingest \
  --adapter social_jsonl_v1 \
  --source tests/fixtures/social \
  --date 2026-02-05 \
  --out /tmp/social-signals \
  --stats

v0.1 Adapters

  • social_jsonl_v1 (MUST): browser-extension social JSONL (post_seen, profile_seen)
  • outcomes_jsonl_v1 (SHOULD): manual outcomes JSONL (message_sent, reply_received, meeting_booked, revenue_event)

Schema Mapping

Input adapter       Input type       Output payload
social_jsonl_v1     post_seen        SocialPostSeen
social_jsonl_v1     profile_seen     ProfileSnapshotSeen
outcomes_jsonl_v1   message_sent     MessageSent
outcomes_jsonl_v1   reply_received   ReplyReceived
outcomes_jsonl_v1   meeting_booked   MeetingBooked
outcomes_jsonl_v1   revenue_event    RevenueEvent
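
The mapping above amounts to a lookup keyed on adapter name and input type. A minimal sketch of that idea follows; `PAYLOAD_TYPES` and `payload_type_for` are hypothetical names used for illustration and are not part of the documented metaspn-io API.

```python
# Hypothetical lookup table mirroring the Schema Mapping table.
PAYLOAD_TYPES = {
    ("social_jsonl_v1", "post_seen"): "SocialPostSeen",
    ("social_jsonl_v1", "profile_seen"): "ProfileSnapshotSeen",
    ("outcomes_jsonl_v1", "message_sent"): "MessageSent",
    ("outcomes_jsonl_v1", "reply_received"): "ReplyReceived",
    ("outcomes_jsonl_v1", "meeting_booked"): "MeetingBooked",
    ("outcomes_jsonl_v1", "revenue_event"): "RevenueEvent",
}


def payload_type_for(adapter: str, input_type: str) -> str:
    """Return the output payload type for an (adapter, input type) pair."""
    try:
        return PAYLOAD_TYPES[(adapter, input_type)]
    except KeyError:
        raise ValueError(
            f"unsupported input type {input_type!r} for adapter {adapter!r}"
        ) from None
```

Raising on an unknown pair, rather than returning a default, is a choice made for this sketch so that unmapped input types surface early.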

Output Envelope

Each output JSONL line is one canonical envelope (pretty-printed here for readability):

{
  "schema_version": "0.1",
  "signal_id": "s_4e9b5c8417d3af2ef9baf8d1",
  "timestamp": "2026-02-05T12:00:00Z",
  "source": "twitter",
  "payload_type": "SocialPostSeen",
  "payload": {
    "platform": "twitter",
    "author_handle": "alice",
    "post_url": "https://x.com/alice/status/1",
    "text": "hello world",
    "action": "seen"
  },
  "entity_refs": [
    {
      "kind": "platform_identifier",
      "platform": "twitter",
      "identifier": "alice"
    }
  ],
  "trace": {
    "ingested_at": "2026-02-06T00:00:00Z",
    "input_file": "raw/social/2026-02-05.jsonl",
    "input_line_number": 1,
    "adapter_name": "social_jsonl_v1",
    "adapter_version": "0.1",
    "raw_id": null,
    "original_timezone": "UTC"
  }
}
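
A downstream consumer might read these lines as follows. This is a sketch using only the standard library; `parse_envelope`, `platform_identifiers`, and `REQUIRED_KEYS` are hypothetical names, and treating every top-level key from the example as required is an assumption rather than a documented rule.

```python
import json

# Top-level keys taken from the example envelope. Treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = frozenset({
    "schema_version", "signal_id", "timestamp", "source",
    "payload_type", "payload", "entity_refs", "trace",
})


def parse_envelope(line: str) -> dict:
    """Parse one JSONL line and check that the expected top-level keys exist."""
    envelope = json.loads(line)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = REQUIRED_KEYS.difference(envelope)
    if missing:
        raise ValueError(f"envelope missing keys: {sorted(missing)}")
    return envelope


def platform_identifiers(envelope: dict) -> list[tuple[str, str]]:
    """Collect (platform, identifier) pairs from entity_refs."""
    return [
        (ref["platform"], ref["identifier"])
        for ref in envelope["entity_refs"]
        if ref.get("kind") == "platform_identifier"
    ]


# An abbreviated version of the example envelope, serialized as one line.
SAMPLE_LINE = json.dumps({
    "schema_version": "0.1",
    "signal_id": "s_4e9b5c8417d3af2ef9baf8d1",
    "timestamp": "2026-02-05T12:00:00Z",
    "source": "twitter",
    "payload_type": "SocialPostSeen",
    "payload": {"platform": "twitter", "author_handle": "alice"},
    "entity_refs": [
        {"kind": "platform_identifier", "platform": "twitter", "identifier": "alice"}
    ],
    "trace": {"adapter_name": "social_jsonl_v1", "input_line_number": 1},
})

envelope = parse_envelope(SAMPLE_LINE)
print(envelope["payload_type"], platform_identifiers(envelope))
```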

CLI

Primary command:

metaspn io ingest --adapter social_jsonl_v1 --source raw/social --out workspace/store/signals/2026-02-05.jsonl

Supported flags:

  • --source file or directory
  • --out output JSONL path or directory (with --date, writes <out>/<date>.jsonl)
  • --store optional store root (writes to <store>/signals/YYYY-MM-DD.jsonl)
  • --date one-day UTC ingest window (YYYY-MM-DD)
  • --since ISO timestamp lower bound
  • --until ISO timestamp upper bound
  • --dry-run
  • --stats
  • --lenient

Demo orchestrator invocation:

metaspn io ingest --adapter social_jsonl_v1 --source raw/social --date 2026-02-05 --out workspace/store/signals

Default mode is strict: bad records are skipped and logged to workspace/logs/ingest_errors.jsonl unless overridden.

Determinism Rules

  • Stable IDs via stable_signal_id(source, timestamp, key)
  • Timestamps normalized to UTC
  • Deterministic sort: timestamp, then canonical key
  • JSON output uses sorted keys
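
The rules above can be illustrated with a small standard-library sketch. The real `stable_signal_id` implementation is not shown in this README, so `sketch_signal_id` below is a stand-in: hashing with SHA-256 and truncating to 24 hex characters is an assumption chosen only to match the shape of the example ID. `normalize_utc` and `emit` are likewise hypothetical helpers, and the `key` field on each record stands in for the canonical key.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_utc(ts: str) -> str:
    """Convert an ISO 8601 timestamp to UTC in YYYY-MM-DDTHH:MM:SSZ form."""
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sketch_signal_id(source: str, timestamp: str, key: str) -> str:
    """Stand-in for stable_signal_id: same inputs always give the same ID."""
    digest = hashlib.sha256(f"{source}|{timestamp}|{key}".encode("utf-8")).hexdigest()
    return "s_" + digest[:24]


def emit(records: list[dict]) -> list[str]:
    """Sort by timestamp, then key, and serialize with sorted JSON keys."""
    ordered = sorted(records, key=lambda r: (r["timestamp"], r["key"]))
    return [json.dumps(r, sort_keys=True) for r in ordered]


records = [
    {"timestamp": normalize_utc("2026-02-05T14:00:00+02:00"), "key": "b", "source": "twitter"},
    {"timestamp": normalize_utc("2026-02-05T12:00:00Z"), "key": "a", "source": "twitter"},
]
for line in emit(records):
    print(line)
```

Because every timestamp is rendered in the same fixed-width UTC format, plain string comparison orders the records chronologically, and the secondary key breaks ties so that input order does not affect the output.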

Add A New Adapter (<50 lines)

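from dataclasses import dataclass, field
from pathlib import Path
from metaspn_io.adapters.base import AdapterOptions

# iter_jsonl_records, ParseIssue, and convert_to_signal are not imported here:
# their locations are not shown in this README, so import or define them yourself.

@dataclass
class MyAdapter:
    name: str = "my_adapter_v1"
    version: str = "0.1"
    # Parse problems collected during iteration instead of being raised.
    issues: list = field(default_factory=list)

    def iter_signals(self, source_path: Path, options: AdapterOptions | None = None):
        for raw in iter_jsonl_records(source_path):
            if isinstance(raw, ParseIssue):
                # Record the bad line and move on to the next record.
                self.issues.append(raw)
                continue
            # Map the raw record to a canonical signal envelope.
            yield convert_to_signal(raw)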

Register it in metaspn_io.adapters.default_registry().

Tests

python3 -m pytest -q

Publishing

publish.yml publishes to PyPI when you push a version tag:

git tag -a v0.1.2 -m "v0.1.2"
git push origin v0.1.2

Configure PyPI trusted publishing for this GitHub repository; the workflow then uploads dist/* automatically. Before tagging, make sure CI (.github/workflows/ci.yml) is green. CI runs python3 -m pytest -q and validates the package build artifacts.
