Deterministic ingestion and normalization for MetaSPN

metaspn-io

metaspn-io is the ingestion and normalization layer for MetaSPN. It converts raw external records into canonical signal envelopes with deterministic IDs and ordering.

Quick Demo (5 lines)

python -m metaspn_io io ingest \
  --adapter social_jsonl_v1 \
  --source tests/fixtures/social \
  --out /tmp/social-signals.jsonl \
  --stats

v0.1 Adapters

  • social_jsonl_v1 (MUST): browser-extension social JSONL (post_seen, profile_seen)
  • outcomes_jsonl_v1 (SHOULD): manual outcomes JSONL (message_sent, reply_received, meeting_booked, revenue_event)

Schema Mapping

Input adapter      Input type      Output payload
social_jsonl_v1    post_seen       SocialPostSeen
social_jsonl_v1    profile_seen    ProfileSnapshotSeen
outcomes_jsonl_v1  message_sent    MessageSent
outcomes_jsonl_v1  reply_received  ReplyReceived
outcomes_jsonl_v1  meeting_booked  MeetingBooked
outcomes_jsonl_v1  revenue_event   RevenueEvent
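
The mapping can be read as a lookup keyed on (adapter, input type). The sketch below only illustrates that shape; PAYLOAD_TYPES and payload_type_for are hypothetical names, not part of the metaspn_io API.

```python
# Illustrative lookup table mirroring the schema mapping above.
# PAYLOAD_TYPES and payload_type_for are hypothetical, not metaspn_io API.
PAYLOAD_TYPES = {
    ("social_jsonl_v1", "post_seen"): "SocialPostSeen",
    ("social_jsonl_v1", "profile_seen"): "ProfileSnapshotSeen",
    ("outcomes_jsonl_v1", "message_sent"): "MessageSent",
    ("outcomes_jsonl_v1", "reply_received"): "ReplyReceived",
    ("outcomes_jsonl_v1", "meeting_booked"): "MeetingBooked",
    ("outcomes_jsonl_v1", "revenue_event"): "RevenueEvent",
}


def payload_type_for(adapter: str, input_type: str) -> str:
    """Return the output payload name for an adapter/input-type pair."""
    try:
        return PAYLOAD_TYPES[(adapter, input_type)]
    except KeyError:
        # Unknown pairs are rejected rather than guessed.
        raise ValueError(f"no mapping for {adapter}/{input_type}") from None
```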

Output Envelope

JSONL lines are emitted as canonical envelopes:

{
  "schema_version": "0.1",
  "signal_id": "s_4e9b5c8417d3af2ef9baf8d1",
  "timestamp": "2026-02-05T12:00:00Z",
  "source": "twitter",
  "payload_type": "SocialPostSeen",
  "payload": {
    "platform": "twitter",
    "author_handle": "alice",
    "post_url": "https://x.com/alice/status/1",
    "text": "hello world",
    "action": "seen"
  },
  "entity_refs": [
    {
      "kind": "platform_identifier",
      "platform": "twitter",
      "identifier": "alice"
    }
  ],
  "trace": {
    "ingested_at": "2026-02-06T00:00:00Z",
    "input_file": "raw/social/2026-02-05.jsonl",
    "input_line_number": 1,
    "adapter_name": "social_jsonl_v1",
    "adapter_version": "0.1",
    "raw_id": null,
    "original_timezone": "UTC"
  }
}
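
A downstream consumer might read the emitted JSONL with a small check like the one below. This is a minimal sketch, assuming the eight top-level keys shown in the example are always present; parse_envelope and read_envelopes are hypothetical helpers, not functions shipped by metaspn_io.

```python
import json

# Top-level keys taken from the example envelope above. Treating all of
# them as mandatory is an assumption of this sketch.
REQUIRED_KEYS = {
    "schema_version", "signal_id", "timestamp", "source",
    "payload_type", "payload", "entity_refs", "trace",
}


def parse_envelope(line: str) -> dict:
    """Parse one JSONL line and verify the top-level envelope keys."""
    envelope = json.loads(line)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = REQUIRED_KEYS - envelope.keys()
    if missing:
        raise ValueError(f"envelope missing keys: {sorted(missing)}")
    return envelope


def read_envelopes(path):
    """Yield parsed envelopes from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_envelope(line)
```

For example, list(read_envelopes("/tmp/social-signals.jsonl")) would load the output of the quick demo.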

CLI

Primary command:

metaspn io ingest --adapter social_jsonl_v1 --source raw/social --out workspace/store/signals/2026-02-05.jsonl

Supported flags:

  • --source: file or directory
  • --out: output JSONL path
  • --store: optional store root (writes to <store>/signals/YYYY-MM-DD.jsonl)
  • --since: ISO timestamp lower bound
  • --until: ISO timestamp upper bound
  • --dry-run
  • --stats
  • --lenient

Default mode is strict: bad records are skipped and logged to workspace/logs/ingest_errors.jsonl unless overridden.

Determinism Rules

  • Stable IDs via stable_signal_id(source, timestamp, key)
  • Timestamps normalized to UTC
  • Deterministic sort: timestamp, then canonical key
  • JSON output uses sorted keys
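
The four rules can be sketched together as follows. The real stable_signal_id is not shown in this document, so sketch_signal_id below is a stand-in that assumes a truncated SHA-256 over the three inputs (24 hex characters, matching the length of the example ID but not necessarily its algorithm). normalize_utc and emit_sorted are likewise hypothetical helpers.

```python
import hashlib
import json
from datetime import datetime, timezone


def normalize_utc(value: str) -> str:
    """Convert an ISO 8601 timestamp to a UTC string ending in Z."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def sketch_signal_id(source: str, timestamp: str, key: str) -> str:
    """Stand-in for stable_signal_id: same inputs always give the same ID."""
    digest = hashlib.sha256(f"{source}|{timestamp}|{key}".encode("utf-8")).hexdigest()
    return "s_" + digest[:24]


def emit_sorted(records: list[dict]) -> list[str]:
    """Normalize, ID, sort, and serialize records with 'source', 'timestamp', 'key'."""
    rows = []
    for record in records:
        ts = normalize_utc(record["timestamp"])
        rows.append({
            "signal_id": sketch_signal_id(record["source"], ts, record["key"]),
            "timestamp": ts,
            "source": record["source"],
            "key": record["key"],
        })
    # Fixed-width UTC strings sort chronologically; key breaks ties.
    rows.sort(key=lambda row: (row["timestamp"], row["key"]))
    # sort_keys makes the serialized bytes independent of dict insertion order.
    return [json.dumps(row, sort_keys=True) for row in rows]
```

Because the ID is computed from the normalized timestamp and the sort key is total, feeding the same records in a different order yields identical output lines.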

Add A New Adapter (<50 lines)

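from dataclasses import dataclass, field
from pathlib import Path
from metaspn_io.adapters.base import AdapterOptions

# iter_jsonl_records and ParseIssue must also be imported from metaspn_io;
# their module path is not shown here. convert_to_signal is your own function
# that turns one raw record into a signal envelope.

@dataclass
class MyAdapter:
    name: str = "my_adapter_v1"
    version: str = "0.1"
    # Parse problems collected while iterating; a fresh list per instance.
    issues: list = field(default_factory=list)

    def iter_signals(self, source_path: Path, options: AdapterOptions | None = None):
        for raw in iter_jsonl_records(source_path):
            if isinstance(raw, ParseIssue):
                # Record the unparseable line and continue with the next one.
                self.issues.append(raw)
                continue
            yield convert_to_signal(raw)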

Register it in metaspn_io.adapters.default_registry().

Tests

PYTHONPATH=src python -m unittest discover -s tests -v

Publishing

publish.yml publishes to PyPI when you push a version tag:

git tag -a v0.1.0 -m "v0.1.0"
git push origin v0.1.0

Configure PyPI trusted publishing for this GitHub repository first; the workflow then uploads dist/* automatically.
