Syvain metrics collection SDK

Syvain Metrics Collector

Python SDK for sending experiment metrics and annotations to Syvain Metrics.

Use this package in training, evaluation, and analysis jobs that need one searchable experiment record with numeric metric series, run metadata, and human-readable notes.

Install

uv add syvain-metrics-collector

Basic Usage

from syvain_metrics_collector import Collector

collector = Collector("ak_org_...")

experiment = collector.experiment(
    "mamba-run-001",
    description="Baseline mamba training run",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "seed": 7,
    },
)

with experiment.run():
    for step in range(1_000):
        loss = 1.0 / (step + 1)
        experiment.metric("loss", loss, step=step, metadata={"split": "train"})

    experiment.annotation(
        "saved checkpoint",
        metadata={"path": "checkpoints/mamba-run-001/step-999.pt"},
    )

experiment.flush_or_raise()

The normal shape is:

  • create a Collector with a metrics API key
  • create one experiment per run
  • put stable run-level facts in meta
  • send numeric values with experiment.metric(...)
  • send notable events with experiment.annotation(...)
  • rely on experiment.run() for a best-effort flush when the context exits
  • call flush_or_raise() before process exit when delivery failure should fail the caller

Collector defaults to https://metrics.syvain.com, so most jobs only need an API key.

Experiment Metadata

Use meta for facts that apply to the whole run:

experiment = collector.experiment(
    "mamba-run-001",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "git_sha": "abc123",
        "config": {"batch_size": 32, "learning_rate": 0.0003},
    },
)

Good experiment metadata includes model name, dataset, seed, git SHA, machine type, and config values. Do not put per-step values in meta; record them as metrics instead.

Metrics

Metric values must be finite numbers, so NaN and infinity are not valid. The Python method requires the step argument; pass step=None explicitly only for values that genuinely have no step.

experiment.metric("validation_loss", 0.182, step=500)
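
Because values must be finite, a diverged loss should be caught before it reaches experiment.metric(...). The helper below is a minimal sketch of such a guard; is_finite_metric_value is a hypothetical name, not part of the SDK, and rejecting bool is a choice made here rather than documented SDK behavior.

```python
import math


def is_finite_metric_value(value: object) -> bool:
    """Return True for real numbers that are neither NaN nor infinite."""
    # Assumption: bool is rejected even though it subclasses int,
    # so that flags are not logged as numeric series by accident.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return math.isfinite(value)
    return False


# Values a training loop might produce, including a diverged loss.
candidate_values = [0.182, float("nan"), float("inf"), 3]

# Only these would be passed on to experiment.metric(...).
loggable = [v for v in candidate_values if is_finite_metric_value(v)]
print(loggable)
```

In a training loop, the same check can decide between sending a metric and recording an annotation about the divergence.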

Use the same metric name for the same measured quantity:

experiment.metric("loss", train_loss, step=step, metadata={"split": "train"})
experiment.metric("loss", val_loss, step=step, metadata={"split": "validation"})

Use separate metric names when the quantity or unit is different:

experiment.metric("loss", 0.42, step=step, metadata={"split": "train"})
experiment.metric("accuracy", 0.91, step=step, metadata={"split": "validation"})
experiment.metric("tokens_per_second", 1820.0, step=step)

Metric Metadata

Metric metadata is how the dashboard separates related lines inside one metric. Keep it low-cardinality and easy to group:

experiment.metric(
    "gpu_utilization",
    78.0,
    step=step,
    metadata={"device": "gpu:0"},
)
experiment.metric(
    "gpu_utilization",
    74.0,
    step=step,
    metadata={"device": "gpu:1"},
)

Useful metadata keys include split, device, rank, phase, and prompt_set.

Avoid request IDs, timestamps, constantly changing file paths, and large nested payloads on metrics. Put one-off details in annotations instead.
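
To see why cardinality matters, the sketch below counts how many distinct lines a set of metadata would produce for one metric. It assumes each unique combination of metadata becomes its own line, which is an illustration of the guidance above and not a description of the dashboard's actual grouping logic; series_key and count_series are hypothetical helpers.

```python
from collections import defaultdict
from typing import Optional


def series_key(name: str, metadata: Optional[dict]) -> tuple:
    """Build a hashable key from a metric name and its metadata."""
    # Assumption: one line per unique (name, metadata) combination.
    return (name, tuple(sorted((metadata or {}).items())))


def count_series(points: list) -> dict:
    """Count distinct metadata combinations per metric name."""
    seen = defaultdict(set)
    for name, metadata in points:
        seen[name].add(series_key(name, metadata))
    return {name: len(keys) for name, keys in seen.items()}


steps = range(100)

# Low-cardinality metadata: two devices, so two lines.
good = [("gpu_utilization", {"device": f"gpu:{step % 2}"}) for step in steps]

# High-cardinality metadata: a new request ID per step, so one line per point.
bad = [("gpu_utilization", {"request_id": f"req-{step}"}) for step in steps]

print(count_series(good))
print(count_series(bad))
```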

Annotations

Use annotations for text events that explain the run:

experiment.annotation(
    "evaluation started",
    metadata={"split": "validation"},
)

Common annotations include checkpoints, phase changes, incidents, artifact paths, dashboard links, and manual operator notes.

Folders

If you know the folder ID, pass it directly:

experiment = collector.experiment(
    "mamba-run-001",
    folder_id="00000000-0000-0000-0000-000000000000",
)

If you only know the dashboard path, pass folder_path:

experiment = collector.experiment(
    "mamba-run-001",
    folder_path="/mamba-run-001",
)

Do not pass both. folder_path makes an extra API request to resolve the path to a folder ID and raises if the path is missing or ambiguous.
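
When the folder comes from job configuration, it can help to enforce the "not both" rule before calling the SDK. This is a minimal sketch; folder_kwargs is a hypothetical helper, and the SDK's own handling of the case where both are passed is not described here.

```python
from typing import Optional


def folder_kwargs(
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
) -> dict:
    """Return keyword arguments for collector.experiment(...), allowing at most one folder reference."""
    if folder_id is not None and folder_path is not None:
        raise ValueError("pass folder_id or folder_path, not both")
    if folder_id is not None:
        return {"folder_id": folder_id}
    if folder_path is not None:
        return {"folder_path": folder_path}
    # Neither given: let the experiment use its default location.
    return {}


print(folder_kwargs(folder_path="/mamba-run-001"))
```

The result can then be unpacked into the call, for example collector.experiment("mamba-run-001", **folder_kwargs(folder_path="/mamba-run-001")).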

Flushing and Errors

Metric and annotation calls enqueue data locally and return quickly. The SDK flushes batches in the background after a short delay.

HTTP transport is delegated to syvain-metrics-api-client. Write requests use the API client's idempotency keys and built-in retry policy, and metric payloads receive a stable client_event_id before they enter the local queue so retried flushes keep the same event identity.
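
The value of assigning client_event_id before queueing can be shown with a small simulation. The code below is not the SDK's implementation: QueuedMetric, flush, and flaky_send are hypothetical, and the "server" is just a dict keyed by event ID. It models a first request whose response is lost, followed by a retry.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuedMetric:
    name: str
    value: float
    step: int
    # Assigned once at enqueue time and never regenerated on retry.
    client_event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def flush(queue: deque, send) -> bool:
    """Try to send every queued item once; failed items stay queued unchanged."""
    for _ in range(len(queue)):
        item = queue.popleft()
        if not send(item):
            queue.append(item)
    return not queue


received = {}  # stand-in for server storage, keyed by event ID
attempts = {"count": 0}


def flaky_send(item: QueuedMetric) -> bool:
    attempts["count"] += 1
    received[item.client_event_id] = item  # the write reaches the "server"
    return attempts["count"] > 1  # but the first response is "lost"


queue = deque([QueuedMetric("loss", 0.5, step=0)])
first_ok = flush(queue, flaky_send)
second_ok = flush(queue, flaky_send)
print(first_ok, second_ok, len(received))
```

Because the retried item carries the same ID, the second write replaces the first instead of creating a duplicate point.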

The experiment.run() context manager calls done() and then attempts a best-effort flush when the context exits. It retries three times by default and logs a warning if delivery is still incomplete. A failed flush does not raise, does not drop queued data, and does not consume the retry budget used by later explicit flush calls. Exceptions from the training block still propagate.
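
The exit behavior described above can be sketched with contextlib. This is an illustration under assumptions, not the SDK's code: run_sketch and its done and flush callables are hypothetical, and the sketch calls done() unconditionally, which may differ from how the SDK records a run whose body raised.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger("run_sketch")


@contextmanager
def run_sketch(
    done: Callable[[], None],
    flush: Callable[[], bool],
    retries: int = 3,
) -> Iterator[None]:
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions; the exception is not swallowed.
        done()
        delivered = False
        for _ in range(retries + 1):  # first attempt plus `retries` retries
            if flush():
                delivered = True
                break
        if not delivered:
            # Best effort: warn instead of raising.
            logger.warning("delivery still incomplete after %d attempts", retries + 1)


events = []
try:
    with run_sketch(
        done=lambda: events.append("done"),
        flush=lambda: events.append("flush") or False,  # always reports failure
    ):
        raise ValueError("training failed")
except ValueError as exc:
    events.append(f"propagated: {exc}")

print(events)
```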

Use flush() when you want one non-raising flush attempt and a status object:

result = experiment.flush()
if not result.ok:
    print(result.pending_metrics, result.retryable_failures)

Use flush_or_raise() when the caller should fail if any metric, annotation, or status update is still pending or failed. retries is the number of retry attempts after the first flush attempt, so retries=5 allows up to six attempts in total:

experiment.flush_or_raise()
experiment.flush_or_raise(retries=5)
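
The counting rule for retries (one initial attempt, then up to retries more) can be sketched as follows. flush_with_retries and always_fails are hypothetical stand-ins; the exception type and message that the SDK raises are not specified here.

```python
from typing import Callable


def flush_with_retries(attempt_flush: Callable[[], bool], retries: int) -> int:
    """Run one initial attempt plus up to `retries` retries.

    Returns the number of attempts used, or raises if every attempt fails.
    """
    for attempt in range(1, retries + 2):
        if attempt_flush():
            return attempt
    raise RuntimeError(f"flush still incomplete after {retries + 1} attempts")


calls = []


def always_fails() -> bool:
    calls.append(1)
    return False


try:
    flush_with_retries(always_fails, retries=5)
except RuntimeError as exc:
    print(len(calls), exc)
```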

Creating the experiment is required, so a failure there raises. Once an experiment exists, metric and annotation delivery is best effort unless you call flush_or_raise(). The explicit start() and done() lifecycle calls do not flush automatically; call flush() or flush_or_raise() after done().

Manual Lifecycle

The context manager is enough for most jobs:

with experiment.run():
    experiment.metric("loss", 0.5, step=0)

Use explicit lifecycle calls when the run does not fit a single with block:

experiment.start()

for step in range(1_000):
    experiment.metric("loss", 1.0 / (step + 1), step=step)

experiment.done()
experiment.flush_or_raise()

If another supervisor owns process exit and exception handling, disable the SDK's process hooks:

with experiment.run(install_hooks=False):
    experiment.metric("loss", 0.5, step=0)

Local and Test Collectors

Use JsonlCollector when you want the same API shape but local JSONL output:

from pathlib import Path

from syvain_metrics_collector import JsonlCollector

collector = JsonlCollector(path=Path("artifacts/metrics/run-001.jsonl"))
experiment = collector.experiment("run-001", meta={"model": "mamba"})

with experiment.run():
    experiment.metric("loss", 0.42, step=1, metadata={"split": "train"})
    experiment.annotation("local checkpoint written", metadata={"path": "ckpt.pt"})

experiment.flush_or_raise()

Use NoopCollector in tests or dry runs that should accept metrics calls without network or file I/O:

from syvain_metrics_collector import NoopCollector

collector = NoopCollector()
experiment = collector.experiment("unit-test-run")

with experiment.run():
    experiment.metric("loss", 0.42, step=1)

Constructor Options

collector = Collector(
    "ak_org_...",
    host="https://metrics.syvain.com",
    timeout=10.0,
    ingest_timeout=60.0,
    flush_delay_seconds=0.25,
    max_queue_items=100_000,
    max_batch_items=500,
    max_retries=None,
)

What each option controls:

  • timeout: experiment creation and status update timeout
  • ingest_timeout: metric and annotation batch timeout
  • flush_delay_seconds: background batching delay
  • max_queue_items: maximum queued metrics plus annotations per experiment
  • max_batch_items: maximum items in one ingest request
  • max_retries: retry limit for retryable delivery failures; None retries indefinitely while respecting the queue limit
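
As an illustration of max_batch_items, the sketch below splits a list of queued items into ingest-sized batches. batches is a hypothetical helper; how the SDK actually drains its queue is not shown in this README.

```python
from typing import Iterator, Sequence


def batches(items: Sequence, max_batch_items: int = 500) -> Iterator[Sequence]:
    """Yield consecutive slices of at most max_batch_items items."""
    if max_batch_items < 1:
        raise ValueError("max_batch_items must be at least 1")
    for start in range(0, len(items), max_batch_items):
        yield items[start:start + max_batch_items]


queued = list(range(1_200))  # stand-in for queued metrics and annotations
sizes = [len(batch) for batch in batches(queued)]
print(sizes)
```

With the default of 500, a queue of 1,200 items is sent as three requests.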

timestamp can be passed to metric(...) as seconds or milliseconds. Values below 10_000_000_000 are interpreted as seconds and converted to milliseconds.
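
The seconds-versus-milliseconds rule can be written as a small function. This is a sketch of the stated threshold only; to_milliseconds is a hypothetical name, and rounding to the nearest millisecond is an assumption about a detail the README does not specify.

```python
SECONDS_CUTOFF = 10_000_000_000  # threshold stated above


def to_milliseconds(timestamp: float) -> int:
    """Normalize a timestamp given in seconds or milliseconds to milliseconds."""
    if timestamp < SECONDS_CUTOFF:
        # Small values are treated as seconds.
        return round(timestamp * 1000)
    # Larger values are taken to be milliseconds already.
    return round(timestamp)


print(to_milliseconds(1_700_000_000))      # given in seconds
print(to_milliseconds(1_700_000_000_000))  # given in milliseconds
```

Both calls yield the same millisecond value, so either unit can be passed without changing the stored point.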
