Syvain metrics collection SDK

Syvain Metrics Collector

Python SDK for sending experiment metrics and annotations to Syvain Metrics.

Use this package in training, evaluation, and analysis jobs that need one searchable experiment record with numeric metric series, run metadata, and human-readable notes.

Install

uv add syvain-metrics-collector

Basic Usage

from syvain_metrics_collector import Collector

collector = Collector("ak_org_...")

experiment = collector.experiment(
    "mamba-run-001",
    description="Baseline mamba training run",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "seed": 7,
    },
)

with experiment.run():
    for step in range(1_000):
        loss = 1.0 / (step + 1)
        experiment.metric("loss", loss, step=step, metadata={"split": "train"})

    experiment.annotation(
        "saved checkpoint",
        metadata={"path": "checkpoints/mamba-run-001/step-999.pt"},
    )

experiment.flush_or_raise()

The normal shape is:

  • create a Collector with a metrics API key
  • create one experiment per run
  • put stable run-level facts in meta
  • send numeric values with experiment.metric(...)
  • send notable events with experiment.annotation(...)
  • rely on experiment.run() for a best-effort flush when the context exits
  • call flush_or_raise() before process exit when delivery failure should fail the caller

Collector defaults to https://metrics.syvain.com, so most jobs only need an API key.

Experiment Metadata

Use meta for facts that apply to the whole run:

experiment = collector.experiment(
    "mamba-run-001",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "git_sha": "abc123",
        "config": {"batch_size": 32, "learning_rate": 0.0003},
    },
)

Good experiment metadata includes model name, dataset, seed, git SHA, machine type, and config values. Do not put per-step values in meta; put those on metrics.
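One way to gather these run-level facts before creating the experiment is sketched below. The helper names git_sha and build_run_meta are hypothetical and not part of the SDK; the sketch uses only the standard library and falls back to "unknown" when git is unavailable.

```python
import platform
import subprocess


def git_sha(default="unknown"):
    """Return the current commit SHA, or `default` outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or the working directory is not a repository.
        return default
    return out.stdout.strip() or default


def build_run_meta(model, dataset, seed, config):
    """Assemble stable, run-level facts into one dict suitable for `meta`."""
    return {
        "model": model,
        "dataset": dataset,
        "seed": seed,
        "git_sha": git_sha(),
        "machine": platform.machine(),
        "config": dict(config),  # copy so later edits do not leak in
    }


meta = build_run_meta(
    "mamba", "internal-v1", 7, {"batch_size": 32, "learning_rate": 0.0003}
)
```

The resulting dict can be passed as meta=meta to collector.experiment(...). Nothing in it changes per step, which is the property the paragraph above asks for.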

Metrics

Metric values must be finite numbers, so NaN and infinite values are not valid. The step argument is required by the Python method; pass step=None only for values that genuinely have no step.

experiment.metric("validation_loss", 0.182, step=500)
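Training loops can produce NaN or infinite values, for example after a diverged step. A caller-side guard is one way to keep those out of metric calls. The sketch below is an assumption about how you might filter values yourself; is_finite_number and finite_points are hypothetical helpers, and rejecting bool is a choice made here, not a documented SDK rule.

```python
import math


def is_finite_number(value):
    """True for ints and floats that are neither NaN nor infinite."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    try:
        return math.isfinite(value)
    except OverflowError:
        # Integers too large to convert to float cannot be checked.
        return False


def finite_points(points):
    """Keep only (step, value) pairs whose value is a finite number."""
    return [(step, value) for step, value in points if is_finite_number(value)]


raw = [(0, 1.0), (1, float("nan")), (2, float("inf")), (3, 0.25)]
clean = finite_points(raw)
```

Each pair left in clean can then be sent with experiment.metric(name, value, step=step).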

Use the same metric name for the same measured quantity:

experiment.metric("loss", train_loss, step=step, metadata={"split": "train"})
experiment.metric("loss", val_loss, step=step, metadata={"split": "validation"})

Use separate metric names when the quantity or unit is different:

experiment.metric("loss", 0.42, step=step, metadata={"split": "train"})
experiment.metric("accuracy", 0.91, step=step, metadata={"split": "validation"})
experiment.metric("tokens_per_second", 1820.0, step=step)

Metric Metadata

Metric metadata is how the dashboard separates related lines inside one metric. Keep it low-cardinality and easy to group:

experiment.metric(
    "gpu_utilization",
    78.0,
    step=step,
    metadata={"device": "gpu:0"},
)
experiment.metric(
    "gpu_utilization",
    74.0,
    step=step,
    metadata={"device": "gpu:1"},
)

Useful metadata keys include split, device, rank, phase, and prompt_set.

Avoid request IDs, timestamps, constantly changing file paths, and large nested payloads on metrics. Put one-off details in annotations instead.
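To see why cardinality matters, the sketch below groups points into lines by metric name plus metadata. It is a simplified model and only an assumption about how a dashboard might separate series; series_key and group_series are hypothetical names.

```python
from collections import defaultdict


def series_key(name, metadata=None):
    """Identify one line: the metric name plus its sorted metadata pairs."""
    return (name, tuple(sorted((metadata or {}).items())))


def group_series(points):
    """Group (name, value, step, metadata) tuples into per-line series."""
    series = defaultdict(list)
    for name, value, step, metadata in points:
        series[series_key(name, metadata)].append((step, value))
    return dict(series)


points = [
    ("gpu_utilization", 78.0, 0, {"device": "gpu:0"}),
    ("gpu_utilization", 74.0, 0, {"device": "gpu:1"}),
    ("gpu_utilization", 80.0, 1, {"device": "gpu:0"}),
]
low = group_series(points)  # two lines, one per device

# A per-point value such as a request ID gives every point its own line.
noisy = group_series(
    [("latency", 0.1 * i, i, {"request_id": f"req-{i}"}) for i in range(100)]
)
```

In this model the device key yields two readable lines, while the request_id key yields one line per point, which is the pattern the advice above is meant to prevent.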

Annotations

Use annotations for text events that explain the run:

experiment.annotation(
    "evaluation started",
    metadata={"split": "validation"},
)

Common annotations include checkpoints, phase changes, incidents, artifact paths, dashboard links, and manual operator notes.

Folders

If you know the folder ID, pass it directly:

experiment = collector.experiment(
    "mamba-run-001",
    folder_id="00000000-0000-0000-0000-000000000000",
)

If you only know the dashboard path, pass folder_path:

experiment = collector.experiment(
    "mamba-run-001",
    folder_path="/mamba-run-001",
)

Do not pass both. folder_path makes an extra API request to resolve the path to a folder ID and raises if the path is missing or ambiguous.
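When the folder comes from job configuration, a small wrapper can enforce the either-or rule before the SDK is called. This is a sketch; folder_kwargs is a hypothetical helper, not an SDK function.

```python
def folder_kwargs(folder_id=None, folder_path=None):
    """Return keyword arguments for exactly one folder selector, or none."""
    if folder_id is not None and folder_path is not None:
        raise ValueError("pass folder_id or folder_path, not both")
    if folder_id is not None:
        return {"folder_id": folder_id}
    if folder_path is not None:
        return {"folder_path": folder_path}
    # Neither was configured, so add no folder argument at all.
    return {}


by_path = folder_kwargs(folder_path="/mamba-run-001")
by_id = folder_kwargs(folder_id="00000000-0000-0000-0000-000000000000")
```

The result can be splatted into the call, for example collector.experiment("mamba-run-001", **by_path).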

Flushing and Errors

Metric and annotation calls enqueue data locally and return quickly. The SDK flushes batches in the background after a short delay.

HTTP transport is delegated to syvain-metrics-api-client. Write requests use the API client's idempotency keys and built-in retry policy, and metric payloads receive a stable client_event_id before they enter the local queue so retried flushes keep the same event identity.
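The value of assigning the ID before queueing can be shown with a toy model. The code below is not the SDK's implementation: enqueue and FakeServer are hypothetical, and the deduplicating receiver is an assumption used only to illustrate why a stable ID makes retries safe.

```python
import uuid


def enqueue(queue, name, value, step):
    """Attach an ID once, at enqueue time, so every retry reuses it."""
    event = {
        "client_event_id": str(uuid.uuid4()),
        "name": name,
        "value": value,
        "step": step,
    }
    queue.append(event)
    return event


class FakeServer:
    """Stand-in receiver that ignores events it has already stored."""

    def __init__(self):
        self.stored = {}

    def ingest(self, batch):
        for event in batch:
            # setdefault keeps the first copy and skips repeats.
            self.stored.setdefault(event["client_event_id"], event)


queue = []
enqueue(queue, "loss", 0.5, 0)
enqueue(queue, "loss", 0.4, 1)

server = FakeServer()
server.ingest(queue)  # first attempt; imagine the response was lost
server.ingest(queue)  # the retry resends the same events with the same IDs
```

If the ID were generated at send time instead, the second attempt would carry new IDs and the receiver could not tell a retry from new data.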

The experiment.run() context manager calls done() and then attempts a best-effort flush when the context exits. That flush retries three times by default and logs a warning if delivery is still incomplete. It does not raise on flush failure, drop queued data, or consume the retry budget used by later explicit flush calls. Exceptions raised inside the with block still propagate.

Use flush() when you want one non-raising flush attempt and a status object:

result = experiment.flush()
if not result.ok:
    print(result.pending_metrics, result.retryable_failures)

Use flush_or_raise() when the caller should fail if any metric, annotation, or status update is still pending or failed. retries is the number of retry attempts after the first flush attempt:

experiment.flush_or_raise()
experiment.flush_or_raise(retries=5)
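Because retries counts attempts after the first one, retries=5 allows up to six flush attempts in total. The generic loop below illustrates that counting rule only; flush_with_retries is a hypothetical function, and the SDK's actual backoff and error types are not modeled.

```python
def flush_with_retries(flush_once, retries):
    """Call `flush_once` up to `retries + 1` times and return the attempts used.

    `flush_once` is any zero-argument callable that returns True on success.
    """
    attempts = 0
    for _ in range(retries + 1):
        attempts += 1
        if flush_once():
            return attempts
    raise RuntimeError(f"flush still incomplete after {attempts} attempts")


# Succeeds on the third attempt, so only two of the five retries are used.
outcomes = iter([False, False, True])
attempts_used = flush_with_retries(lambda: next(outcomes), retries=5)

calls = []


def always_fail():
    calls.append(1)
    return False


try:
    flush_with_retries(always_fail, retries=5)
    raised = False
except RuntimeError:
    raised = True
```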

Creating the experiment is required, so a failure there raises. Once the experiment exists, metric and annotation delivery is best effort unless you call flush_or_raise(). The explicit start() and done() lifecycle calls do not flush automatically; call flush() or flush_or_raise() after done().

Manual Lifecycle

The context manager is enough for most jobs:

with experiment.run():
    experiment.metric("loss", 0.5, step=0)

Use explicit lifecycle calls when the run does not fit a single with block:

experiment.start()

for step in range(1_000):
    experiment.metric("loss", 1.0 / (step + 1), step=step)

experiment.done()
experiment.flush_or_raise()

If another supervisor owns process exit and exception handling, disable the SDK's process hooks:

with experiment.run(install_hooks=False):
    experiment.metric("loss", 0.5, step=0)

Local and Test Collectors

Use JsonlCollector when you want the same API shape but local JSONL output:

from pathlib import Path

from syvain_metrics_collector import JsonlCollector

collector = JsonlCollector(path=Path("artifacts/metrics/run-001.jsonl"))
experiment = collector.experiment("run-001", meta={"model": "mamba"})

with experiment.run():
    experiment.metric("loss", 0.42, step=1, metadata={"split": "train"})
    experiment.annotation("local checkpoint written", metadata={"path": "ckpt.pt"})

experiment.flush_or_raise()

Use NoopCollector in tests or dry runs that should accept metrics calls without network or file IO:

from syvain_metrics_collector import NoopCollector

collector = NoopCollector()
experiment = collector.experiment("unit-test-run")

with experiment.run():
    experiment.metric("loss", 0.42, step=1)

Constructor Options

collector = Collector(
    "ak_org_...",
    host="https://metrics.syvain.com",
    timeout=10.0,
    ingest_timeout=60.0,
    flush_delay_seconds=0.25,
    max_queue_items=100_000,
    max_batch_items=500,
    max_retries=None,
)

  • timeout: experiment creation and status update timeout
  • ingest_timeout: metric and annotation batch timeout
  • flush_delay_seconds: background batching delay
  • max_queue_items: maximum queued metrics plus annotations per experiment
  • max_batch_items: maximum items in one ingest request
  • max_retries: retry limit for retryable delivery failures; None retries indefinitely while respecting the queue limit
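As a rough illustration of how max_batch_items bounds request size, the sketch below splits a queue into consecutive batches. It assumes simple in-order chunking; split_batches is a hypothetical helper and the SDK's real batching logic may differ.

```python
def split_batches(items, max_batch_items=500):
    """Split queued items into consecutive batches of at most `max_batch_items`."""
    if max_batch_items < 1:
        raise ValueError("max_batch_items must be at least 1")
    return [
        items[start : start + max_batch_items]
        for start in range(0, len(items), max_batch_items)
    ]


queued = list(range(1_200))
batches = split_batches(queued, max_batch_items=500)
sizes = [len(batch) for batch in batches]
```

Under this model, 1,200 queued items with max_batch_items=500 take three ingest requests, the last one carrying the remaining 200 items.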

The timestamp argument of metric(...) accepts seconds or milliseconds. Values below 10_000_000_000 are interpreted as seconds and converted to milliseconds.
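The seconds-or-milliseconds rule can be written as a small function. This is a sketch of the stated cutoff, not the SDK's code; to_milliseconds and SECONDS_CUTOFF are hypothetical names, and rounding fractional values to the nearest millisecond is an assumption.

```python
SECONDS_CUTOFF = 10_000_000_000


def to_milliseconds(timestamp):
    """Normalize a timestamp in seconds or milliseconds to integer milliseconds."""
    if timestamp < SECONDS_CUTOFF:
        # Below the cutoff the value is read as seconds.
        return int(round(timestamp * 1000))
    # At or above the cutoff the value is already in milliseconds.
    return int(round(timestamp))


from_seconds = to_milliseconds(1_700_000_000)
from_millis = to_milliseconds(1_700_000_000_000)
```

Both calls produce 1_700_000_000_000. If the values are Unix epoch times, the cutoff is unambiguous for present-day data: 10_000_000_000 seconds falls in the year 2286, while 10_000_000_000 milliseconds falls in April 1970.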
