Syvain metrics collection SDK

Syvain Metrics Collector

Python SDK for sending experiment metrics and annotations to Syvain Metrics.

Use this package in training, evaluation, and analysis jobs that need one searchable experiment record with numeric metric series, run metadata, and human-readable notes.

Install

uv add syvain-metrics-collector

Basic Usage

from syvain_metrics_collector import Collector

collector = Collector("ak_org_...")

experiment = collector.experiment(
    "mamba-run-001",
    description="Baseline mamba training run",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "seed": 7,
    },
)

with experiment.run():
    for step in range(1_000):
        loss = 1.0 / (step + 1)
        experiment.metric("loss", loss, step=step, metadata={"split": "train"})

    experiment.annotation(
        "saved checkpoint",
        metadata={"path": "checkpoints/mamba-run-001/step-999.pt"},
    )

experiment.flush_or_raise()

The normal shape is:

  • create a Collector with a metrics API key
  • create one experiment per run
  • put stable run-level facts in meta
  • send numeric values with experiment.metric(...)
  • send notable events with experiment.annotation(...)
  • rely on experiment.run() for a best-effort flush when the context exits
  • call flush_or_raise() before process exit when delivery failure should fail the caller

Collector defaults to https://metrics.syvain.com, so most jobs only need an API key.

Experiment Metadata

Use meta for facts that apply to the whole run:

experiment = collector.experiment(
    "mamba-run-001",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "git_sha": "abc123",
        "config": {"batch_size": 32, "learning_rate": 0.0003},
    },
)

Good experiment metadata includes model name, dataset, seed, git SHA, machine type, and config values. Do not put per-step values in meta; put those on metrics.

Metrics

Metric values must be finite numbers, so NaN and infinity are not valid. step is a required argument of the Python method; pass step=None explicitly only for events that genuinely have no step.

experiment.metric("validation_loss", 0.182, step=500)
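
Training loops sometimes produce NaN or infinite losses. One way to keep such values out of the queue is a small guard in the calling code. The sketch below is an assumption about how a caller might do this, not part of the SDK: is_loggable and log_if_finite are hypothetical helper names, and how the SDK itself reacts to a non-finite value is not described here.

```python
import math
from numbers import Real
from typing import Optional


def is_loggable(value: object) -> bool:
    """Return True when value is a real, finite number (bools excluded)."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    try:
        return math.isfinite(value)
    except OverflowError:
        # Integers too large to convert to a float are treated as not loggable.
        return False


def log_if_finite(experiment, name: str, value: object, step: Optional[int]) -> bool:
    """Hypothetical wrapper: send the metric only when the value is finite."""
    if not is_loggable(value):
        return False
    experiment.metric(name, value, step=step)
    return True
```

A caller can use the False return value to record an annotation such as "loss became NaN" instead of a metric point.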

Use the same metric name for the same measured quantity:

experiment.metric("loss", train_loss, step=step, metadata={"split": "train"})
experiment.metric("loss", val_loss, step=step, metadata={"split": "validation"})

Use separate metric names when the quantity or unit is different:

experiment.metric("loss", 0.42, step=step, metadata={"split": "train"})
experiment.metric("accuracy", 0.91, step=step, metadata={"split": "validation"})
experiment.metric("tokens_per_second", 1820.0, step=step)

Metric Metadata

Metric metadata is how the dashboard separates related lines inside one metric. Keep it low-cardinality and easy to group:

experiment.metric(
    "gpu_utilization",
    78.0,
    step=step,
    metadata={"device": "gpu:0"},
)
experiment.metric(
    "gpu_utilization",
    74.0,
    step=step,
    metadata={"device": "gpu:1"},
)

Useful metadata keys include split, device, rank, phase, and prompt_set.

Avoid request IDs, timestamps, constantly changing file paths, and large nested payloads on metrics. Put one-off details in annotations instead.
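
To check whether a metadata key is low-cardinality before shipping it, you can count distinct values per key over a sample of the metadata dicts you plan to send. This is a minimal sketch under stated assumptions: high_cardinality_keys is a hypothetical helper, and the max_distinct threshold of 20 is an arbitrary illustration, not a limit documented by the SDK or the dashboard.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Set


def high_cardinality_keys(
    metadata_samples: Iterable[Mapping[str, Any]],
    max_distinct: int = 20,  # arbitrary illustrative threshold
) -> List[str]:
    """Return metadata keys whose distinct value count exceeds max_distinct."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for metadata in metadata_samples:
        for key, value in metadata.items():
            # repr() lets unhashable values (dicts, lists) be counted too.
            seen[key].add(repr(value))
    return sorted(key for key, values in seen.items() if len(values) > max_distinct)


samples = [{"split": "train", "request_id": f"req-{i}"} for i in range(100)]
print(high_cardinality_keys(samples))  # ['request_id']
```

Keys flagged this way are candidates for moving into annotations.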

Annotations

Use annotations for text events that explain the run:

experiment.annotation(
    "evaluation started",
    metadata={"split": "validation"},
)

Common annotations include checkpoints, phase changes, incidents, artifact paths, dashboard links, and manual operator notes.

Folders

If you know the folder ID, pass it directly:

experiment = collector.experiment(
    "mamba-run-001",
    folder_id="00000000-0000-0000-0000-000000000000",
)

If you only know the dashboard path, pass folder_path:

experiment = collector.experiment(
    "mamba-run-001",
    folder_path="/mamba-run-001",
)

Do not pass both. folder_path makes an extra API request to resolve the path to a folder ID and raises if the path is missing or ambiguous.
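
When the folder comes from command-line flags or config, it can help to enforce the "one or the other" rule before calling collector.experiment(...). The helper below is a hypothetical sketch, not an SDK function; it only builds the keyword arguments and assumes nothing beyond the folder_id and folder_path parameters shown above.

```python
from typing import Dict, Optional


def folder_kwargs(
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
) -> Dict[str, str]:
    """Return at most one folder argument, rejecting the case where both are set."""
    if folder_id is not None and folder_path is not None:
        raise ValueError("pass folder_id or folder_path, not both")
    if folder_id is not None:
        return {"folder_id": folder_id}
    if folder_path is not None:
        return {"folder_path": folder_path}
    # Neither given: pass no folder argument at all.
    return {}


# Intended use (collector comes from the examples above):
# experiment = collector.experiment("mamba-run-001", **folder_kwargs(folder_path="/mamba-run-001"))
```

Preferring folder_id when it is known also avoids the extra path-resolution request.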

Flushing and Errors

Metric and annotation calls enqueue data locally and return quickly. The SDK flushes batches in the background after a short delay.

The experiment.run() context manager calls done() and then attempts a best-effort flush when the context exits. It retries three times by default and logs a warning if delivery is still incomplete. It does not raise on flush failure, drop queued data, or consume the retry budget used by later explicit flush calls. Exceptions from the training block still propagate.

Use flush() when you want one non-raising flush attempt and a status object:

result = experiment.flush()
if not result.ok:
    print(result.pending_metrics, result.retryable_failures)

Use flush_or_raise() when the caller should fail if any metric, annotation, or status update is still pending or failed. retries is the number of retry attempts after the first flush attempt:

experiment.flush_or_raise()
experiment.flush_or_raise(retries=5)

Experiment creation is required state, so it raises on failure. Once an experiment exists, metric and annotation delivery is best effort unless you call flush_or_raise(). Explicit start() and done() lifecycle calls do not flush automatically; call flush() or flush_or_raise() after done().
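
If you want to control the wait between attempts yourself, you can build a loop on top of the non-raising flush(). The sketch below relies only on flush() returning an object with an ok attribute, as shown above; flush_with_backoff, its parameters, and the exponential delay are hypothetical choices for illustration, not SDK behavior.

```python
import time
from typing import Callable


def flush_with_backoff(
    experiment,
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call experiment.flush() up to `attempts` times; return True once result.ok is true."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        result = experiment.flush()
        if result.ok:
            return True
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    return False


# Intended use after a manual lifecycle:
# experiment.done()
# if not flush_with_backoff(experiment):
#     raise SystemExit("metrics delivery incomplete")
```

The sleep parameter is injectable so the loop can be tested without real waiting. When the default retry behavior is acceptable, flush_or_raise(retries=...) is the simpler choice.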

Manual Lifecycle

The context manager is enough for most jobs:

with experiment.run():
    experiment.metric("loss", 0.5, step=0)

Use explicit lifecycle calls when the run does not fit a single with block:

experiment.start()

for step in range(1_000):
    experiment.metric("loss", 1.0 / (step + 1), step=step)

experiment.done()
experiment.flush_or_raise()

If another supervisor owns process exit and exception handling, disable the SDK's process hooks:

with experiment.run(install_hooks=False):
    experiment.metric("loss", 0.5, step=0)

Local and Test Collectors

Use JsonlCollector when you want the same API shape but local JSONL output:

from pathlib import Path

from syvain_metrics_collector import JsonlCollector

collector = JsonlCollector(path=Path("artifacts/metrics/run-001.jsonl"))
experiment = collector.experiment("run-001", meta={"model": "mamba"})

with experiment.run():
    experiment.metric("loss", 0.42, step=1, metadata={"split": "train"})
    experiment.annotation("local checkpoint written", metadata={"path": "ckpt.pt"})

experiment.flush_or_raise()

Use NoopCollector in tests or dry runs that should accept metrics calls without network or file IO:

from syvain_metrics_collector import NoopCollector

collector = NoopCollector()
experiment = collector.experiment("unit-test-run")

with experiment.run():
    experiment.metric("loss", 0.42, step=1)

Constructor Options

collector = Collector(
    "ak_org_...",
    host="https://metrics.syvain.com",
    timeout=10.0,
    ingest_timeout=60.0,
    flush_delay_seconds=0.25,
    max_queue_items=100_000,
    max_batch_items=500,
    max_retries=None,
)

  • timeout: experiment creation and status update timeout
  • ingest_timeout: metric and annotation batch timeout
  • flush_delay_seconds: background batching delay
  • max_queue_items: maximum queued metrics plus annotations per experiment
  • max_batch_items: maximum items in one ingest request
  • max_retries: retry limit for retryable delivery failures; None retries indefinitely while respecting the queue limit

metric(...) also accepts a timestamp, in either seconds or milliseconds. Values below 10_000_000_000 are interpreted as seconds and converted to milliseconds; values at or above that cutoff are taken as milliseconds already.
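
The seconds-versus-milliseconds rule can be written out as a small function. This is a sketch of the rule as described above, not the SDK's code: to_milliseconds and SECONDS_CUTOFF are hypothetical names, and rounding to the nearest integer millisecond is an assumption.

```python
SECONDS_CUTOFF = 10_000_000_000  # cutoff stated in the text above


def to_milliseconds(timestamp: float) -> int:
    """Normalize a timestamp to integer milliseconds using the cutoff rule."""
    if timestamp < SECONDS_CUTOFF:
        # Below the cutoff: treat the value as seconds.
        return int(round(timestamp * 1000))
    # At or above the cutoff: treat the value as milliseconds already.
    return int(round(timestamp))


print(to_milliseconds(1_700_000_000))      # 1700000000000
print(to_milliseconds(1_700_000_000_000))  # 1700000000000
```

Both calls describe the same instant, which is why a seconds value from time.time() and a milliseconds value from another system can be passed without manual conversion.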
