oplogger

A minimal library for capturing ML/NLP operation traces for later training data export.

Installation

pip install oplogger

# With PostgreSQL support
pip install oplogger[postgres]

# With export features (pandas, HuggingFace datasets)
pip install oplogger[export]

Note: The package is installed as oplogger but imported as oplog.

Quick Start

from oplog import configure, op, run, db, export

# Configure once at startup
configure(project="my_project", backend="sqlite:///traces.db")

# Log standalone operations
op("classify") \
    .model("setfit-intent") \
    .input(text="hello world") \
    .output(label="greeting", score=0.95) \
    .save()

# Log grouped operations within a run (with run-level metadata for A/B testing)
with run(strategy="rerank_v2", experiment="exp_042") as r:
    op("retrieve") \
        .model("bge-m3") \
        .input(query="capital of France?", k=10) \
        .output(candidates=["Paris is the capital..."]) \
        .save()

    op("rerank") \
        .model("bge-reranker-base") \
        .input(query="capital of France?", candidates=[...]) \
        .output(ranked=["Paris is the capital..."], scores=[0.94]) \
        .meta(latency_ms=42) \
        .save()
    # Both ops get meta={"strategy": "rerank_v2", "experiment": "exp_042", ...}

# Flag for training
db.flag(run_id=r.id, reason="training", note="clean example")

# Query and export
records = db.query(operation="rerank", flagged_for="training")
export.to_jsonl(records, "training_data.jsonl")

API Reference

Configuration

configure(project="name", backend="sqlite:///traces.db")

Backend formats:

  • SQLite: sqlite:///path/to/traces.db (auto-creates file and parent directories)
  • PostgreSQL: postgresql://user:pass@host:port/dbname
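As a sketch, a PostgreSQL backend URL can be assembled from environment variables at startup. The variable names below (PGUSER, PGPASSWORD, etc.) are the conventional libpq names, chosen for illustration; oplogger itself does not require them:

```python
import os

# Hedged sketch: build a backend URL in the documented
# postgresql://user:pass@host:port/dbname format from environment
# variables, with local-development defaults.
user = os.environ.get("PGUSER", "app")
password = os.environ.get("PGPASSWORD", "secret")
host = os.environ.get("PGHOST", "localhost")
port = os.environ.get("PGPORT", "5432")
dbname = os.environ.get("PGDATABASE", "traces")

backend = f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
# e.g. "postgresql://app:secret@localhost:5432/traces"
```

The resulting string can be passed straight to `configure(project=..., backend=backend)`.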

Operations

op("operation_type")        # Start building an operation
    .model("model-name")    # Model identifier
    .input(**kwargs)        # Input data (JSON)
    .output(**kwargs)       # Output data (JSON)
    .meta(**kwargs)         # Metadata (latency, tokens, etc.)
    .tags("tag1", "tag2")   # Categorical tags
    .save()                 # Persist and return operation ID

Runs

with run() as r:            # Auto-generated run ID
    op(...).save()          # seq=0
    op(...).save()          # seq=1
    print(r.id)             # Access run ID

with run("custom-id"):      # Explicit run ID
    ...

# Run-level metadata (propagates to all operations in the run)
with run(strategy="methodA", experiment_id="exp123") as r:
    op("test").save()                      # meta={"strategy": "methodA", "experiment_id": "exp123"}
    op("test").meta(latency_ms=42).save()  # meta includes both run + op metadata

Run metadata is merged with operation metadata; on key conflicts, operation-level values override run-level values.
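The merge semantics above can be illustrated with a plain dict merge. This mirrors the documented behavior; it is not oplog's internal code:

```python
# Plain-Python illustration of the documented merge: operation-level
# metadata overrides run-level metadata on conflicting keys.
run_meta = {"strategy": "methodA", "experiment_id": "exp123"}
op_meta = {"latency_ms": 42, "strategy": "methodB"}  # "strategy" conflicts

merged = {**run_meta, **op_meta}  # later (op-level) values win
print(merged)
# {'strategy': 'methodB', 'experiment_id': 'exp123', 'latency_ms': 42}
```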

Database Operations

# Query
records = db.query(
    operation="rerank",     # Filter by operation type
    model="model-name",     # Filter by model
    run_id="...",           # Filter by run
    flagged_for="training", # Filter by flag
    tags=["tag1", "tag2"],  # Filter by tags (AND logic)
    limit=100,              # Pagination
    offset=0,
)
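The tags filter's AND logic means a record matches only if it carries every requested tag. A plain-Python sketch of that behavior (illustrative, not oplog internals):

```python
# A record matches the tags filter only if its tag set contains
# every requested tag (AND logic, not OR).
records = [
    {"id": 1, "tags": ["tag1", "tag2", "extra"]},
    {"id": 2, "tags": ["tag1"]},
    {"id": 3, "tags": ["tag2"]},
]
wanted = {"tag1", "tag2"}
matches = [r for r in records if wanted.issubset(r["tags"])]
print([r["id"] for r in matches])  # [1]
```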

# Flag
db.flag(ids=[...], reason="training", note="optional note")
db.flag(run_id="...", reason="review")

# Unflag
db.unflag(ids=[...])
db.unflag(run_id="...")

Export

# JSONL
export.to_jsonl(records, "output.jsonl")

# CSV
export.to_csv(records, "output.csv")

# pandas DataFrame
df = export.to_dataframe(records)

# HuggingFace Dataset
dataset = export.to_dataset(records)

# Field selection (dot notation for nested fields)
export.to_jsonl(records, "output.jsonl", fields=["inputs.query", "outputs.score"])
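Dot-notation field selection walks each dotted path into the nested record. A sketch of that lookup in plain Python (illustrative; oplog's actual export code may differ):

```python
# Resolve a dotted path like "inputs.query" against a nested record dict.
def pick(record, path):
    value = record
    for key in path.split("."):
        value = value[key]
    return value

record = {
    "inputs": {"query": "capital of France?"},
    "outputs": {"score": 0.94},
}
selected = {f: pick(record, f) for f in ["inputs.query", "outputs.score"]}
print(selected)
# {'inputs.query': 'capital of France?', 'outputs.score': 0.94}
```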

Multi-Tracer Usage

For multiple projects or explicit control:

from oplog import Tracer

tracer = Tracer(project="my_project", backend="sqlite:///traces.db")

tracer.op("classify").input(...).save()

with tracer.run() as r:
    tracer.op("rerank").input(...).save()

License

MIT
