
osmosis-ai

A Python SDK for Osmosis LLM training workflows:

  • Reward/rubric validation helpers with strict type enforcement
  • Remote Rollout SDK for integrating agent frameworks with Osmosis training

Installation

pip install osmosis-ai

Requires Python 3.10 or newer.

This installs the Osmosis CLI and pulls in litellm (a unified LLM interface supporting 100+ providers) along with supporting utilities such as PyYAML, python-dotenv, and requests.

For development:

git clone https://github.com/Osmosis-AI/osmosis-sdk-python
cd osmosis-sdk-python

# Install package in editable mode
pip install -e .

# Install with development dependencies (pytest, formatters, etc.)
pip install -e ".[dev]"

Quick Start

from osmosis_ai import osmosis_reward

@osmosis_reward
def simple_reward(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    """Basic exact match reward function."""
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0

# Use the reward function
score = simple_reward("hello world", "hello world")  # Returns 1.0

from osmosis_ai import evaluate_rubric

solution = "The capital of France is Paris."

# Export OPENAI_API_KEY in your shell before running this snippet.
rubric_score = evaluate_rubric(
    rubric="Assistant must mention the verified capital city.",
    solution_str=solution,
    model_info={
        "provider": "openai",
        "model": "gpt-5",
        "api_key_env": "OPENAI_API_KEY",
    },
    ground_truth="Paris",
)

print(rubric_score)  # -> 1.0 (full payload available via return_details=True)

Remote Rollout SDK

If you're integrating an agent loop with Osmosis remote rollout / TrainGate, see:

  • docs/rollout/README.md (quick start)
  • docs/rollout/architecture.md (protocol + lifecycle)

Remote Rubric Evaluation

evaluate_rubric talks to hosted LLM providers through LiteLLM, a unified interface supporting 100+ providers (OpenAI, Anthropic, Google Gemini, xAI, OpenRouter, Cerebras, Azure, Bedrock, Vertex AI, and more). Every provider returns a strict JSON object with {"score": number, "explanation": string}. The helper clamps the score into your configured range, validates the structure, and exposes the raw payload when return_details=True.

Credentials are resolved from environment variables by default:

  • OPENAI_API_KEY for OpenAI
  • ANTHROPIC_API_KEY for Anthropic
  • GEMINI_API_KEY for Google Gemini
  • XAI_API_KEY for xAI
  • OPENROUTER_API_KEY for OpenRouter
  • CEREBRAS_API_KEY for Cerebras

Override the environment variable name with model_info={"api_key_env": "CUSTOM_ENV_NAME"} when needed, or supply an inline secret with model_info={"api_key": "sk-..."} for ephemeral credentials. Missing API keys raise a MissingAPIKeyError that explains how to export the secret before trying again.

api_key and api_key_env are mutually exclusive ways to provide the same credential. When api_key is present and non-empty it is used directly, skipping any environment lookup. Otherwise the resolver falls back to api_key_env (or the provider default) and pulls the value from your local environment with os.getenv.
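
For example, these two configurations resolve the same credential in different ways (the environment-variable name and key value are placeholders):

# Resolved from the environment at call time via os.getenv.
model_info_env = {
    "provider": "openai",
    "model": "gpt-5",
    "api_key_env": "MY_OPENAI_KEY",   # placeholder variable name
}

# Inline secret; skips any environment lookup.
model_info_inline = {
    "provider": "openai",
    "model": "gpt-5",
    "api_key": "sk-...",              # ephemeral credential, never commit real keys
}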

model_info accepts additional rubric-specific knobs:

  • score_min / score_max – change the default [0.0, 1.0] scoring bounds.
  • system_prompt / original_input – provide optional context strings that will be quoted in the judging prompt.
  • timeout – customise the provider timeout in seconds.

Pass metadata={...} to evaluate_rubric when you need structured context quoted in the judge prompt, and set return_details=True to receive the full RewardRubricRunResult payload (including the provider’s raw response).
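
A sketch that combines these options; the 0–10 bounds, metadata keys, and prompt text are illustrative:

from osmosis_ai import evaluate_rubric

details = evaluate_rubric(
    rubric="Score the answer from 0 (wrong) to 10 (perfect).",
    solution_str="The capital of France is Paris.",
    ground_truth="Paris",
    metadata={"dataset": "geography-v1", "difficulty": "easy"},  # illustrative keys
    model_info={
        "provider": "openai",
        "model": "gpt-5",
        "api_key_env": "OPENAI_API_KEY",
        "score_min": 0.0,          # widen the default [0.0, 1.0] bounds
        "score_max": 10.0,
        "system_prompt": "You are a strict geography grader.",
        "timeout": 60,             # provider timeout in seconds
    },
    return_details=True,           # full RewardRubricRunResult instead of a bare float
)

print(details)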

Remote failures surface as ProviderRequestError instances, with ModelNotFoundError reserved for missing model identifiers so you can retry with a new snapshot.

Provider model snapshot names change frequently. Check each vendor's dashboard for the latest identifier if you encounter a "model not found" error.
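
A hedged sketch of catching these failures; it assumes the exception classes are importable from the package root, so adjust the import path if they live in a submodule:

from osmosis_ai import evaluate_rubric
from osmosis_ai import MissingAPIKeyError, ModelNotFoundError, ProviderRequestError  # assumed import location

try:
    score = evaluate_rubric(
        rubric="Assistant must mention the verified capital city.",
        solution_str="The capital of France is Paris.",
        ground_truth="Paris",
        model_info={"provider": "openai", "model": "gpt-5", "api_key_env": "OPENAI_API_KEY"},
    )
except MissingAPIKeyError as exc:
    print(f"Export the API key first: {exc}")
except ModelNotFoundError as exc:
    print(f"Model snapshot not found; check the vendor dashboard: {exc}")
except ProviderRequestError as exc:
    print(f"Provider request failed: {exc}")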

Provider Architecture

All provider routing is handled by LiteLLM. To use any supported provider, pass its name and model in model_info:

result = evaluate_rubric(
    rubric="...",
    solution_str="...",
    model_info={"provider": "anthropic", "model": "claude-sonnet-4-5-20250929"},
)

Any provider supported by LiteLLM can be used without additional configuration beyond setting the appropriate API key environment variable.

Required Function Signature

All functions decorated with @osmosis_reward must have exactly this signature:

@osmosis_reward
def your_function(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    # Your reward logic here
    return float_score

Parameters

  • solution_str: str - The solution string to evaluate (required)
  • ground_truth: str - The correct/expected answer (required)
  • extra_info: dict = None - Optional dictionary for additional configuration

Return Value

  • -> float - Must return a float value representing the reward score

The decorator will raise a TypeError if the function doesn't match this exact signature or doesn't return a float.

Rubric Function Signature

Rubric functions decorated with @osmosis_rubric must match this signature:

@osmosis_rubric
def your_rubric(solution_str: str, ground_truth: str | None, extra_info: dict) -> float:
    # Your rubric logic here
    return float_score

The runtime forwards None for ground_truth when no reference answer exists. Annotate the parameter as Optional[str] (or handle None explicitly) if your rubric logic expects to run in that scenario.

Required extra_info fields

  • provider – Non-empty string identifying the judge provider.
  • model – Non-empty string naming the provider model to call.
  • rubric – Natural-language rubric instructions for the judge model.
  • api_key / api_key_env – Supply either the raw key or the environment variable name that exposes it.

Optional extra_info fields

  • system_prompt – Optional string prepended to the provider’s base system prompt when invoking the judge; include it inside extra_info rather than as a separate argument.
  • score_min / score_max – Optional numeric overrides for the expected score range.
  • model_info_overrides – Optional dict merged into the provider configuration passed to the judge.

Additional keys are passed through and can be used for custom configuration. If you need to extend the provider payload (for example adding api_key_env), add a dict under model_info_overrides and it will be merged with the required provider/model pair before invoking evaluate_rubric. The decorator enforces the parameter names/annotations, validates the embedded configuration at call time, and ensures the wrapped function returns a float.

Annotation quirk: extra_info must be annotated as dict without a default value, unlike @osmosis_reward.

Tip: When delegating to evaluate_rubric, pass the raw solution_str directly and include any extra context inside the metadata payload.
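
Putting the pieces together, a minimal sketch of an @osmosis_rubric function that delegates to evaluate_rubric (the field handling follows the contract above; the metadata key is illustrative):

from osmosis_ai import evaluate_rubric, osmosis_rubric

@osmosis_rubric
def llm_judge_rubric(solution_str: str, ground_truth: str | None, extra_info: dict) -> float:
    """Score solution_str with a hosted judge model configured through extra_info."""
    model_info = {
        "provider": extra_info["provider"],
        "model": extra_info["model"],
        **extra_info.get("model_info_overrides", {}),  # e.g. {"api_key_env": "OPENAI_API_KEY"}
    }
    # Optional overrides travel alongside the required provider/model pair.
    for key in ("score_min", "score_max", "system_prompt"):
        if key in extra_info:
            model_info[key] = extra_info[key]

    kwargs = {}
    if ground_truth is not None:                       # runtime may forward None here
        kwargs["ground_truth"] = ground_truth
    if extra_info.get("metadata"):                     # illustrative: extra context for the judge prompt
        kwargs["metadata"] = extra_info["metadata"]

    return evaluate_rubric(
        rubric=extra_info["rubric"],
        solution_str=solution_str,
        model_info=model_info,
        **kwargs,
    )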

Examples

See the examples/ directory for complete examples:

@osmosis_reward
def case_insensitive_match(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    """Case-insensitive string matching with partial credit."""
    match = solution_str.lower().strip() == ground_truth.lower().strip()

    if extra_info and 'partial_credit' in extra_info:
        if not match and extra_info['partial_credit']:
            len_diff = abs(len(solution_str) - len(ground_truth))
            if len_diff <= 2:
                return 0.5

    return 1.0 if match else 0.0

@osmosis_reward
def numeric_tolerance(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    """Numeric comparison with configurable tolerance."""
    try:
        solution_num = float(solution_str.strip())
        truth_num = float(ground_truth.strip())

        tolerance = extra_info.get('tolerance', 0.01) if extra_info else 0.01
        return 1.0 if abs(solution_num - truth_num) <= tolerance else 0.0
    except ValueError:
        return 0.0

  • examples/rubric_functions.py demonstrates evaluate_rubric with OpenAI, Anthropic, Gemini, xAI, OpenRouter, and Cerebras via LiteLLM's unified interface.
  • examples/reward_functions.py keeps local reward helpers that showcase the decorator contract without external calls.
  • examples/rubric_configs.yaml bundles two rubric definitions with provider configuration and scoring bounds.
  • examples/sample_data.jsonl contains two rubric-aligned solution strings so you can trial dataset validation.

# examples/rubric_configs.yaml (excerpt)
version: 1
rubrics:
  - id: support_followup
    model_info:
      provider: openai
      model: gpt-5-mini
      api_key_env: OPENAI_API_KEY
{"conversation_id": "ticket-001", "rubric_id": "support_followup", "original_input": "...", "solution_str": "..."}
{"conversation_id": "ticket-047", "rubric_id": "policy_grounding", "original_input": "...", "solution_str": "..."}

CLI Tools

Installing the SDK also provides a lightweight CLI available as osmosis (aliases: osmosis_ai, osmosis-ai).

Authentication

Log in to Osmosis AI and manage workspace credentials:

# Log in to Osmosis AI (opens browser for authentication)
osmosis login

# Force re-login, clearing existing credentials
osmosis login --force

# Print the authentication URL without opening browser
osmosis login --no-browser

# Show current user and all workspaces
osmosis whoami

# Logout (interactive workspace selection)
osmosis logout

# Logout from all workspaces
osmosis logout --all

# Skip confirmation prompt
osmosis logout -y

Credentials are saved to ~/.config/osmosis/credentials.json and include workspace information and token expiration.

Workspace Management

Manage multiple workspaces after logging in:

# List all logged-in workspaces
osmosis workspace list

# Show the current active workspace
osmosis workspace current

# Switch to a different workspace
osmosis workspace switch <workspace-name>

You can log in to multiple workspaces and switch between them. Each workspace maintains its own credentials and role information.

Remote Rollout Server

Start a RolloutServer for an agent loop implementation:

# Validate agent loop before starting (checks tools, async run, etc.)
osmosis validate -m my_agent:agent_loop

# Start server with Platform registration (requires `osmosis login`)
osmosis login
osmosis serve -m my_agent:agent_loop

# Specify port
osmosis serve -m my_agent:agent_loop -p 8080

# Local / container mode: skip Platform registration (no login required).
# NOTE: API key auth is still enabled by default.
osmosis serve -m my_agent:agent_loop --skip-register

# Local debug mode: disable API key auth AND skip Platform registration
osmosis serve -m my_agent:agent_loop --local

# Provide a stable API key (otherwise one is generated and printed on startup)
osmosis serve -m my_agent:agent_loop --skip-register --api-key "$MY_API_KEY"

# Skip validation (not recommended)
osmosis serve -m my_agent:agent_loop --no-validate

The module path format is module:attribute, e.g., server:agent_loop or mypackage.agents:MyAgentClass.

Note: The --api-key option sets the API key for this RolloutServer. It is used by TrainGate to authenticate its requests to your server. This key is not the same as your osmosis login token (which is for authenticating with the Osmosis Platform), nor is it used for callbacks from your server back to TrainGate.

Rubric Tools

Preview a rubric file and print every configuration discovered, including nested entries:

osmosis preview --path path/to/rubric.yaml

Preview a dataset of rubric-scored solutions stored as JSONL:

osmosis preview --path path/to/data.jsonl

Evaluate a dataset against a hosted rubric configuration and print the returned scores:

osmosis eval-rubric --rubric support_followup --data examples/sample_data.jsonl

  • Command split (development-stage breaking change):

    • osmosis eval-rubric evaluates JSONL conversations against hosted rubrics.
    • osmosis eval runs rollout eval functions against RolloutAgentLoop datasets.
  • Supply the dataset with -d/--data path/to/data.jsonl; the path is resolved relative to the current working directory (a combined example invocation appears after this list).

  • Use --config path/to/rubric_configs.yaml when the rubric definitions are not located alongside the dataset.

  • Pass -n/--number to sample the provider multiple times per record; the CLI prints every run along with aggregate statistics (average, variance, standard deviation, and min/max).

  • Provide --output path/to/dir to create the directory (if needed) and emit rubric_eval_result_<unix_timestamp>.json, or supply a full file path (any extension) to control the filename; each file captures every run, provider payloads, timestamps, and aggregate statistics for downstream analysis.

  • Skip --output to collect results under ~/.cache/osmosis/eval_result/<rubric_id>/rubric_eval_result_<identifier>.json; the CLI writes this JSON whether the evaluation finishes cleanly or hits provider/runtime errors so you can inspect failures later (only a manual Ctrl+C interrupt leaves no file behind).

  • Dataset rows whose rubric_id does not match the requested rubric are skipped automatically.

  • Each dataset record must provide a non-empty solution_str; optional fields such as original_input, ground_truth, and extra_info travel with the record and are forwarded to the evaluator when present.

  • When delegating to a custom @osmosis_rubric function, the CLI enriches extra_info with the active provider, model, rubric, score bounds, any configured system_prompt, the resolved original_input, and the record’s metadata/extra fields so the decorator’s required entries are always present.

  • Rubric configuration files intentionally reject extra_info; provide per-example context through the dataset instead.
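
For example, a hypothetical invocation combining the flags above (paths and the sample count are illustrative):

osmosis eval-rubric \
  --rubric support_followup \
  --data examples/sample_data.jsonl \
  --config examples/rubric_configs.yaml \
  -n 3 \
  --output ./rubric_eval_results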

Both preview commands validate the file, echo a short summary (Loaded <n> ...), and pretty-print the parsed records so you can confirm that new rubrics or test fixtures look correct before committing them. Invalid files raise a descriptive error and exit with a non-zero status code.

Running Examples

PYTHONPATH=. python examples/reward_functions.py
PYTHONPATH=. python examples/rubric_functions.py  # Uncomment the provider you need before running

Testing

Run python -m pytest (or any subset under tests/) to exercise the helpers:

  • tests/test_rubric_eval.py covers prompt construction for solution_str evaluations.
  • tests/test_cli_services.py validates dataset parsing, extra-info enrichment, and engine interactions.
  • tests/test_cli.py ensures the CLI pathways surface the new fields end to end.
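
For example, to run the full suite or a single test module:

# Full test suite
python -m pytest

# Single module
python -m pytest tests/test_rubric_eval.py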

Add additional tests under tests/ as you extend the library.

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and examples
  5. Submit a pull request
