
osmosis-ai

A Python library that provides reward and rubric validation helpers for LLM applications with strict type enforcement.

Installation

pip install osmosis-ai

Requires Python 3.9 or newer.

This installs the Osmosis CLI and pulls in the required provider SDKs (openai, anthropic, google-genai, xai-sdk) along with supporting utilities such as PyYAML, python-dotenv, requests, and xxhash.

For development:

git clone https://github.com/Osmosis-AI/osmosis-sdk-python
cd osmosis-sdk-python
pip install -e .

Quick Start

from osmosis_ai import osmosis_reward

@osmosis_reward
def simple_reward(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    """Basic exact match reward function."""
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0

# Use the reward function
score = simple_reward("hello world", "hello world")  # Returns 1.0

from osmosis_ai import evaluate_rubric

messages = [
    {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What is the capital of France?"}],
    },
    {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "The capital of France is Paris."}],
    },
]

# Export OPENAI_API_KEY in your shell before running this snippet.
rubric_score = evaluate_rubric(
    rubric="Assistant must mention the verified capital city.",
    messages=messages,
    model_info={
        "provider": "openai",
        "model": "gpt-5",
        "api_key_env": "OPENAI_API_KEY",
    },
    ground_truth="Paris",
)

print(rubric_score)  # -> 1.0 (full payload available via return_details=True)

Remote Rubric Evaluation

evaluate_rubric talks to each provider through its official Python SDK while enforcing the same JSON schema everywhere:

  • OpenAI / xAI – Uses OpenAI(...).responses.create (or chat.completions.create) with response_format={"type": "json_schema"} and falls back to json_object when needed.
  • Anthropic – Forces a tool call with a JSON schema via Anthropic(...).messages.create, extracting the returned tool arguments.
  • Google Gemini – Invokes google.genai.Client(...).models.generate_content with response_mime_type="application/json" and response_schema.

Every provider therefore returns a strict JSON object with {"score": number, "explanation": string}. The helper clamps the score into your configured range, validates the structure, and exposes the raw payload when return_details=True.
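That post-processing step can be pictured with a small standalone sketch. This is illustrative only; the library's internal helpers are not exposed under these names:

```python
import json

def validate_and_clamp(raw: str, score_min: float = 0.0, score_max: float = 1.0) -> dict:
    """Mimic the validation and clamping evaluate_rubric applies to provider replies.

    Illustrative sketch only -- the library's actual internals may differ.
    """
    payload = json.loads(raw)
    if not isinstance(payload.get("score"), (int, float)):
        raise ValueError("provider reply missing a numeric 'score'")
    if not isinstance(payload.get("explanation"), str):
        raise ValueError("provider reply missing a string 'explanation'")
    # Clamp the score into the configured range rather than rejecting it.
    clamped = min(max(float(payload["score"]), score_min), score_max)
    return {"score": clamped, "explanation": payload["explanation"]}
```

An out-of-range score such as `{"score": 1.7, ...}` is clamped to the upper bound instead of raising, which matches the behaviour described above.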

Credentials are resolved from environment variables by default:

  • OPENAI_API_KEY for OpenAI
  • ANTHROPIC_API_KEY for Anthropic
  • GOOGLE_API_KEY for Google Gemini
  • XAI_API_KEY for xAI

Override the environment variable name with model_info={"api_key_env": "CUSTOM_ENV_NAME"} when needed, or supply an inline secret with model_info={"api_key": "sk-..."} for ephemeral credentials. Missing API keys raise a MissingAPIKeyError that explains how to export the secret before trying again.

model_info accepts additional rubric-specific knobs:

  • score_min / score_max – change the default [0.0, 1.0] scoring bounds.
  • system_prompt / original_input – override the helper’s transcript inference when those entries are absent.
  • timeout – customise the provider timeout in seconds.
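Putting those knobs together, a fully specified model_info might look like the following. The model snapshot name here is a placeholder; verify the current identifier on the vendor dashboard:

```python
# Hypothetical model_info combining the credential and rubric knobs above.
model_info = {
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",         # placeholder snapshot; check the vendor dashboard
    "api_key_env": "TEAM_ANTHROPIC_KEY",  # read the secret from a custom env var
    "score_min": 0.0,                     # shift the default [0.0, 1.0] scoring bounds
    "score_max": 5.0,
    "timeout": 30,                        # provider timeout in seconds
}
```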

Pass extra_info={...} to evaluate_rubric when you need structured context quoted in the judge prompt, and set return_details=True to receive the full RewardRubricRunResult payload (including the provider’s raw response).

Remote failures surface as ProviderRequestError instances, with ModelNotFoundError reserved for missing model identifiers so you can retry with a new snapshot.

Older SDK versions that lack schema parameters automatically fall back to instruction-only JSON; the helper still validates the response payload before returning. Provider model snapshot names change frequently. Check each vendor's dashboard for the latest identifier if you encounter a “model not found” error.
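A retry-with-fallback loop is one way to handle snapshot churn. The exception classes below are local stand-ins so the sketch runs without the SDK installed; in real code import ProviderRequestError and ModelNotFoundError from the library (the exact import path, and whether one subclasses the other, are assumptions here):

```python
# Stand-ins for the library's exception classes (assumed hierarchy).
class ProviderRequestError(Exception):
    pass

class ModelNotFoundError(ProviderRequestError):
    pass

def evaluate_with_fallback(evaluate, model_ids):
    """Try each candidate model snapshot until one is recognised."""
    last_error = None
    for model_id in model_ids:
        try:
            return evaluate(model_id)
        except ModelNotFoundError as exc:
            last_error = exc  # stale snapshot name; try the next candidate
    raise last_error
```

Here `evaluate` would wrap a call to evaluate_rubric with the given model id substituted into model_info.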

Provider Architecture

All remote integrations live in osmosis_ai/providers/ and implement the RubricProvider interface. At import time the default registry registers OpenAI, xAI, Anthropic, and Google Gemini so evaluate_rubric can route requests without additional configuration. The request/response plumbing is encapsulated in each provider module, keeping evaluate_rubric focused on prompt construction, payload validation, and credential resolution.

Add your own provider by subclassing RubricProvider, implementing run() with the vendor SDK, and calling register_provider() during start-up. A step-by-step guide is available in osmosis_ai/providers/README.md.

Required Function Signature

All functions decorated with @osmosis_reward must have exactly this signature:

@osmosis_reward
def your_function(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    # Your reward logic here
    return float_score

Parameters

  • solution_str: str - The solution string to evaluate (required)
  • ground_truth: str - The correct/expected answer (required)
  • extra_info: dict = None - Optional dictionary for additional configuration

Return Value

  • -> float - Must return a float value representing the reward score

The decorator will raise a TypeError if the function doesn't match this exact signature or doesn't return a float.
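The strictness can be illustrated with a simplified version of such a check. This is not the library's actual code, just a sketch of the kind of validation the decorator performs:

```python
import inspect

def check_reward_signature(fn):
    """Simplified stand-in for the signature check @osmosis_reward performs."""
    params = list(inspect.signature(fn).parameters)
    if params != ["solution_str", "ground_truth", "extra_info"]:
        raise TypeError(
            f"{fn.__name__} must accept (solution_str, ground_truth, extra_info); got {params}"
        )
    return fn

def good(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    return 0.0

def bad(answer: str) -> float:  # wrong parameter names -> rejected
    return 0.0
```

The real decorator also checks annotations and the return type at call time, which this sketch omits.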

Rubric Function Signature

Rubric functions decorated with @osmosis_rubric must accept the parameters:

  • model_info: dict
  • rubric: str
  • messages: list
  • ground_truth: Optional[str] = None
  • system_message: Optional[str] = None
  • extra_info: dict = None
  • score_min: float = 0.0 (optional lower bound; must default to 0.0 and stay below score_max)
  • score_max: float = 1.0 (optional upper bound; must default to 1.0 and stay above score_min)

and must return a float. The decorator validates the signature and runtime payload (including message role validation and return type) before delegating to your custom logic.

Required fields: model_info must contain non-empty provider and model string entries.

Annotation quirk: extra_info must be annotated as a plain dict with a default of None to satisfy the validator.

Tip: You can call evaluate_rubric from inside a rubric function (or any other orchestrator) to outsource judging to a hosted model while still benefiting from the decorator’s validation.
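A minimal rubric function matching that signature might look like this. The @osmosis_rubric decorator line is omitted so the sketch runs standalone; in real code you would import it from osmosis_ai and apply it to the function:

```python
from typing import Optional

# In real code this function would carry @osmosis_rubric.
def keyword_rubric(
    model_info: dict,
    rubric: str,
    messages: list,
    ground_truth: Optional[str] = None,
    system_message: Optional[str] = None,
    extra_info: dict = None,
    score_min: float = 0.0,
    score_max: float = 1.0,
) -> float:
    """Award score_max when any assistant turn mentions the ground truth."""
    if not ground_truth:
        return score_min
    needle = ground_truth.lower()
    for message in messages:
        if message.get("role") != "assistant":
            continue
        content = message.get("content", [])
        parts = content if isinstance(content, list) else [{"text": str(content)}]
        for part in parts:
            if needle in str(part.get("text", "")).lower():
                return score_max
    return score_min
```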

Examples

See the examples/ directory for complete examples:

@osmosis_reward
def case_insensitive_match(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    """Case-insensitive string matching with partial credit."""
    match = solution_str.lower().strip() == ground_truth.lower().strip()

    if extra_info and 'partial_credit' in extra_info:
        if not match and extra_info['partial_credit']:
            len_diff = abs(len(solution_str) - len(ground_truth))
            if len_diff <= 2:
                return 0.5

    return 1.0 if match else 0.0

@osmosis_reward
def numeric_tolerance(solution_str: str, ground_truth: str, extra_info: dict = None) -> float:
    """Numeric comparison with configurable tolerance."""
    try:
        solution_num = float(solution_str.strip())
        truth_num = float(ground_truth.strip())

        tolerance = extra_info.get('tolerance', 0.01) if extra_info else 0.01
        return 1.0 if abs(solution_num - truth_num) <= tolerance else 0.0
    except ValueError:
        return 0.0

  • examples/rubric_functions.py demonstrates evaluate_rubric with OpenAI, Anthropic, Gemini, and xAI using the schema-enforced SDK integrations.
  • examples/reward_functions.py keeps local reward helpers that showcase the decorator contract without external calls.
  • examples/rubric_configs.yaml bundles two rubric definitions, each with its own provider configuration and extra prompt context.
  • examples/sample_data.jsonl contains two conversation payloads mapped to those rubrics so you can trial dataset validation.
# examples/rubric_configs.yaml (excerpt)
version: 1
rubrics:
  - id: support_followup
    model_info:
      provider: openai
      model: gpt-5-mini
      api_key_env: OPENAI_API_KEY

# examples/sample_data.jsonl (excerpt)
{"conversation_id": "ticket-001", "rubric_id": "support_followup", "...": "..."}
{"conversation_id": "ticket-047", "rubric_id": "policy_grounding", "...": "..."}

CLI Tools

Installing the SDK also provides a lightweight CLI available as osmosis (aliases: osmosis_ai, osmosis-ai) for inspecting rubric YAML files and JSONL test payloads.

Preview a rubric file and print every configuration discovered, including nested entries:

osmosis preview --path path/to/rubric.yaml

Preview a dataset of chat transcripts stored as JSONL:

osmosis preview --path path/to/data.jsonl

Evaluate a dataset against a hosted rubric configuration and print the returned scores:

osmosis eval --rubric support_followup --data examples/sample_data.jsonl
  • Supply the dataset with -d/--data path/to/data.jsonl; the path is resolved relative to the current working directory.
  • Use --config path/to/rubric_configs.yaml when the rubric definitions are not located alongside the dataset.
  • Pass -n/--number to sample the provider multiple times per record; the CLI prints every run along with aggregate statistics (average, variance, standard deviation, and min/max).
  • Provide --output path/to/dir to create the directory (if needed) and emit rubric_eval_result_<unix_timestamp>.json, or supply a full file path (any extension) to control the filename; each file captures every run, provider payloads, timestamps, and aggregate statistics for downstream analysis.
  • Skip --output to collect results under ~/.cache/osmosis/eval_result/<rubric_id>/rubric_eval_result_<identifier>.json; the CLI writes this JSON whether the evaluation finishes cleanly or hits provider/runtime errors so you can inspect failures later (only a manual Ctrl+C interrupt leaves no file behind).
  • Dataset rows whose rubric_id does not match the requested rubric are skipped automatically.

Both preview commands validate the file, echo a short summary (Loaded <n> ...), and pretty-print the parsed records so you can confirm that new rubrics or test fixtures look correct before committing them. Invalid files raise a descriptive error and exit with a non-zero status code.

Running Examples

PYTHONPATH=. python examples/reward_functions.py
PYTHONPATH=. python examples/rubric_functions.py  # Uncomment the provider you need before running

Testing

Run python -m pytest tests/test_rubric_eval.py to exercise the guards that ensure rubric prompts ignore message metadata (for example tests/test_rubric_eval.py::test_collect_text_skips_metadata_fields) while still preserving nested tool output. Add additional tests under tests/ as you extend the library.

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and examples
  5. Submit a pull request
