Tracing, evaluation, and training utilities for LLM applications.

freesolo

freesolo is the Python SDK used by Freesolo-generated training repos.

The SDK gives generated repos one shared surface for:

  • loading the approved training contract
  • loading datasets and building training conversations
  • defining the repo-specific task environment
  • running contract-aligned evaluations
  • running GEPA prompt optimization
  • launching SFT and GRPO training
  • optionally exporting OpenTelemetry traces

A generated repo should contain only the task-specific files under freesolo/; the reusable training, evaluation, dataset, contract, and tracing behavior comes from this package.

Install

pip install freesolo

To work from a source checkout instead, put its pypi/ directory on PYTHONPATH:

cd freesolo-sdk
export PYTHONPATH="$PWD/pypi"

Credentials

Most workflows that upload results or start hosted work need a Freesolo API key:

export FREESOLO_API_KEY=fslo_...

Optional environment variables:

  • FREESOLO_BASE_URL: defaults to https://api.freesolo.co
  • FREESOLO_DEPLOYMENT_URL: Modal deployment app for Tinker checkpoint uploads
  • FREESOLO_HOSTING_URL: Modal hosting app for hosted LoRA inference
  • OPENROUTER_API_KEY: oracle record generation
  • TINKER_API_KEY: SFT and GRPO training
  • WANDB_API_KEY: experiment tracking when enabled by the generated repo
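
Before starting a workflow, it can help to check which of these variables are set. The following is a minimal sketch using only the standard library. The helper name summarize_freesolo_env is hypothetical and not part of the SDK, and the SDK itself may resolve its configuration differently.

```python
import os

# Default copied from the list above; the other variables have no documented default.
DEFAULT_BASE_URL = "https://api.freesolo.co"

# FREESOLO_API_KEY is treated as required here (an assumption) because most
# upload and hosted workflows need it.
REQUIRED = ["FREESOLO_API_KEY"]
OPTIONAL = [
    "FREESOLO_DEPLOYMENT_URL",
    "FREESOLO_HOSTING_URL",
    "OPENROUTER_API_KEY",
    "TINKER_API_KEY",
    "WANDB_API_KEY",
]


def summarize_freesolo_env(env=None):
    """Hypothetical helper: report which variables are set without echoing secrets."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("FREESOLO_BASE_URL") or DEFAULT_BASE_URL,
        "missing_required": [name for name in REQUIRED if not env.get(name)],
        "optional_set": [name for name in OPTIONAL if env.get(name)],
    }


if __name__ == "__main__":
    print(summarize_freesolo_env())
```

The function returns only variable names, never values, so its output is safe to paste into CI logs.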

Generated Repo Flow

The SDK is built around the files that Freesolo agents generate in a target repo:

freesolo/TRAINING_CONTRACT.md
freesolo/config.py
freesolo/environment.py
freesolo/data.py
freesolo/eval.py
freesolo/gepa.py
freesolo/training.py
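
As a quick sanity check, a script can confirm that a target repo contains the files listed above. This is a sketch under the assumption that all seven files are expected to exist. The function name missing_generated_files is hypothetical and not part of the SDK.

```python
from pathlib import Path

# Relative paths copied from the layout above.
EXPECTED_FILES = [
    "freesolo/TRAINING_CONTRACT.md",
    "freesolo/config.py",
    "freesolo/environment.py",
    "freesolo/data.py",
    "freesolo/eval.py",
    "freesolo/gepa.py",
    "freesolo/training.py",
]


def missing_generated_files(repo_root):
    """Hypothetical helper: return the expected files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_generated_files(".")
    print("missing:", missing or "none")
```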

A normal generated repo flow is:

  1. Write or approve freesolo/TRAINING_CONTRACT.md.
  2. Define the task once in freesolo/environment.py.
  3. Run evals against candidate model outputs with the same environment and contract.
  4. Use the same environment for GEPA, SFT, and GRPO.
  5. Add tracing only when you need observability for app or SDK spans.

Tracing is not the center of the SDK. It is optional instrumentation around the contract/eval/training loop.

Deployment And Hosting

Tinker checkpoints can be queued for LoRA adapter upload, then tested through the hosting app by adapter id:

from freesolo.utils.hosting import HostedLoraClient
from freesolo.utils.upload import deploy_tinker_lora_adapter, wait_for_deployment

job = deploy_tinker_lora_adapter(
    "tinker://<run_id>/sampler_weights/final",
    base_model="openai/gpt-oss-20b",
    adapter_id="people-search",
)
completed = wait_for_deployment(job["jobId"])

client = HostedLoraClient()
result = client.generate(completed["adapterId"], prompt="Find senior search engineers in SF")
print(result["text"])

Environment

Environment is the task adapter. It defines how examples become model prompts and how model responses are scored.

from freesolo.datasets import TaskExample
from freesolo.environments import Environment, RewardResult


class RepoEnvironment(Environment):
    def build_prompt_messages(self, example: TaskExample, prompt_text: str):
        return [
            {"role": "system", "content": prompt_text},
            {"role": "user", "content": example.task},
        ]

    def score_response(self, example: TaskExample, response_text: str) -> RewardResult:
        expected = str(example.expected_output or "").strip()
        actual = response_text.strip()
        passed = actual == expected
        return RewardResult(
            name="exact_match",
            score=1.0 if passed else 0.0,
            success=passed,
            threshold=1.0,
            reason="matched expected output" if passed else "mismatch",
            return_type="binary",
        )


def load_environment(**_: object) -> Environment:
    return RepoEnvironment()

Generated repo helpers should pass this reference through SDK APIs:

ENVIRONMENT_REFERENCE = "freesolo/environment.py:load_environment"

That keeps evals, GEPA, SFT, and GRPO aligned on one prompt and reward definition.
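
The reference string has the form "path/to/file.py:function_name". How the SDK resolves it is not shown here. The sketch below is one plausible way to load such a reference with the standard library, and resolve_reference is a hypothetical name, not an SDK function.

```python
import importlib.util
from pathlib import Path


def resolve_reference(reference, repo_root="."):
    """Hypothetical resolver for a "path/to/file.py:function_name" reference."""
    # Split on the last colon so the file part may itself contain colons.
    file_part, sep, attr = reference.rpartition(":")
    if not sep or not file_part or not attr:
        raise ValueError(f"expected 'path.py:function', got {reference!r}")

    path = Path(repo_root) / file_part
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")

    # Execute the file as a module and return the named attribute.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, attr)


# Example use inside a generated repo (requires freesolo/environment.py to exist):
#   load_environment = resolve_reference("freesolo/environment.py:load_environment")
#   environment = load_environment()
```

Passing a string rather than an object lets every entry point (eval, GEPA, SFT, GRPO) load the same factory without importing the others.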

Evaluations

Environment evals run model outputs through the contract and environment reward logic, then upload the result to Freesolo.

from openai import OpenAI

from freesolo.datasets import TaskExample
from freesolo.environments import EnvironmentGeneration
from freesolo.evaluation import EvaluationClient

from config import CONTRACT_PATH, ENVIRONMENT_REFERENCE


client = OpenAI()


def generate(messages: list[dict[str, str]], example: TaskExample):
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=messages,
    )
    return EnvironmentGeneration(
        response_text=response.choices[0].message.content or "",
        total_tokens=response.usage.total_tokens if response.usage else None,
    )


results = EvaluationClient().run_environment(
    name="dev-eval",
    source="runs/eval/dev.jsonl",
    contract_path=CONTRACT_PATH,
    environment=ENVIRONMENT_REFERENCE,
    generate=generate,
)

For smaller scripts and CI checks, custom scorers are also supported:

from typing import Any

from freesolo.evaluation import BinaryResponse, CustomScorer, EvaluationClient


class NoEmptyAnswer(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
        ok = bool(str(row.get("actual_output", "")).strip())
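        # Report a reason that matches the outcome instead of always claiming success.
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return BinaryResponse(value=ok, reason=reason)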


results = EvaluationClient().run(
    name="non-empty-answer",
    data=[{"actual_output": "hello"}],
    scorers=[NoEmptyAnswer()],
)
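
To try the scoring logic without the SDK or an API key, the check can be run directly under asyncio. This is a sketch only: StandInBinaryResponse, NoEmptyAnswerLogic, score_rows, and run_scoring are hypothetical stand-ins, and the real BinaryResponse and CustomScorer classes may carry more fields or behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class StandInBinaryResponse:
    """Stand-in for the SDK's BinaryResponse; only the two fields used above."""
    value: bool
    reason: str


class NoEmptyAnswerLogic:
    """The non-empty check from NoEmptyAnswer, without the SDK base class."""

    async def score(self, row: dict[str, Any]) -> StandInBinaryResponse:
        ok = bool(str(row.get("actual_output", "")).strip())
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return StandInBinaryResponse(value=ok, reason=reason)


async def score_rows(rows):
    scorer = NoEmptyAnswerLogic()
    # Score all rows concurrently; gather returns results in input order.
    return await asyncio.gather(*(scorer.score(row) for row in rows))


def run_scoring(rows):
    """Synchronous entry point for scripts and CI checks."""
    return asyncio.run(score_rows(rows))


if __name__ == "__main__":
    for result in run_scoring([{"actual_output": "hello"}, {"actual_output": "  "}]):
        print(result)
```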

GEPA And Training

GEPA, SFT, and GRPO use the same contract, datasets, and environment adapter as evals. Generated repos should call the SDK helpers rather than copying trainer or optimizer internals.

SDK training is pinned to Qwen/Qwen3.6-35B-A3B; generated repos should not expose or pass a base-model setting. SFT defaults to LoRA rank 64 for that model; generated repos should omit lora_rank unless they are intentionally running a controlled adapter-size experiment.

from freesolo.training import train_grpo, train_sft

from config import (
    CONTRACT_PATH,
    ENVIRONMENT_REFERENCE,
    GRPO_DATASET_PATH,
    GRPO_LOG_DIR,
    SFT_CONFIG,
    SFT_DATASET_PATH,
    SFT_LOG_DIR,
)


def run_sft() -> int:
    return train_sft(
        contract_path=CONTRACT_PATH,
        dataset_path=SFT_DATASET_PATH,
        environment=ENVIRONMENT_REFERENCE,
        log_dir=SFT_LOG_DIR,
        sft_config=SFT_CONFIG,
    )


def run_grpo() -> int:
    return train_grpo(
        contract_path=CONTRACT_PATH,
        dataset_path=GRPO_DATASET_PATH,
        environment=ENVIRONMENT_REFERENCE,
        log_dir=GRPO_LOG_DIR,
        sft_log_dir=SFT_LOG_DIR,
    )

Tracing

Tracing is available for applications or generated repo commands that need span export. Configure it at process startup, then use normal OpenTelemetry spans.

from freesolo.tracing import configure_tracer, force_flush, get_tracer

configure_tracer(project_name="my-training-repo")
tracer = get_tracer()

with tracer.start_as_current_span("eval.batch") as span:
    span.set_attribute("freesolo.dataset", "runs/eval/dev.jsonl")

force_flush()

Runnable Examples

Copy-pasteable examples live in examples/:

  • environment.py: task environment used by evals, training, and GEPA.
  • support_dataset.py: shared dataset and prompt paths for evals, SFT, GRPO, and GEPA.
  • evaluation_from_files.py: run an environment eval from concrete files.
  • evaluation_custom_scorer.py: run local custom scorers.
  • gepa_prompt_example.py: run the Freesolo GEPA adapter.
  • training_sft_grpo.py: start SFT or GRPO training from package APIs.
  • tracing_manual_span.py: send one OpenTelemetry span.

Example:

uv run python examples/evaluation_custom_scorer.py --local

Public API

The root freesolo module intentionally exports no functions. Import from the subpackages below; lower-level modules may be importable, but they are implementation helpers unless they appear here or in an example.

Import | Use case
freesolo.contracts.load_contract_text, extract_contract_spec, load_contract_spec, build_oracle_messages | Read contract markdown and build oracle prompt messages.
freesolo.datasets.TaskExample, Dataset, load_dataset | Load task examples and construct labeled conversations for evals or training.
freesolo.environments.Environment, RewardResult, RewardMetric, EnvironmentGeneration | Define task prompt and reward behavior once for evals, GEPA, SFT, and GRPO.
freesolo.evaluation.EvaluationClient | Run custom-scorer evals or environment evals and upload results to Freesolo.
freesolo.evaluation.run_local_evaluation | Run custom scorers locally without uploading results.
freesolo.evaluation.CustomScorer, BinaryResponse, NumericResponse | Define local scorer logic for eval rows.
freesolo.gepa.GEPASetup, GEPAConfig, DefaultReflectionAgent, attach_gepa, optimize_gepa | Optimize prompts through the GEPA adapter using the same environment and dataset abstractions.
freesolo.training.SftConfig, GrpoConfig, TrainGrpoOptions, train_sft, train_grpo | Start SFT or GRPO training from package APIs.
freesolo.tracing.configure_tracer, get_tracer, force_flush, shutdown | Export OpenTelemetry traces when observability is needed.
freesolo.utils.oracle.generate_ground_truth_records | Generate ground-truth JSONL records from source examples using a contract, environment, oracle model, and explicit max_tokens.
freesolo.utils.upload.upload_tinker_checkpoint_to_huggingface | Upload a Tinker checkpoint to a private Hugging Face model repo.

Package Docs

The generated-repo-facing package notes live next to the modules.
