Tracing, evaluation, and training utilities for LLM applications.
freesolo
freesolo is the Python SDK used by Freesolo-generated training repos.
The SDK gives generated repos one shared surface for:
- loading the approved training contract
- loading datasets and building training conversations
- defining the repo-specific task environment
- running contract-aligned evaluations
- running GEPA prompt optimization
- launching SFT and GRPO training
- optionally exporting OpenTelemetry traces
The main idea is that a generated repo should contain only the task-specific
files under freesolo/, while the reusable training, evaluation, dataset,
contract, and tracing behavior comes from this package.
Install
pip install freesolo
From a checkout:
cd freesolo-sdk
export PYTHONPATH="$PWD/pypi"
Credentials
Most workflows that upload results or start hosted work need a Freesolo API key:
export FREESOLO_API_KEY=fslo_...
Optional environment variables:
- FREESOLO_BASE_URL: defaults to https://api.freesolo.co
- OPENROUTER_API_KEY: hosted LLM-as-judge scorers
- TINKER_API_KEY: SFT and GRPO training
- WANDB_API_KEY: experiment tracking when enabled by the generated repo
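A minimal sketch of how a script might read these variables before calling the SDK. The `Settings` dataclass and the `load_settings` helper are hypothetical and not part of freesolo; only the variable names and the default base URL come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.freesolo.co"  # default stated in the list above


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables; not a freesolo type."""

    api_key: str
    base_url: str
    openrouter_api_key: Optional[str]
    tinker_api_key: Optional[str]
    wandb_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the Freesolo-related variables, failing early if the API key is missing."""
    api_key = env.get("FREESOLO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("FREESOLO_API_KEY is not set")
    return Settings(
        api_key=api_key,
        base_url=env.get("FREESOLO_BASE_URL", DEFAULT_BASE_URL),
        # Optional keys stay None when the matching workflow is not used.
        openrouter_api_key=env.get("OPENROUTER_API_KEY"),
        tinker_api_key=env.get("TINKER_API_KEY"),
        wandb_api_key=env.get("WANDB_API_KEY"),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests.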
Generated Repo Flow
The SDK is built around the files that Freesolo agents generate in a target repo:
freesolo/TRAINING_CONTRACT.md
freesolo/config.py
freesolo/environment.py
freesolo/data.py
freesolo/eval.py
freesolo/gepa.py
freesolo/training.py
A normal generated repo flow is:
- Write or approve freesolo/TRAINING_CONTRACT.md.
- Define the task once in freesolo/environment.py.
- Run evals against candidate model outputs with the same environment and contract.
- Use the same environment for GEPA, SFT, and GRPO.
- Add tracing only when you need observability for app or SDK spans.
Tracing is not the center of the SDK. It is optional instrumentation around the contract/eval/training loop.
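As a rough illustration of the generated layout, the sketch below checks a repo for the files listed above before any eval or training step runs. The `missing_generated_files` helper is hypothetical and not part of the SDK; only the file names come from this README.

```python
from pathlib import Path

# File names taken from the generated-repo list above.
GENERATED_FILES = (
    "freesolo/TRAINING_CONTRACT.md",
    "freesolo/config.py",
    "freesolo/environment.py",
    "freesolo/data.py",
    "freesolo/eval.py",
    "freesolo/gepa.py",
    "freesolo/training.py",
)


def missing_generated_files(repo_root: str = ".") -> list[str]:
    """Return the expected generated files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in GENERATED_FILES if not (root / rel).is_file()]
```

A CI job could fail early when the returned list is non-empty, before spending time on model calls.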
Environment
Environment is the task adapter. It defines how examples become model prompts
and how model responses are scored.
from freesolo.datasets import TaskExample
from freesolo.environments import Environment, RewardResult


class RepoEnvironment(Environment):
    def build_prompt_messages(self, example: TaskExample, prompt_text: str):
        return [
            {"role": "system", "content": prompt_text},
            {"role": "user", "content": example.task},
        ]

    def score_response(self, example: TaskExample, response_text: str) -> RewardResult:
        expected = str(example.expected_output or "").strip()
        actual = response_text.strip()
        passed = actual == expected
        return RewardResult(
            name="exact_match",
            score=1.0 if passed else 0.0,
            success=passed,
            threshold=1.0,
            reason="matched expected output" if passed else "mismatch",
            return_type="binary",
        )


def load_environment(**_: object) -> Environment:
    return RepoEnvironment()

Generated repo helpers should pass this reference through SDK APIs:
ENVIRONMENT_REFERENCE = "freesolo/environment.py:load_environment"
That keeps evals, GEPA, SFT, and GRPO aligned on one prompt and reward definition.
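To make the `path:function` reference format concrete, here is one way such a string could be resolved with the standard library. This is an assumption about the general technique, not the SDK's actual resolver; `resolve_reference` is a hypothetical name.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable


def resolve_reference(reference: str, repo_root: str = ".") -> Callable[..., Any]:
    """Load '<file path>:<attribute>' and return the named callable.

    Hypothetical helper: the SDK may resolve references differently.
    """
    file_part, sep, attr = reference.rpartition(":")
    if not sep or not file_part or not attr:
        raise ValueError(f"expected '<path>:<name>', got {reference!r}")
    path = Path(repo_root) / file_part
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    target = getattr(module, attr)
    if not callable(target):
        raise TypeError(f"{attr!r} in {path} is not callable")
    return target
```

With a helper like this, `resolve_reference("freesolo/environment.py:load_environment")()` would return the environment instance, so every workflow that receives the same string ends up with the same prompt and reward logic.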
Evaluations
Environment evals run model outputs through the contract and environment reward logic, then upload the result to Freesolo.
from openai import OpenAI
from freesolo.datasets import TaskExample
from freesolo.environments import EnvironmentGeneration
from freesolo.evaluation import EvaluationClient
from config import CONTRACT_PATH, ENVIRONMENT_REFERENCE
client = OpenAI()


def generate(messages: list[dict[str, str]], example: TaskExample):
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=messages,
    )
    return EnvironmentGeneration(
        response_text=response.choices[0].message.content or "",
        total_tokens=response.usage.total_tokens if response.usage else None,
    )


results = EvaluationClient().run_environment(
    name="dev-eval",
    source="runs/eval/dev.jsonl",
    contract_path=CONTRACT_PATH,
    environment=ENVIRONMENT_REFERENCE,
    generate=generate,
)
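The loop that an environment eval performs can be pictured with a stdlib-only sketch: build messages, call the generator, score the response, and aggregate. This is not how `run_environment` is implemented; the `Example` stand-in, the JSONL field names `task` and `expected_output`, and the exact-match scoring are assumptions for illustration, and contract handling and upload are omitted.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    """Stand-in for TaskExample; the field names are assumed."""

    task: str
    expected_output: str


def load_examples(lines: Iterable[str]) -> list[Example]:
    """Parse JSONL lines into examples, skipping blank lines."""
    examples = []
    for line in lines:
        if line.strip():
            row = json.loads(line)
            examples.append(Example(row["task"], row["expected_output"]))
    return examples


def run_eval(
    examples: list[Example],
    prompt_text: str,
    generate: Callable[[list[dict[str, str]], Example], str],
) -> dict:
    """Generate a response per example and score it by exact match."""
    passed = 0
    for example in examples:
        messages = [
            {"role": "system", "content": prompt_text},
            {"role": "user", "content": example.task},
        ]
        response_text = generate(messages, example)
        if response_text.strip() == example.expected_output.strip():
            passed += 1
    total = len(examples)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
    }
```

Swapping the `generate` callable for a stub makes the loop testable without network access, which is the same reason the SDK call accepts `generate` as a parameter.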
For smaller scripts and CI checks, custom scorers are also supported:
from typing import Any
from freesolo.evaluation import BinaryResponse, CustomScorer, EvaluationClient


class NoEmptyAnswer(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
        ok = bool(str(row.get("actual_output", "")).strip())
        return BinaryResponse(value=ok, reason="actual_output is non-empty")


results = EvaluationClient().run(
    name="non-empty-answer",
    data=[{"actual_output": "hello"}],
    scorers=[NoEmptyAnswer()],
)
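Because scorers expose an async `score` method, a CI check can run several of them over every row concurrently. The sketch below shows that pattern with stand-in types only: `BinaryResult`, `NonEmptyAnswer`, `score_rows`, and `all_passed` are hypothetical and do not reproduce the SDK's `BinaryResponse` or `EvaluationClient.run` behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class BinaryResult:
    """Stand-in for BinaryResponse."""

    value: bool
    reason: str


class NonEmptyAnswer:
    """Stand-in scorer mirroring the shape of the example above."""

    async def score(self, row: dict[str, Any]) -> BinaryResult:
        ok = bool(str(row.get("actual_output", "")).strip())
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return BinaryResult(ok, reason)


async def score_rows(
    rows: Sequence[dict[str, Any]], scorers: Sequence[Any]
) -> list[list[BinaryResult]]:
    """Return one list of results per row, in scorer order."""

    async def score_row(row: dict[str, Any]) -> list[BinaryResult]:
        return list(await asyncio.gather(*(s.score(row) for s in scorers)))

    return list(await asyncio.gather(*(score_row(row) for row in rows)))


def all_passed(results: list[list[BinaryResult]]) -> bool:
    """True only when every scorer passed on every row."""
    return all(result.value for row in results for result in row)
```

A script would call `asyncio.run(score_rows(rows, [NonEmptyAnswer()]))` and exit non-zero when `all_passed` returns False.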
GEPA And Training
GEPA, SFT, and GRPO use the same contract, datasets, and environment adapter as evals. Generated repos should call the SDK helpers rather than copying trainer or optimizer internals.
from freesolo.training import train_grpo, train_sft
from config import (
    BASE_MODEL,
    CONTRACT_PATH,
    ENVIRONMENT_REFERENCE,
    GRPO_DATASET_PATH,
    GRPO_LOG_DIR,
    SFT_CONFIG,
    SFT_DATASET_PATH,
    SFT_LOG_DIR,
)


def run_sft() -> int:
    return train_sft(
        contract_path=CONTRACT_PATH,
        dataset_path=SFT_DATASET_PATH,
        environment=ENVIRONMENT_REFERENCE,
        log_dir=SFT_LOG_DIR,
        base_model=BASE_MODEL,
        sft_config=SFT_CONFIG,
    )


def run_grpo() -> int:
    return train_grpo(
        contract_path=CONTRACT_PATH,
        dataset_path=GRPO_DATASET_PATH,
        environment=ENVIRONMENT_REFERENCE,
        log_dir=GRPO_LOG_DIR,
        sft_log_dir=SFT_LOG_DIR,
        base_model=BASE_MODEL,
    )
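A generated repo might expose wrappers like these through a small command-line entry point. The sketch below is one possible shape, using placeholder functions in place of the real wrappers; it assumes the returned `int` can be treated as a process exit status, which this README does not state.

```python
import argparse
from typing import Callable, Mapping, Optional, Sequence


def run_sft() -> int:
    """Placeholder for the SFT wrapper; the real one would call train_sft."""
    return 0


def run_grpo() -> int:
    """Placeholder for the GRPO wrapper; the real one would call train_grpo."""
    return 0


STAGES: Mapping[str, Callable[[], int]] = {"sft": run_sft, "grpo": run_grpo}


def main(
    argv: Optional[Sequence[str]] = None,
    stages: Mapping[str, Callable[[], int]] = STAGES,
) -> int:
    """Run the stage named on the command line and return its result."""
    parser = argparse.ArgumentParser(description="Run one training stage.")
    parser.add_argument("stage", choices=sorted(stages))
    args = parser.parse_args(argv)
    # Assumption: the wrapper's int is usable as an exit status.
    return stages[args.stage]()
```

A script would typically end with `raise SystemExit(main())` so that the shell sees the stage's result.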
Tracing
Tracing is available for applications or generated repo commands that need span export. Configure it at process startup, then use normal OpenTelemetry spans.
from freesolo.tracing import configure_tracer, force_flush, get_tracer
configure_tracer(service_name="my-training-repo")
tracer = get_tracer()

with tracer.start_as_current_span("eval.batch") as span:
    span.set_attribute("freesolo.dataset", "runs/eval/dev.jsonl")

force_flush()
Runnable Examples
Copy-pasteable examples live in examples/:
- environment.py: task environment used by evals, training, and GEPA.
- support_dataset.py: dataset loading helpers for evals, SFT, GRPO, and GEPA.
- evaluation_from_files.py: run an environment eval from concrete files.
- evaluation_custom_scorer.py: run local custom scorers.
- gepa_prompt_example.py: run the Freesolo GEPA adapter.
- training_sft_grpo.py: start SFT or GRPO training from package APIs.
- tracing_manual_span.py: send one OpenTelemetry span.
Example:
uv run python examples/evaluation_custom_scorer.py --local
Public API
The root freesolo module intentionally exports no functions. Import from the
subpackages below; lower-level modules may be importable, but they are
implementation helpers unless they appear here or in an example.
| Import | Use case |
|---|---|
| freesolo.contracts.load_contract_text, extract_contract_spec, load_contract_spec, build_oracle_messages | Read contract markdown and build oracle prompt messages. |
| freesolo.datasets.TaskExample, Dataset, load_dataset | Load task examples and construct labeled conversations for evals or training. |
| freesolo.environments.Environment, RewardResult, RewardMetric, EnvironmentGeneration | Define task prompt and reward behavior once for evals, GEPA, SFT, and GRPO. |
| freesolo.evaluation.EvaluationClient | Run custom-scorer evals or environment evals and upload results to Freesolo. |
| freesolo.evaluation.run_local_evaluation | Run custom scorers locally without uploading results. |
| freesolo.evaluation.CustomScorer, BinaryResponse, NumericResponse | Define local scorer logic for eval rows. |
| freesolo.evaluation.HostedJudgeClient and hosted scorer classes | Use hosted LLM-as-judge scorers with OpenRouter-compatible credentials. |
| freesolo.gepa.GEPASetup, GEPAConfig, DefaultReflectionAgent, attach_gepa, optimize_gepa | Optimize prompts through the GEPA adapter using the same environment and dataset abstractions. |
| freesolo.training.SftConfig, GrpoConfig, TrainGrpoOptions, train_sft, train_grpo | Start SFT or GRPO training from package APIs. |
| freesolo.tracing.configure_tracer, get_tracer, force_flush, shutdown | Export OpenTelemetry traces when observability is needed. |
| freesolo.utils.oracle.generate_ground_truth_records | Generate ground-truth JSONL records from source examples using a contract, environment, and oracle model. |
| freesolo.utils.upload.upload_tinker_checkpoint_to_huggingface | Upload a Tinker checkpoint to a private Hugging Face model repo. |
Package Docs
The generated-repo-facing package notes live next to the modules: