Tracing and evaluation SDK for LLM applications.

freesolo

freesolo is a Python tracing and evaluation package for LLM apps.

It is built for the lowest-friction integration possible:

Install the package
Set FREESOLO_API_KEY
Wrap your OpenAI, Anthropic, Gemini, or OpenAI-compatible client
Run traces and evaluations from the same SDK

Current provider support

freesolo currently supports automatic client instrumentation for:

OpenAI
Anthropic
Gemini
OpenAI-compatible clients via wrap(...) / wrap_provider(...)

Install

Install the package plus the provider SDK you use:

pip install freesolo openai

pip install freesolo anthropic

pip install freesolo google-genai

Environment

FREESOLO_API_KEY
FREESOLO_BASE_URL (optional, defaults to https://api.freesolo.co)

export FREESOLO_API_KEY=fslo_...
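
If you want to check the environment before your app starts, something like the following may help. It is a minimal sketch using only the standard library: `FreesoloSettings` and `load_settings` are hypothetical names for this example and are not part of the freesolo SDK. Only the two variable names and the default URL come from the list above.

```python
import os
from typing import Mapping, NamedTuple

DEFAULT_BASE_URL = "https://api.freesolo.co"  # default stated in the Environment list


class FreesoloSettings(NamedTuple):
    """Hypothetical container for this sketch; not an SDK type."""

    api_key: str
    base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> FreesoloSettings:
    """Read both variables and fail early if the API key is missing."""
    api_key = env.get("FREESOLO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("FREESOLO_API_KEY is not set")
    # FREESOLO_BASE_URL is optional, so fall back to the documented default.
    base_url = env.get("FREESOLO_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return FreesoloSettings(api_key=api_key, base_url=base_url.rstrip("/"))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests.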

Quickstart

from openai import OpenAI
from freesolo import wrap

client = wrap(OpenAI())

result = client.responses.create(
    model="gpt-4.1-mini",
    instructions="Reply in plain text.",
    input=[
        {
            "role": "user",
            "content": [{"type": "input_text", "text": "How do I reset my password?"}],
        }
    ],
)

print(result.output_text or "")

OpenRouter Quickstart

from openai import OpenAI
from freesolo import wrap

client = wrap(
    OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_API_KEY",
    )
)

response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "Reply in plain text."},
        {"role": "user", "content": "Write a one-sentence launch blurb."},
    ],
    max_tokens=120,
)

print(response.choices[0].message.content or "")

Gemini Quickstart

from google import genai
from freesolo import instrument_gemini

client = instrument_gemini(genai.Client())

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a one-sentence release note for traced Gemini support.",
)

print(response.text or "")

Group Multiple Model Calls

For agentic or long-horizon tasks, strongly prefer wrapping the whole task in start_trace(...) so all of the model calls land in one trace.

For a single one-off OpenAI, Anthropic, or Gemini request, you can skip it.

from anthropic import Anthropic
from freesolo import instrument_anthropic, start_trace

client = instrument_anthropic(Anthropic())

with start_trace("support-agent-run"):
    first = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello"}],
    )
    second = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say goodbye"}],
    )
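
To see why a surrounding scope keeps calls together, here is a rough, library-independent sketch of the general pattern using `contextvars`. It is not freesolo's implementation: `group_calls`, `record_call`, and `recorded` are hypothetical stand-ins for this example only.

```python
import contextvars
import uuid
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical "current trace" slot, similar in spirit to what a tracing SDK might keep.
_current_trace: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_trace", default=None
)
recorded: list[tuple[str, str]] = []  # (trace_id, call_name) pairs


@contextmanager
def group_calls(title: str) -> Iterator[str]:
    """Open a scope; every call recorded inside it shares one trace id."""
    trace_id = f"{title}-{uuid.uuid4().hex[:8]}"
    token = _current_trace.set(trace_id)
    try:
        yield trace_id
    finally:
        # Restore the previous value so nested or later scopes are unaffected.
        _current_trace.reset(token)


def record_call(name: str) -> str:
    """Attach a call to the active trace, or give it a one-off trace of its own."""
    trace_id = _current_trace.get() or f"auto-{uuid.uuid4().hex[:8]}"
    recorded.append((trace_id, name))
    return trace_id
```

Calls made inside `with group_calls("support-agent-run"):` share one id, while a call made outside any scope gets its own. This mirrors the grouped versus one-off behavior described above.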

What Gets Stored

Trace title if you explicitly pass it to start_trace("...")
Trace metadata if you explicitly pass it to start_trace(..., metadata=...)
Input payloads with system_prompt, user_prompt, and images
Output payloads as plain text
Token usage when available
Image inputs with inline previews for the trace UI
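
As an illustration of the input payload shape, the sketch below collapses OpenAI-style chat messages into the three fields named above. The field names `system_prompt`, `user_prompt`, and `images` come from the list; everything else, including the `summarize_messages` helper and the choice to ignore assistant messages, is an assumption for this example and may not match what the SDK actually does.

```python
from typing import Any


def summarize_messages(messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Collapse chat messages into system_prompt / user_prompt / images."""
    system_parts: list[str] = []
    user_parts: list[str] = []
    images: list[str] = []
    for message in messages:
        role = message.get("role")
        if role not in ("system", "user"):
            continue  # assumption: only system and user content counts as input
        content = message.get("content", "")
        # Content may be a plain string or a list of typed parts.
        parts = [{"type": "text", "text": content}] if isinstance(content, str) else content
        for part in parts:
            if part.get("type") == "image_url":
                images.append(part["image_url"]["url"])
            elif "text" in part:
                (system_parts if role == "system" else user_parts).append(part["text"])
    return {
        "system_prompt": "\n".join(system_parts),
        "user_prompt": "\n".join(user_parts),
        "images": images,
    }
```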

Notes

You do not need @trace() for ordinary LLM tracing.
A single instrumented OpenAI, Anthropic, or Gemini request creates a trace automatically.
For OpenAI-compatible providers like OpenRouter, prefer wrap(...) instead of provider-specific helpers.
For agentic or long-horizon workflows, we strongly recommend start_trace("descriptive-title") so that planning, retries, and follow-up calls stay grouped in one trace.
Delivery is best-effort by default. Trace ingestion failures do not break your app.
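
Best-effort delivery generally means the sender catches its own failures instead of raising them into your code. The sketch below shows that pattern in isolation. `deliver_best_effort` and the `send` callable are hypothetical names for this example, and the SDK's internals may differ.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("tracing-sketch")


def deliver_best_effort(
    send: Callable[[dict[str, Any]], Any], payload: dict[str, Any]
) -> bool:
    """Try to deliver a payload; log and swallow any failure."""
    try:
        send(payload)
        return True
    except Exception:  # deliberately broad: delivery must never break the caller
        logger.warning("trace delivery failed; continuing", exc_info=True)
        return False
```

The boolean return value lets a caller count dropped payloads without having to handle exceptions.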

Evaluations

freesolo also includes a small evaluation SDK for CI jobs, GitHub bots, and eval scripts. All evaluation runs require FREESOLO_API_KEY or an explicit api_key.

Evaluation data is a list of plain dictionaries. There is no separate Example class to construct.
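
Because rows are plain dictionaries, you can build and sanity-check them with ordinary Python. The helpers below are a hypothetical sketch: `build_rows` and `missing_keys` are not SDK functions. The key names are the ones used by the examples in this section, and whether any of them is required is an assumption you should verify for your scorers.

```python
from typing import Any

# Keys used by the examples in this section; a scorer may read others.
EXPECTED_KEYS = ("input", "actual_output", "expected_output")


def build_rows(cases: list[tuple[str, str, str]]) -> list[dict[str, Any]]:
    """Turn (input, actual, expected) tuples into evaluation rows."""
    return [
        {"input": question, "actual_output": actual, "expected_output": expected}
        for question, actual, expected in cases
    ]


def missing_keys(rows: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (row_index, key) pairs for every expected key a row lacks."""
    return [
        (index, key)
        for index, row in enumerate(rows)
        for key in EXPECTED_KEYS
        if key not in row
    ]
```

Running `missing_keys` before an evaluation can catch typos in key names that a scorer using `row.get(...)` would otherwise treat as empty values.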

Define scorers by subclassing CustomScorer and returning BinaryResponse or NumericResponse. Scorers run in your process, and Freesolo uploads the final results with your API key. Pass scorer objects, not strings.

from typing import Any

from freesolo import Freesolo
from freesolo.evaluation import BinaryResponse, CustomScorer


class ExactMatch(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
        # Compare whitespace-trimmed strings; an empty actual_output never passes.
        actual = str(row.get("actual_output", "")).strip()
        expected = str(row.get("expected_output", "")).strip()
        matched = bool(actual) and actual == expected
        return BinaryResponse(
            value=matched,
            reason=(
                "actual_output matched expected_output"
                if matched
                else "actual_output did not match expected_output"
            ),
        )


client = Freesolo()

results = client.evals.run(
    name="support-agent-correctness",
    data=[
        {
            "input": "What is the capital of France?",
            "actual_output": "Paris",
            "expected_output": "Paris",
        }
    ],
    scorers=[ExactMatch()],
)

print(results[0].success)
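
In a CI job you will usually want to turn the results into a pass or fail signal. The sketch below assumes only that each result exposes a boolean `success` attribute, as in the `print` call above. `FakeResult`, `pass_rate`, and `exit_code` are hypothetical names for this example, and `FakeResult` stands in for whatever object the SDK returns.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasSuccess(Protocol):
    """Anything with a boolean success attribute."""

    success: bool


@dataclass
class FakeResult:
    """Stand-in for an evaluation result; not an SDK class."""

    success: bool


def pass_rate(results: Iterable[HasSuccess]) -> float:
    """Fraction of results whose success flag is true (0.0 for no results)."""
    items = list(results)
    if not items:
        return 0.0
    return sum(1 for item in items if item.success) / len(items)


def exit_code(results: Iterable[HasSuccess], minimum: float = 1.0) -> int:
    """Return 0 when the pass rate meets the minimum, otherwise 1."""
    return 0 if pass_rate(results) >= minimum else 1
```

A script could finish with `sys.exit(exit_code(results, minimum=0.9))` so the job fails when fewer than 90% of rows pass.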

A second custom scorer, which only checks that the answer is non-empty:

from typing import Any

from freesolo import Freesolo
from freesolo.evaluation import BinaryResponse, CustomScorer


class NoEmptyAnswer(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
        # Pass only when actual_output contains non-whitespace text.
        ok = bool(str(row.get("actual_output", "")).strip())
        reason = "actual_output is non-empty" if ok else "actual_output is empty"
        return BinaryResponse(value=ok, reason=reason)


results = Freesolo().evals.run(
    name="support-agent-non-empty",
    data=[{"actual_output": "hello"}],
    scorers=[NoEmptyAnswer()],
)

LLM-as-judge is also a custom scorer. The scorer can call your judge model and return a NumericResponse; Freesolo stores the eval run and score output with your FREESOLO_API_KEY. This example uses OPENAI_API_KEY for the judge model call and FREESOLO_API_KEY for eval upload.

import json
from typing import Any

from openai import OpenAI

from freesolo import Freesolo, instrument_openai
from freesolo.evaluation import CustomScorer, NumericResponse


class CorrectnessJudge(CustomScorer[NumericResponse]):
    name = "correctness_llm_judge"
    threshold = 0.8

    def __init__(self, client: OpenAI) -> None:
        self.client = client

    async def score(self, row: dict[str, Any]) -> NumericResponse:
        response = self.client.responses.create(
            model="gpt-4.1-mini",
            instructions=(
                "Grade correctness from 0.0 to 1.0. "
                "Return JSON only: {\"score\": 0.0, \"reason\": \"...\"}"
            ),
            input=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "input_text",
                            "text": json.dumps(
                                {
                                    "input": row.get("input", ""),
                                    "actual_output": row.get("actual_output", ""),
                                    "expected_output": row.get("expected_output", ""),
                                }
                            ),
                        }
                    ],
                }
            ],
        )

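        # Fall back to an empty object when the judge returns no text, and to a
        # score of 0.0 when "score" is missing, so the scorer does not raise KeyError.
        parsed = json.loads(response.output_text or "{}")
        return NumericResponse(
            value=float(parsed.get("score", 0.0)),
            reason=str(parsed.get("reason", "")),
        )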


judge_client = instrument_openai(OpenAI())

results = Freesolo().evals.run(
    name="support-agent-correctness",
    data=[
        {
            "input": "What is the capital of France?",
            "actual_output": "Paris is the capital of France.",
            "expected_output": "Paris",
        }
    ],
    scorers=[CorrectnessJudge(judge_client)],
)
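
Judge models do not always return clean JSON, so it can be worth parsing their output defensively before building a NumericResponse. The helper below is a hypothetical sketch using only the standard library: `parse_judge_output` is not part of freesolo, and treating unparseable output as a score of 0.0 is a design choice for this example, not SDK behavior.

```python
import json
import re
from typing import Tuple


def parse_judge_output(text: str) -> Tuple[float, str]:
    """Extract (score, reason) from judge text, tolerating extra wrapping."""
    # Grab the outermost {...} span in case the model added prose around the JSON.
    match = re.search(r"\{.*\}", text or "", flags=re.DOTALL)
    if match is None:
        return 0.0, "judge returned no JSON object"
    try:
        parsed = json.loads(match.group(0))
        score = float(parsed["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0.0, "judge output could not be parsed"
    # Clamp so an out-of-range score cannot leave the 0.0 to 1.0 scale.
    return max(0.0, min(1.0, score)), str(parsed.get("reason", ""))
```

Inside a scorer you could call `parse_judge_output(response.output_text)` and pass the two values to `NumericResponse(value=..., reason=...)`.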

Hosted scorers are also available out of the box and use OpenRouter by default:

ReferenceCorrectnessScorer
RubricScorer
GroundednessScorer
InstructionFollowingScorer
PairwisePreferenceScorer

from freesolo.evaluation import HostedJudgeClient, ReferenceCorrectnessScorer

judge = HostedJudgeClient(
    api_key="YOUR_OPENROUTER_API_KEY",
    model="openai/gpt-oss-120b",
)

scorer = ReferenceCorrectnessScorer(client=judge)

Tracing is available from the same root client:

from freesolo import Freesolo

client = Freesolo()

with client.traces.start("support-agent-run"):
    ...

You can also import namespaced tracing helpers directly:

from freesolo.tracing import start_trace, wrap
