Tracing and evaluation SDK for LLM applications.

freesolo

freesolo is a Python tracing and evaluation package for LLM apps.

For the Node/npm package, see npm/.

It is built for the lowest-friction integration possible:

  1. Install the package
  2. Set FREESOLO_API_KEY
  3. Wrap your OpenAI, Anthropic, Gemini, or OpenAI-compatible client
  4. Run traces and evaluations from the same SDK

Current provider support

freesolo currently supports automatic client instrumentation for:

  • OpenAI
  • Anthropic
  • Gemini
  • OpenAI-compatible clients via wrap(...) / wrap_provider(...)

Install

Install the package plus the provider SDK you use:

pip install freesolo openai

or

pip install freesolo anthropic

or

pip install freesolo google-genai

Environment

  • FREESOLO_API_KEY
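
Because the key is read from the environment, a startup check can surface a missing value before any traced call runs. The following is a minimal sketch using only the standard library; the helper name require_freesolo_key is hypothetical and not part of the SDK.

```python
import os


def require_freesolo_key(env=None):
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of the freesolo SDK.
    """
    # Allow a custom mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("FREESOLO_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FREESOLO_API_KEY is not set")
    return key
```

Calling require_freesolo_key() once at process start turns a silent misconfiguration into an immediate, readable failure.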

Quickstart

from openai import OpenAI
from freesolo import wrap

client = wrap(OpenAI())

result = client.responses.create(
    model="gpt-4.1-mini",
    instructions="Reply in plain text.",
    input=[
        {
            "role": "user",
            "content": [{"type": "input_text", "text": "How do I reset my password?"}],
        }
    ],
)

print(result.output_text or "")

OpenRouter Quickstart

from openai import OpenAI
from freesolo import wrap

client = wrap(
    OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_API_KEY",
    )
)

response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "Reply in plain text."},
        {"role": "user", "content": "Write a one-sentence launch blurb."},
    ],
    max_tokens=120,
)

print(response.choices[0].message.content or "")

Gemini Quickstart

from google import genai
from freesolo import instrument_gemini

client = instrument_gemini(genai.Client())

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a one-sentence release note for traced Gemini support.",
)

print(response.text or "")

Group Multiple Model Calls

For agentic or long-horizon tasks, strongly prefer wrapping the whole task in start_trace(...) so all of the model calls land in one trace.

For a single one-off OpenAI, Anthropic, or Gemini request, you can skip start_trace(...): one instrumented request creates its own trace automatically.

from anthropic import Anthropic
from freesolo import instrument_anthropic, start_trace

client = instrument_anthropic(Anthropic())

with start_trace("support-agent-run"):
    first = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello"}],
    )
    second = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say goodbye"}],
    )

What Gets Stored

  • Trace title if you explicitly pass it to start_trace("...")
  • Trace metadata if you explicitly pass it to start_trace(..., metadata=...)
  • Input payloads with system_prompt, user_prompt, and images
  • Output payloads as plain text
  • Token usage when available
  • Image inputs with inline previews for the trace UI

Notes

  • You do not need @trace() for ordinary LLM tracing.
  • A single instrumented OpenAI, Anthropic, or Gemini request creates a trace automatically.
  • For OpenAI-compatible providers like OpenRouter, prefer wrap(...) instead of provider-specific helpers.
  • For agentic or long-horizon workflows, we strongly recommend start_trace("descriptive-title") so that planning, retries, and follow-up calls stay grouped in one trace.
  • Delivery is best-effort by default. Trace ingestion failures do not break your app.
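
As an illustration of what best-effort delivery means in practice, the sketch below wraps a send function so that failures are logged and swallowed rather than raised. It is a generic pattern, not the SDK's internal code; best_effort and failing_send are hypothetical names.

```python
import logging

logger = logging.getLogger("tracing-sketch")


def best_effort(send, payload):
    """Call send(payload); log and swallow any failure instead of raising."""
    try:
        send(payload)
        return True
    except Exception:
        # The application keeps running even if delivery fails.
        logger.warning("trace delivery failed; continuing", exc_info=True)
        return False


def failing_send(payload):
    """Stand-in for a network call to an unreachable endpoint."""
    raise ConnectionError("ingestion endpoint unreachable")


delivered = best_effort(failing_send, {"title": "support-agent-run"})
```

The trade-off is that a lost trace is reported only in logs, so monitoring those logs is the way to notice ingestion problems.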

Evaluations

freesolo also includes a small evaluation SDK for CI jobs, GitHub bots, and eval scripts. All evaluation runs require FREESOLO_API_KEY or an explicit api_key.

Evaluation data is a list of plain dictionaries. There is no separate Example class to construct.
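
Since rows are plain dictionaries, they can be assembled with ordinary Python. A small sketch, assuming the input, actual_output, and expected_output keys used in the examples in this section; build_rows is a hypothetical helper.

```python
def build_rows(cases):
    """Turn (input, actual_output, expected_output) tuples into row dicts."""
    return [
        {"input": question, "actual_output": actual, "expected_output": expected}
        for question, actual, expected in cases
    ]


rows = build_rows([("What is the capital of France?", "Paris", "Paris")])
```

The resulting list has the same shape as the first argument passed to evals.run(...) in this section's examples.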

There are two evaluation paths:

  • Custom scorers: subclass CustomScorer and return BinaryResponse or NumericResponse. These run in your process and persist results to Freesolo with your API key.
  • Hosted scorers: pass scorer names such as exact_match, contains, json_valid, correctness, answer_relevancy, or faithfulness. These are sent to the Freesolo API and authenticated with FREESOLO_API_KEY. LLM-as-judge scorers always use the model and provider key configured on the Freesolo server, so you do not pass an OpenRouter key to the SDK.

Hosted scorers:

from freesolo import Freesolo

client = Freesolo(api_key="fslo_...", project_name="support-agent")

results = client.evals.run(
    [
        {
            "input": "What is the capital of France?",
            "actual_output": "Paris",
            "expected_output": "Paris",
        }
    ],
    scorers=["exact_match", "correctness"],
    eval_run_name="pr-123",
)

print(results[0].success)
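
For a quick local pre-check before sending rows to the API, the deterministic scorers can be approximated in plain Python. The functions below are guesses at the semantics implied by the scorer names (exact_match, contains, json_valid); the hosted versions may normalize or compare differently, so treat this as a sketch rather than a reference.

```python
import json


def exact_match(row):
    """True when actual_output equals expected_output exactly."""
    return row.get("actual_output") == row.get("expected_output")


def contains(row):
    """True when expected_output appears as a substring of actual_output."""
    return str(row.get("expected_output", "")) in str(row.get("actual_output", ""))


def json_valid(row):
    """True when actual_output parses as JSON."""
    try:
        json.loads(row.get("actual_output", ""))
    except (TypeError, ValueError):
        # TypeError: value is not a string; ValueError: malformed JSON.
        return False
    return True
```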

Custom scorer:

from typing import Any

from freesolo import Freesolo
from freesolo.evaluation import BinaryResponse, CustomScorer


class NoEmptyAnswer(CustomScorer[BinaryResponse]):
    async def score(self, row: dict[str, Any]) -> BinaryResponse:
        ok = bool(str(row.get("actual_output", "")).strip())
        return BinaryResponse(value=ok, reason="actual_output is non-empty")


results = Freesolo(api_key="fslo_...", project_name="support-agent").evals.run(
    [{"actual_output": "hello"}],
    scorers=[NoEmptyAnswer()],
    eval_run_name="custom-smoke",
)
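
A numeric scorer follows the same pattern but produces a score instead of a pass/fail flag. The sketch below shows only the scoring logic as a plain function; wrapping its result in NumericResponse inside a CustomScorer is left out because the NumericResponse constructor arguments are not shown in this document. The function name keyword_coverage and the keywords parameter are hypothetical.

```python
from typing import Any


def keyword_coverage(row: dict[str, Any], keywords: list[str]) -> float:
    """Fraction of keywords found (case-insensitive) in actual_output."""
    if not keywords:
        # Nothing to look for, so coverage is trivially complete.
        return 1.0
    text = str(row.get("actual_output", "")).lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    return hits / len(keywords)


score = keyword_coverage(
    {"actual_output": "Reset your password from Settings"},
    ["password", "settings", "email"],
)
```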

For CI, set assert_test=True to raise an AssertionError when any row fails:

from freesolo import Freesolo

Freesolo(project_name="support-agent").evals.run(
    [{"actual_output": "Paris", "expected_output": "Paris"}],
    scorers=["exact_match"],
    eval_run_name="ci-smoke",
    assert_test=True,
)
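
If you would rather inspect results before failing the job, a similar check can be written by hand against the returned results. This sketch relies only on the .success attribute shown earlier and uses stand-in result objects; assert_all_passed is a hypothetical helper, not an SDK function.

```python
from types import SimpleNamespace


def assert_all_passed(results):
    """Raise AssertionError listing the indexes of rows that did not succeed."""
    failed = [index for index, result in enumerate(results) if not result.success]
    if failed:
        raise AssertionError(
            f"{len(failed)} of {len(results)} rows failed: {failed}"
        )


# Stand-ins for the objects returned by evals.run(...).
fake_results = [SimpleNamespace(success=True), SimpleNamespace(success=False)]
```

Running assert_all_passed(fake_results) raises, which is enough to give a CI step a non-zero exit status.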

Runnable evaluation examples are available as modules:

python -m freesolo.examples.evaluation.hosted
python -m freesolo.examples.evaluation.custom

Tracing is available from the same root client:

from freesolo import Freesolo

client = Freesolo(api_key="fslo_...")

with client.traces.start("support-agent-run"):
    ...

You can also import namespaced tracing helpers directly:

from freesolo.tracing import start_trace, wrap

Download files

Source Distribution

freesolo-0.1.5.tar.gz (115.6 kB)

Built Distribution

freesolo-0.1.5-py3-none-any.whl (35.2 kB, Python 3)