Skip to main content

Local-first AI agent framework. Built for models that aren't perfect.

Project description

FreeAgent SDK

PyPI version Python versions License: MIT Tests Docs Downloads

A clean local agent SDK for Ollama, vLLM, and OpenAI-compatible servers.

Streaming. Multi-turn out of the box. Markdown skills and memory. Built-in telemetry. Single dependency.

pip install freeagent-sdk

Links: Documentation · Tutorial · Changelog · Contributing · Examples · Evaluation data

Why FreeAgent

  • Local-first: works with Ollama and vLLM — your data never leaves your machine
  • Streaming everywhere: token-level streaming with semantic events
  • Multi-turn that just works: conversation state managed automatically with pluggable strategies
  • Markdown is first-class: skills and memory are human-readable .md files with frontmatter
  • Zero-config: auto-detects model size and tunes defaults — works on 2B and 70B alike
  • Inspectable: agent.trace() shows exactly what happened
  • Fast: actually 2% faster than raw Ollama API (HTTP connection reuse)
  • Honest: real benchmark data in this README, not marketing

Quick Start

CLI

# One-shot query with streaming
freeagent ask qwen3:8b "What's the capital of France?"

# Interactive chat
freeagent chat qwen3:8b

# List available models
freeagent models

Python

from freeagent import Agent

agent = Agent(model="qwen3:8b")
print(agent.run("What is Python?"))

Streaming

Real token-by-token streaming, even for tool-using agents:

from freeagent import Agent
from freeagent.events import TokenEvent, ToolCallEvent, ToolResultEvent

agent = Agent(model="qwen3:8b", tools=[weather])

for event in agent.run_stream("What's the weather in Tokyo?"):
    if isinstance(event, TokenEvent):
        print(event.text, end="", flush=True)
    elif isinstance(event, ToolCallEvent):
        print(f"\n[Calling {event.name}...]")
    elif isinstance(event, ToolResultEvent):
        print(f"[{event.name} -> {'ok' if event.success else 'fail'} ({event.duration_ms:.0f}ms)]")

Async version: async for event in agent.arun_stream("query"):

Event types: RunStartEvent, TokenEvent, ToolCallEvent, ToolResultEvent, ValidationErrorEvent, RetryEvent, IterationEvent, RunCompleteEvent.

Custom Tools

from freeagent import Agent, tool

@tool
def weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp": 72, "condition": "sunny"}

agent = Agent(model="qwen3:8b", tools=[weather])
print(agent.run("What's the weather in Portland?"))

Multi-Turn Conversations

agent = Agent(model="qwen3:8b", tools=[weather])
agent.run("What's the weather in Tokyo?")
agent.run("Convert that to Celsius")  # remembers Tokyo was 85°F

Strategies

from freeagent import Agent, SlidingWindow, TokenWindow

# Default: SlidingWindow(max_turns=20)
agent = Agent(model="qwen3:8b")

# Token-based budget (better for small context models)
agent = Agent(model="qwen3:4b", conversation=TokenWindow(max_tokens=3000))

# Stateless mode (each run independent)
agent = Agent(model="qwen3:8b", conversation=None)

Session Persistence

agent = Agent(model="qwen3:8b", session="my-chat")
agent.run("Hello!")
# Later, in a new process:
agent = Agent(model="qwen3:8b", session="my-chat")  # restores conversation

Inspecting Runs

Every run is fully traced. See exactly what happened:

agent.run("What's 347 * 29?")

# One-line summary
print(agent.last_run.summary())
# Run 1: qwen3:8b (native) 2300ms, 2 iters, 1 tools

# Full timeline
print(agent.trace())
# +     0ms  model_call_start     iter=0
# +   800ms  tool_call            calc(expression='347*29')
# +   802ms  tool_result          calc -> ok (2ms)
# +   803ms  model_call_start     iter=1

# Markdown report
print(agent.last_run.to_markdown())

Model-Aware Defaults

FreeAgent auto-detects model capabilities from Ollama and tunes itself:

# Auto-tuned: detects 2B model, strips skills and memory tool
agent = Agent(model="gemma4:e2b")

# Auto-tuned: detects 8B model, keeps full defaults
agent = Agent(model="qwen3:8b")

# Override auto-tuning
agent = Agent(model="gemma4:e2b", bundled_skills=True, memory_tool=True)

# Disable auto-tuning entirely
agent = Agent(model="qwen3:8b", auto_tune=False)

Access detected info: agent.model_info.parameter_count, agent.model_info.context_length, agent.model_info.capabilities.

Skills (Markdown Prompt Extensions)

---
name: nba-analyst
description: Basketball statistics expert
tools: [search, calculator]
---

You are an NBA analyst. Always cite your sources.
When comparing players, use per-game averages.
agent = Agent(model="qwen3:8b", tools=[search, calculator], skills=["./my-skills"])

Bundled skills load automatically. User skills extend them — duplicate names override.

Memory (Markdown-Backed)

Every agent has built-in memory stored as human-readable .md files:

.freeagent/memory/
├── MEMORY.md          # Index
├── user.md            # auto_load: true → in system prompt
├── facts.md           # Accumulated facts
└── 2026-04-05.md      # Daily log

The agent gets a memory tool with actions: read, write, append, search, list. Only the index and auto_load files go into the system prompt — everything else is on demand.

Multi-Provider Support

from freeagent import Agent, VLLMProvider, OpenAICompatProvider

# vLLM
provider = VLLMProvider(model="qwen3-8b")
agent = Agent(model="qwen3-8b", provider=provider, tools=[my_tool])

# Any OpenAI-compatible server
provider = OpenAICompatProvider(model="llama3.1:8b", base_url="http://localhost:1234")
agent = Agent(model="llama3.1:8b", provider=provider, tools=[my_tool])

Telemetry

Built-in, always on:

agent.run("What's the weather?")
print(agent.metrics)               # quick summary
print(agent.metrics.tool_stats())  # per-tool breakdown
agent.metrics.to_json("m.json")   # export

Optional OpenTelemetry: pip install freeagent-sdk[otel]

MCP Support

from freeagent.mcp import connect

async with connect("npx -y @modelcontextprotocol/server-filesystem /tmp") as tools:
    agent = Agent(model="qwen3:8b", tools=tools)
    result = await agent.arun("List files in /tmp")

Install with: pip install freeagent-sdk[mcp]

Real Performance

Tested against the raw Ollama API with the same eval suite (100+ cases, 4 models). Full data in evaluation/.

Multi-Turn Conversations (6 conversations, 15 turns)

Model Raw Ollama FreeAgent
qwen3:8b 93% 87%
qwen3:4b 93% 87%
llama3.1:8b 87% 80%
gemma4:e2b (2B) N/A 80%

Tool Calling Accuracy (8 cases)

Model Raw Ollama FreeAgent
qwen3:8b 75% 75%
qwen3:4b 100% 88%
llama3.1:8b 62% 75% (+13%)

Streaming Latency (median of 3 runs)

Model Chat TTFT Chat Total Tool TTFT Tool Total
qwen3:8b 12.8s 13.9s 5.2s 10.0s
qwen3:4b 14.7s 14.5s 28.2s 31.6s
llama3.1:8b 1.5s 1.4s 1.8s 2.1s
gemma4:e2b 4.7s 5.1s 8.2s 12.1s

TTFT ≈ total for chat (generation is fast once started). Tool TTFT includes tool execution round-trip.

Auto-Tune (v0.3.1)

Model auto_tune=True All On Manual Strip Delta vs All On
qwen3:8b 91% 91% +0%
qwen3:4b 91% 91% +0%
llama3.1:8b 100% 100% +0%
gemma4:e2b 91% 55% 73% +36%

Auto-tune detects gemma4:e2b as a small model and strips bundled skills + memory tool. This improves accuracy from 55% → 91%.

Honest Caveats

  • Guardrails rarely fire: 0/40 real rescues in adversarial testing. Modern models handle fuzzy names and type coercion natively.
  • Multi-turn gap to raw Ollama is noise: 87% vs 93% — re-running failures produces passes. Non-deterministic.
  • Skills help qwen3:4b but hurt gemma4:e2b — fixed by auto-tune, which strips them for small models.
  • Streaming TTFT ≈ total time on small models: generation is fast, model thinking dominates latency.

Full analysis: evaluation/THESIS_ANALYSIS.md

Tested Models

Model Size Mode Reliability
Qwen3 8B 8.2B Native Very Good
Qwen3 4B 4.0B Native Good (best with skills)
Llama 3.1 8B 8.0B Native Good
Gemma4 E2B 5.1B Native Good (auto-tuned)

Requirements

  • Python 3.10+
  • Ollama running locally (ollama serve)
  • A model pulled (ollama pull qwen3:8b)

Documentation

  • Tutorial — 5-minute walkthrough from install to working agent
  • Website — landing page and feature overview
  • Examples — runnable scripts covering tools, memory, hooks, MCP
  • Evaluation data — benchmark results and thesis analysis
  • Changelog — release history
  • Contributing — how to run tests, add skills, submit PRs

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

freeagent_sdk-0.3.2.tar.gz (74.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

freeagent_sdk-0.3.2-py3-none-any.whl (60.7 kB view details)

Uploaded Python 3

File details

Details for the file freeagent_sdk-0.3.2.tar.gz.

File metadata

  • Download URL: freeagent_sdk-0.3.2.tar.gz
  • Upload date:
  • Size: 74.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for freeagent_sdk-0.3.2.tar.gz
Algorithm Hash digest
SHA256 9df82cfca62b4f4b9f8a56a8a40dd8dce60099d0a229f94f1dca0653c3a21827
MD5 38264379b04d81489dd5a601f969bd98
BLAKE2b-256 69630cbef702013898a6c126069abe151f77184f7072110a5f143cb3a7b0a42b

See more details on using hashes here.

Provenance

The following attestation bundles were made for freeagent_sdk-0.3.2.tar.gz:

Publisher: publish.yml on labeveryday/freeagent-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file freeagent_sdk-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: freeagent_sdk-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 60.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for freeagent_sdk-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 afa044d42596189f4e4b8cee0703250bb8a9b8e9315db6df2088a4aac4e39ed8
MD5 5d51a42a43fc5edf02adbffd05c04562
BLAKE2b-256 4a592885f49f8613a3848f756205eabd5b775e6a7aa71f2d05f73f314916de99

See more details on using hashes here.

Provenance

The following attestation bundles were made for freeagent_sdk-0.3.2-py3-none-any.whl:

Publisher: publish.yml on labeveryday/freeagent-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page