
Local-first AI agent framework. Built for models that aren't perfect.


FreeAgent SDK


A clean local agent SDK for Ollama, vLLM, and OpenAI-compatible servers.

Streaming. Multi-turn out of the box. Markdown skills and memory. Built-in telemetry. Single dependency.

pip install freeagent-sdk

Links: Documentation · Tutorial · Changelog · Contributing · Examples · Evaluation data

Why FreeAgent

  • Local-first: works with Ollama and vLLM — your data never leaves your machine
  • Streaming everywhere: token-level streaming with semantic events
  • Multi-turn that just works: conversation state managed automatically with pluggable strategies
  • Markdown is first-class: skills and memory are human-readable .md files with frontmatter
  • Zero-config: auto-detects model size and tunes defaults — works on 2B and 70B alike
  • Inspectable: agent.trace() shows exactly what happened
  • Fast: ~2% faster than the raw Ollama API in our benchmarks (HTTP connection reuse)
  • Honest: real benchmark data in this README, not marketing

Quick Start

CLI

# One-shot query with streaming
freeagent ask qwen3:8b "What's the capital of France?"

# Interactive chat
freeagent chat qwen3:8b

# List available models
freeagent models

Python

from freeagent import Agent

agent = Agent(model="qwen3:8b")
print(agent.run("What is Python?"))

Streaming

Real token-by-token streaming, even for tool-using agents:

from freeagent import Agent
from freeagent.events import TokenEvent, ToolCallEvent, ToolResultEvent

agent = Agent(model="qwen3:8b", tools=[weather])

for event in agent.run_stream("What's the weather in Tokyo?"):
    if isinstance(event, TokenEvent):
        print(event.text, end="", flush=True)
    elif isinstance(event, ToolCallEvent):
        print(f"\n[Calling {event.name}...]")
    elif isinstance(event, ToolResultEvent):
        print(f"[{event.name} -> {'ok' if event.success else 'fail'} ({event.duration_ms:.0f}ms)]")

Async version: async for event in agent.arun_stream("query"):

Event types: RunStartEvent, TokenEvent, ToolCallEvent, ToolResultEvent, ValidationErrorEvent, RetryEvent, IterationEvent, RunCompleteEvent.
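Consuming the async stream follows the same pattern as the sync loop. A self-contained sketch, using a stand-in async generator in place of agent.arun_stream (the fake_stream generator below is illustrative, not part of the SDK):

```python
import asyncio

# Stand-in for agent.arun_stream(): yields plain strings instead of SDK events.
async def fake_stream(tokens):
    for t in tokens:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield t

async def main():
    chunks = []
    async for token in fake_stream(["Hello", ", ", "world"]):
        chunks.append(token)
    return "".join(chunks)

print(asyncio.run(main()))  # Hello, world
```

With the real SDK you would replace fake_stream(...) with agent.arun_stream("query") and branch on the event types listed above.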

Custom Tools

from freeagent import Agent, tool

@tool
def weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp": 72, "condition": "sunny"}

agent = Agent(model="qwen3:8b", tools=[weather])
print(agent.run("What's the weather in Portland?"))
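Under the hood, a @tool-style decorator typically derives a tool spec from the function's signature and docstring. A hypothetical sketch of that idea using only the standard library (not FreeAgent's actual implementation):

```python
import inspect

def describe_tool(fn):
    """Build a minimal tool spec from a function's signature and docstring."""
    sig = inspect.signature(fn)
    params = {
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": params,
    }

def weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp": 72, "condition": "sunny"}

spec = describe_tool(weather)
print(spec["name"], spec["parameters"])  # weather {'city': 'str'}
```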

Multi-Turn Conversations

agent = Agent(model="qwen3:8b", tools=[weather])
agent.run("What's the weather in Tokyo?")
agent.run("Convert that to Celsius")  # remembers Tokyo was 85°F

Strategies

from freeagent import Agent, SlidingWindow, TokenWindow

# Default: SlidingWindow(max_turns=20)
agent = Agent(model="qwen3:8b")

# Token-based budget (better for small context models)
agent = Agent(model="qwen3:4b", conversation=TokenWindow(max_tokens=3000))

# Stateless mode (each run independent)
agent = Agent(model="qwen3:8b", conversation=None)
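As an illustration of what these strategies do, here is a minimal, hypothetical trim for each (the real SlidingWindow and TokenWindow may differ in detail):

```python
def sliding_window(turns, max_turns=20):
    """Keep only the most recent max_turns turns of the conversation."""
    return turns[-max_turns:]

def token_window(turns, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the newest turns whose combined token estimate fits the budget."""
    kept, total = [], 0
    for turn in reversed(turns):          # walk newest-first
        total += count_tokens(turn)
        if total > max_tokens:
            break
        kept.append(turn)
    return kept[::-1]                     # restore chronological order

history = [f"turn {i}" for i in range(25)]
print(len(sliding_window(history)))            # 20
print(token_window(["a b", "c d e", "f"], 4))  # ['c d e', 'f']
```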

Session Persistence

agent = Agent(model="qwen3:8b", session="my-chat")
agent.run("Hello!")
# Later, in a new process:
agent = Agent(model="qwen3:8b", session="my-chat")  # restores conversation
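Session persistence amounts to serializing conversation turns to disk keyed by the session name. A minimal sketch of the idea (hypothetical file layout, not FreeAgent's actual storage format):

```python
import json
import pathlib
import tempfile

def save_session(path, turns):
    """Write conversation turns to a JSON file."""
    path.write_text(json.dumps(turns))

def load_session(path):
    """Restore turns from disk, or start fresh if the session is new."""
    return json.loads(path.read_text()) if path.exists() else []

session_file = pathlib.Path(tempfile.mkdtemp()) / "my-chat.json"
save_session(session_file, [{"role": "user", "content": "Hello!"}])
restored = load_session(session_file)
print(restored[0]["content"])  # Hello!
```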

Inspecting Runs

Every run is fully traced. See exactly what happened:

agent.run("What's 347 * 29?")

# One-line summary
print(agent.last_run.summary())
# Run 1: qwen3:8b (native) 2300ms, 2 iters, 1 tools

# Full timeline
print(agent.trace())
# +     0ms  model_call_start     iter=0
# +   800ms  tool_call            calc(expression='347*29')
# +   802ms  tool_result          calc -> ok (2ms)
# +   803ms  model_call_start     iter=1

# Markdown report
print(agent.last_run.to_markdown())

Model-Aware Defaults

FreeAgent auto-detects model capabilities from Ollama and tunes itself:

# Auto-tuned: detects 2B model, strips skills and memory tool
agent = Agent(model="gemma4:e2b")

# Auto-tuned: detects 8B model, keeps full defaults
agent = Agent(model="qwen3:8b")

# Override auto-tuning
agent = Agent(model="gemma4:e2b", bundled_skills=True, memory_tool=True)

# Disable auto-tuning entirely
agent = Agent(model="qwen3:8b", auto_tune=False)

Access detected info: agent.model_info.parameter_count, agent.model_info.context_length, agent.model_info.capabilities.
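The SDK reports that it queries Ollama for model metadata. As a rough illustration of size-based tuning, a hypothetical fallback heuristic that parses the parameter count out of the model tag might look like this (not the SDK's actual detection logic):

```python
import re

def parameter_count_from_tag(tag):
    """Rough fallback: parse a parameter count like '8b' out of a model tag."""
    m = re.search(r"(\d+(?:\.\d+)?)b", tag.lower())
    return float(m.group(1)) * 1e9 if m else None

def tune_defaults(params):
    """Strip heavyweight extras (skills, memory tool) for small (<4B) models."""
    small = params is not None and params < 4e9
    return {"bundled_skills": not small, "memory_tool": not small}

print(tune_defaults(parameter_count_from_tag("qwen3:8b")))
# {'bundled_skills': True, 'memory_tool': True}
print(tune_defaults(parameter_count_from_tag("gemma4:e2b")))
# {'bundled_skills': False, 'memory_tool': False}
```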

Skills (Markdown Prompt Extensions)

---
name: nba-analyst
description: Basketball statistics expert
tools: [search, calculator]
---

You are an NBA analyst. Always cite your sources.
When comparing players, use per-game averages.

Point the agent at the directory containing your skill files:

agent = Agent(model="qwen3:8b", tools=[search, calculator], skills=["./my-skills"])

Bundled skills load automatically. User skills extend them — duplicate names override.
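A skill file is just frontmatter plus a prompt body, so parsing and name-based override can be sketched in a few lines (illustrative only, not the SDK's loader):

```python
def parse_skill(text):
    """Split a skill file into a frontmatter dict and a prompt body."""
    _, fm, body = text.split("---", 2)
    meta = {k.strip(): v.strip() for k, v in
            (line.split(":", 1) for line in fm.strip().splitlines())}
    return meta, body.strip()

def merge_skills(bundled, user):
    """User skills extend bundled ones; duplicate names override."""
    merged = {s["name"]: s for s in bundled}
    merged.update({s["name"]: s for s in user})
    return list(merged.values())

meta, body = parse_skill(
    "---\nname: nba-analyst\ndescription: Basketball statistics expert\n---\n"
    "You are an NBA analyst."
)
print(meta["name"], "->", body)  # nba-analyst -> You are an NBA analyst.
```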

Memory (Markdown-Backed)

Every agent has built-in memory stored as human-readable .md files:

.freeagent/memory/
├── MEMORY.md          # Index
├── user.md            # auto_load: true → in system prompt
├── facts.md           # Accumulated facts
└── 2026-04-05.md      # Daily log

The agent gets a memory tool with actions: read, write, append, search, list. Only the index and auto_load files go into the system prompt — everything else is on demand.
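Selecting which memory files enter the system prompt could work by checking each file's frontmatter for auto_load, plus always including the index. A sketch of that idea (hypothetical, not the SDK's code):

```python
import pathlib
import tempfile

def auto_load_files(memory_dir):
    """Return the index plus memory files whose frontmatter sets auto_load: true."""
    selected = []
    for path in sorted(memory_dir.glob("*.md")):
        if path.name == "MEMORY.md" or "auto_load: true" in path.read_text():
            selected.append(path.name)
    return selected

memory = pathlib.Path(tempfile.mkdtemp())
(memory / "MEMORY.md").write_text("# Index")
(memory / "user.md").write_text("---\nauto_load: true\n---\nName: Ada")
(memory / "facts.md").write_text("Water boils at 100C")
print(auto_load_files(memory))  # ['MEMORY.md', 'user.md']
```

Everything not selected here would stay on disk, reachable only through the memory tool's read/search actions.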

Multi-Provider Support

from freeagent import Agent, VLLMProvider, OpenAICompatProvider

# vLLM
provider = VLLMProvider(model="qwen3-8b")
agent = Agent(model="qwen3-8b", provider=provider, tools=[my_tool])

# Any OpenAI-compatible server
provider = OpenAICompatProvider(model="llama3.1:8b", base_url="http://localhost:1234")
agent = Agent(model="llama3.1:8b", provider=provider, tools=[my_tool])

Telemetry

Built-in, always on:

agent.run("What's the weather?")
print(agent.metrics)               # quick summary
print(agent.metrics.tool_stats())  # per-tool breakdown
agent.metrics.to_json("m.json")   # export
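A per-tool breakdown like metrics.tool_stats() can be derived from raw call records. An illustrative sketch of the aggregation (not the SDK's internals):

```python
from collections import defaultdict

def tool_stats(calls):
    """Aggregate raw (tool, duration_ms, success) records per tool."""
    stats = defaultdict(lambda: {"calls": 0, "failures": 0, "total_ms": 0.0})
    for name, duration_ms, success in calls:
        s = stats[name]
        s["calls"] += 1
        s["total_ms"] += duration_ms
        s["failures"] += 0 if success else 1
    return {name: {**s, "avg_ms": s["total_ms"] / s["calls"]}
            for name, s in stats.items()}

calls = [("weather", 120.0, True), ("weather", 80.0, True), ("calc", 2.0, False)]
print(tool_stats(calls)["weather"]["avg_ms"])  # 100.0
```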

Optional OpenTelemetry: pip install freeagent-sdk[otel]

MCP Support

from freeagent.mcp import connect

async with connect("npx -y @modelcontextprotocol/server-filesystem /tmp") as tools:
    agent = Agent(model="qwen3:8b", tools=tools)
    result = await agent.arun("List files in /tmp")

Install with: pip install freeagent-sdk[mcp]

Real Performance

Tested against the raw Ollama API with the same eval suite (100+ cases, 4 models). Full data in evaluation/.

Multi-Turn Conversations (6 conversations, 15 turns)

| Model | Raw Ollama | FreeAgent |
|---|---|---|
| qwen3:8b | 93% | 87% |
| qwen3:4b | 93% | 87% |
| llama3.1:8b | 87% | 80% |
| gemma4:e2b (2B) | N/A | 80% |

Tool Calling Accuracy (8 cases)

| Model | Raw Ollama | FreeAgent |
|---|---|---|
| qwen3:8b | 75% | 75% |
| qwen3:4b | 100% | 88% |
| llama3.1:8b | 62% | 75% (+13%) |

Streaming Latency (median of 3 runs)

| Model | Chat TTFT | Chat Total | Tool TTFT | Tool Total |
|---|---|---|---|---|
| qwen3:8b | 12.8s | 13.9s | 5.2s | 10.0s |
| qwen3:4b | 14.7s | 14.5s | 28.2s | 31.6s |
| llama3.1:8b | 1.5s | 1.4s | 1.8s | 2.1s |
| gemma4:e2b | 4.7s | 5.1s | 8.2s | 12.1s |

TTFT ≈ total for chat (generation is fast once started). Tool TTFT includes tool execution round-trip.

Auto-Tune (v0.3.1)

| Model | auto_tune=True | All On | Manual Strip | Delta vs All On |
|---|---|---|---|---|
| qwen3:8b | 91% | 91% | — | +0% |
| qwen3:4b | 91% | 91% | — | +0% |
| llama3.1:8b | 100% | 100% | — | +0% |
| gemma4:e2b | 91% | 55% | 73% | +36% |

Auto-tune detects gemma4:e2b as a small model and strips bundled skills + memory tool. This improves accuracy from 55% → 91%.

Honest Caveats

  • Guardrails rarely fire: 0/40 real rescues in adversarial testing. Modern models handle fuzzy names and type coercion natively.
  • Multi-turn gap to raw Ollama is noise: 87% vs 93% — re-running failures produces passes. Non-deterministic.
  • Skills help qwen3:4b but hurt gemma4:e2b — fixed by auto-tune, which strips them for small models.
  • Streaming TTFT ≈ total time on small models: generation is fast, model thinking dominates latency.

Full analysis: evaluation/THESIS_ANALYSIS.md

Tested Models

| Model | Size | Mode | Reliability |
|---|---|---|---|
| Qwen3 8B | 8.2B | Native | Very Good |
| Qwen3 4B | 4.0B | Native | Good (best with skills) |
| Llama 3.1 8B | 8.0B | Native | Good |
| Gemma4 E2B | 5.1B | Native | Good (auto-tuned) |

Requirements

  • Python 3.11+
  • Ollama running locally (ollama serve)
  • A model pulled (ollama pull qwen3:8b)

Documentation

  • Tutorial — 5-minute walkthrough from install to working agent
  • Website — landing page and feature overview
  • Examples — runnable scripts covering tools, memory, hooks, MCP
  • Evaluation data — benchmark results and thesis analysis
  • Changelog — release history
  • Contributing — how to run tests, add skills, submit PRs

License

MIT
