Skip to main content

Standalone agentic framework for local LLMs via Ollama — reliable tool calling, session persistence, and loop guards

Project description

local-agent-core

PyPI version Python 3.10+ License: MIT

A standalone Python framework that makes Qwen3-8B behave reliably as an agent when running via Ollama. It solves the three production problems you will hit within the first day of building a Qwen3 tool-calling loop:

Problem Root Cause Fix
Model gets stuck / loops forever No termination conditions guards.py — MaxTurns + Budget + Repetition guards
Tool call results not handled cleanly No orphan injection on crash tool_result.py — always injects a result, even on failure
No session persistence between queries History reconstructed per request session_store.pySession holds history in-memory

Architecture is borrowed from Claude Code's QueryEngine.ts, query.ts, Task.ts, and Tool.ts.


Install

pip install local-agent-core

Prerequisites: Ollama must be running locally with Qwen3:8b pulled.

ollama serve          # terminal tab 1
ollama pull qwen3:8b  # first time only

Quick Start

from local_agent_core import AgentSession, GuardTripped

agent = AgentSession(system_prompt="You are a helpful assistant.")

@agent.tool(
    name="calculator",
    description="Evaluate a math expression",
    parameters={
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"]
    }
)
def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))

response = agent.chat("What is 42 * 7?")   # → "The result is 294."
response2 = agent.chat("Double that.")      # history preserved automatically

Flask Integration

from flask import Flask, request, jsonify
from local_agent_core import AgentSession, GuardTripped, DiskSessionStore

app = Flask(__name__)
store = DiskSessionStore("/var/data/sessions")

SYSTEM_PROMPT = "You are a helpful assistant."

@app.route("/chat", methods=["POST"])
def chat():
    body = request.json
    session_id = body.get("session_id")   # None = new session
    message    = body["message"]

    agent = AgentSession(
        system_prompt=SYSTEM_PROMPT,
        session_id=session_id,
        store=store,
        max_turns=15,
    )
    _register_tools(agent)   # attach your tools here

    try:
        response = agent.chat(message)
        return jsonify({
            "response": response,
            "session_id": agent.session_id,
            "guards": agent.guard_summary(),
        })
    except GuardTripped as g:
        return jsonify({
            "response": f"I had to stop: {g.reason}",
            "session_id": agent.session_id,
            "guard_tripped": g.guard_name,
        })

API Reference

AgentSession

The only class you need to import in application code.

AgentSession(
    system_prompt           = "",               # Injected as role=system at turn 0
    model                   = "qwen3:8b",
    base_url                = "http://localhost:11434/v1",
    max_turns               = 20,               # Hard loop ceiling
    max_tokens_budget       = 32_000,           # Cumulative token ceiling per session
    max_tokens_per_response = 4096,             # Per-response output limit
    temperature             = 0.0,              # 0.0 = deterministic tool calling
    session_id              = None,             # Provide to resume existing session
    store                   = None,             # InMemorySessionStore by default
)
Method Purpose
agent.chat(message) Send message, get response. Mutates session history.
@agent.tool(name, description, parameters) Register a tool via decorator
agent.register_tool(ToolDef) Register a pre-built ToolDef
agent.guard_summary() Returns dict with turns/tokens/tool_calls used
agent.history() Full raw message list
agent.session_id The session's UUID string
agent.reset_guards() Reset guard counters without clearing history

Loop Guards

Three independent loop-breakers. Any one raising GuardTripped terminates the loop.

LoopGuards(
    max_turns            = 20,      # Raise after this many model responses
    max_tokens           = 32_000,  # Raise when cumulative tokens exceed this
    repetition_threshold = 3,       # Raise when same tool+args called this many times
)

Tuning guide:

Task type Recommended max_turns
Simple Q&A 5
Single tool lookup 8
Multi-step analysis 15
Complex coding / research 30–50

Catching guard trips:

from local_agent_core import GuardTripped

try:
    response = agent.chat(user_input)
except GuardTripped as g:
    print(g.guard_name)   # "MaxTurns" | "Budget" | "Repetition"
    print(g.reason)       # human-readable explanation

Session Stores

from local_agent_core import InMemorySessionStore, DiskSessionStore

# In-memory (default) — fast, resets on process restart
store = InMemorySessionStore()

# Disk-based — survives restarts, good for single-server deployments
store = DiskSessionStore("/var/data/sessions")

For multi-worker production (gunicorn with >1 worker), implement RedisSessionStore with the same get(), create(), save() interface as DiskSessionStore.


Why temperature=0.0?

Deterministic output means the model makes the same tool call decision given the same history. With temperature > 0, a model might randomly decide not to use a tool, making the agent unreliable.

Why strip Qwen3 reasoning?

Qwen3:8b includes a reasoning field in every response (~150–300 tokens of chain-of-thought). If re-injected into history, a 10-turn conversation wastes 2,000+ tokens on reasoning the model never needs to see again. OllamaClient.extract_message() strips it before adding to session history.

Why not LangChain / LlamaIndex?

Both add abstraction layers that hide the finish_reason state machine and make it harder to inspect and fix broken message history. This framework exposes the raw OpenAI message format directly — you always know exactly what is being sent to the model.


Observed Qwen3:8b Behaviour

Metric Value
Tokens for simple Q&A ~150 total (15 prompt, 135 completion incl. reasoning)
Tokens for tool call (42*7) ~666 total across 2 turns
Tool call format Standard OpenAI — finish_reason: "tool_calls", arguments as JSON string
Reasoning field Always present, ~100–200 tokens, stripped by this library
Temperature=0 consistency Deterministic across repeated runs

Known Limitations

  • requires_confirmation flag in ToolDef is defined but not yet wired into the loop.
  • RedisSessionStore not yet implemented — needed for multi-worker Flask (gunicorn with >1 worker).
  • No async support — OllamaClient uses httpx.Client (sync). For FastAPI or async Flask, replace with httpx.AsyncClient and add async/await to loop.py.
  • BudgetGuard.max_tokens=32_000 is conservative — Qwen3:8b context window is 128k. Raise this for long research tasks.

Running the Tests

Tests run against a live Ollama instance (not mocked — that is intentional).

pip install local-agent-core
ollama serve && ollama pull qwen3:8b
python -m pytest tests/ -v

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

local_agent_core-0.1.0.tar.gz (20.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

local_agent_core-0.1.0-py3-none-any.whl (19.9 kB view details)

Uploaded Python 3

File details

Details for the file local_agent_core-0.1.0.tar.gz.

File metadata

  • Download URL: local_agent_core-0.1.0.tar.gz
  • Upload date:
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for local_agent_core-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4114db9de272d39f3eb73d20c7ac000a3fa413bef5e52c276d46217f732882ba
MD5 65e9ae8bb1bf8a395fa308dd1bb7c966
BLAKE2b-256 8d5295412e849ce8b733af92b88ae1faeb9c470e37000d007066e32a8c4518df

See more details on using hashes here.

Provenance

The following attestation bundles were made for local_agent_core-0.1.0.tar.gz:

Publisher: publish.yml on chibokocl/local-agent-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file local_agent_core-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for local_agent_core-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 87dc019f3e06fde4267b398feb1ea2bb049a348367bcffe71f217f34cedde3bb
MD5 707fd5f87466a1bbb8ce4bcd47e7b365
BLAKE2b-256 94cdbd495f6dafc3e81eebbaa960ad82435368cd738847d595c7f929d7b81a53

See more details on using hashes here.

Provenance

The following attestation bundles were made for local_agent_core-0.1.0-py3-none-any.whl:

Publisher: publish.yml on chibokocl/local-agent-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page