Standalone agentic framework for local LLMs via Ollama — reliable tool calling, session persistence, and loop guards
local-agent-core
A standalone Python framework that makes Qwen3-8B behave reliably as an agent when running via Ollama. It solves the three production problems you are likely to hit within the first day of building a Qwen3 tool-calling loop:
| Problem | Root Cause | Fix |
|---|---|---|
| Model gets stuck / loops forever | No termination conditions | `guards.py`: MaxTurns + Budget + Repetition guards |
| Tool call results not handled cleanly | A crashing tool leaves an orphaned tool call with no result | `tool_result.py`: always injects a result, even on failure |
| No session persistence between queries | History reconstructed per request | `session_store.py`: `Session` holds history in a store (in-memory by default, disk-backed optional) |
Architecture is borrowed from Claude Code's QueryEngine.ts, query.ts, Task.ts, and Tool.ts.
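The "always inject a result" fix can be illustrated with a minimal sketch. It assumes the OpenAI tool-call shape this README describes (`id`, `function.name`, and `function.arguments` as a JSON string); `run_tool_call` is a hypothetical helper, not the code in `tool_result.py`.

```python
import json
from typing import Any, Callable


def run_tool_call(
    tool_call: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
) -> dict[str, Any]:
    """Run one OpenAI-style tool call and always return a matching role="tool" message."""
    try:
        fn = tool_call["function"]
        handler = tools[fn["name"]]  # KeyError if the model names an unknown tool
        raw_args = fn.get("arguments") or "{}"
        # OpenAI-compatible responses send arguments as a JSON string; accept a dict too.
        args = raw_args if isinstance(raw_args, dict) else json.loads(raw_args)
        content = str(handler(**args))
    except Exception as exc:  # a crash still produces a result the model can read
        content = f"Error: {type(exc).__name__}: {exc}"
    # Exactly one tool message per tool_call id, so no call is left orphaned.
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}
```

Appending the returned message right after the assistant message that requested the call keeps the history valid for the next request, whether the tool succeeded or not.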
Install
```bash
pip install local-agent-core
```
Prerequisites: Ollama must be running locally with the `qwen3:8b` model pulled.
```bash
ollama serve            # terminal tab 1
ollama pull qwen3:8b    # first time only, in a second tab
```
Quick Start
```python
from local_agent_core import AgentSession, GuardTripped

agent = AgentSession(system_prompt="You are a helpful assistant.")

@agent.tool(
    name="calculator",
    description="Evaluate a math expression",
    parameters={
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
)
def calculator(expression: str) -> str:
    # Demo only: eval() with empty builtins is still not safe for untrusted input.
    return str(eval(expression, {"__builtins__": {}}))

response = agent.chat("What is 42 * 7?")  # → "The result is 294."
response2 = agent.chat("Double that.")    # history preserved automatically
```
Flask Integration
```python
from flask import Flask, request, jsonify
from local_agent_core import AgentSession, GuardTripped, DiskSessionStore

app = Flask(__name__)
store = DiskSessionStore("/var/data/sessions")
SYSTEM_PROMPT = "You are a helpful assistant."


def _register_tools(agent: AgentSession) -> None:
    """Attach your tools here, e.g. with @agent.tool(...) or agent.register_tool(...)."""


@app.route("/chat", methods=["POST"])
def chat():
    body = request.get_json()
    session_id = body.get("session_id")  # None = new session
    message = body["message"]

    agent = AgentSession(
        system_prompt=SYSTEM_PROMPT,
        session_id=session_id,
        store=store,
        max_turns=15,
    )
    _register_tools(agent)  # a new AgentSession is built per request, so attach tools each time

    try:
        response = agent.chat(message)
        return jsonify({
            "response": response,
            "session_id": agent.session_id,
            "guards": agent.guard_summary(),
        })
    except GuardTripped as g:
        return jsonify({
            "response": f"I had to stop: {g.reason}",
            "session_id": agent.session_id,
            "guard_tripped": g.guard_name,
        })
```
API Reference
AgentSession
The main class you use in application code.
```python
AgentSession(
    system_prompt="",                # Injected as role=system at turn 0
    model="qwen3:8b",
    base_url="http://localhost:11434/v1",
    max_turns=20,                    # Hard loop ceiling
    max_tokens_budget=32_000,        # Cumulative token ceiling per session
    max_tokens_per_response=4096,    # Per-response output limit
    temperature=0.0,                 # 0.0 = deterministic tool calling
    session_id=None,                 # Provide to resume existing session
    store=None,                      # InMemorySessionStore by default
)
```
| Method | Purpose |
|---|---|
| `agent.chat(message)` | Send message, get response. Mutates session history. |
| `@agent.tool(name, description, parameters)` | Register a tool via decorator |
| `agent.register_tool(ToolDef)` | Register a pre-built `ToolDef` |
| `agent.guard_summary()` | Returns dict with turns/tokens/tool_calls used |
| `agent.history()` | Full raw message list |
| `agent.session_id` | The session's UUID string |
| `agent.reset_guards()` | Reset guard counters without clearing history |
Loop Guards
Three independent loop breakers; if any one raises `GuardTripped`, the loop terminates.
```python
LoopGuards(
    max_turns=20,              # Raise after this many model responses
    max_tokens=32_000,         # Raise when cumulative tokens exceed this
    repetition_threshold=3,    # Raise when same tool+args called this many times
)
```
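A minimal sketch of how three such guards could work together, assuming a loop that calls `check_turn()` before each model request, `record_tokens()` after each response, and `record_tool_call()` for each requested tool call. `SketchLoopGuards` and `SketchGuardTripped` are hypothetical stand-ins, not the library's `LoopGuards`; only the guard names and the `guard_name`/`reason` attributes come from this README. Repeats are detected on the tool name plus the arguments re-serialized with sorted keys, so `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` count as the same call.

```python
import json
from collections import Counter


class SketchGuardTripped(Exception):
    """Stand-in for GuardTripped, exposing guard_name and reason."""

    def __init__(self, guard_name: str, reason: str) -> None:
        super().__init__(f"{guard_name}: {reason}")
        self.guard_name = guard_name
        self.reason = reason


class SketchLoopGuards:
    def __init__(self, max_turns: int = 20, max_tokens: int = 32_000,
                 repetition_threshold: int = 3) -> None:
        self.max_turns = max_turns
        self.max_tokens = max_tokens
        self.repetition_threshold = repetition_threshold
        self.turns = 0
        self.tokens = 0
        self.calls: Counter = Counter()

    def check_turn(self) -> None:
        """Call before each model request; permits at most max_turns responses."""
        if self.turns >= self.max_turns:
            raise SketchGuardTripped("MaxTurns", f"reached {self.max_turns} model turns")
        self.turns += 1

    def record_tokens(self, used: int) -> None:
        """Call after each response with its total token usage."""
        self.tokens += used
        if self.tokens > self.max_tokens:
            raise SketchGuardTripped(
                "Budget", f"used {self.tokens} tokens, limit is {self.max_tokens}")

    def record_tool_call(self, name: str, arguments: str) -> None:
        """Call for each tool call the model requests."""
        try:
            # Canonicalize so key order inside the JSON string does not matter.
            args_key = json.dumps(json.loads(arguments), sort_keys=True)
        except (TypeError, ValueError):
            args_key = str(arguments)
        key = (name, args_key)
        self.calls[key] += 1
        if self.calls[key] >= self.repetition_threshold:
            raise SketchGuardTripped(
                "Repetition", f"{name} called {self.calls[key]} times with the same arguments")
```

Keeping each guard as a separate counter means a trip always names exactly one cause, which is what lets the caller report `guard_name` to the user.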
Tuning guide:
| Task type | Recommended max_turns |
|---|---|
| Simple Q&A | 5 |
| Single tool lookup | 8 |
| Multi-step analysis | 15 |
| Complex coding / research | 30–50 |
Catching guard trips:
```python
from local_agent_core import GuardTripped

try:
    response = agent.chat(user_input)
except GuardTripped as g:
    print(g.guard_name)  # "MaxTurns" | "Budget" | "Repetition"
    print(g.reason)      # human-readable explanation
```
Session Stores
```python
from local_agent_core import InMemorySessionStore, DiskSessionStore

# In-memory (default): fast, resets on process restart
store = InMemorySessionStore()

# Disk-based: survives restarts, good for single-server deployments
store = DiskSessionStore("/var/data/sessions")
```
For multi-worker production (gunicorn with >1 worker), implement RedisSessionStore with the same get(), create(), save() interface as DiskSessionStore.
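One possible shape for such a store, as a sketch only: it assumes the interface is `create() -> session_id`, `get(session_id) -> history or None`, and `save(session_id, history)`, with history as a JSON-serializable list of messages. Those signatures are guesses; check `DiskSessionStore` for the real ones. The client only needs `get` and `set`, both of which `redis.Redis` from the redis-py package provides.

```python
from __future__ import annotations

import json
import uuid
from typing import Any, Optional, Protocol


class KeyValueClient(Protocol):
    """The two calls this sketch needs; redis.Redis satisfies them."""
    def get(self, name: str) -> Optional[bytes | str]: ...
    def set(self, name: str, value: str) -> Any: ...


class RedisSessionStore:
    """Hypothetical store; the get/create/save signatures are assumptions."""

    def __init__(self, client: KeyValueClient, prefix: str = "agent:session:") -> None:
        self.client = client
        self.prefix = prefix

    def _key(self, session_id: str) -> str:
        return self.prefix + session_id

    def create(self) -> str:
        session_id = str(uuid.uuid4())
        self.client.set(self._key(session_id), json.dumps([]))
        return session_id

    def get(self, session_id: str) -> Optional[list[dict[str, Any]]]:
        raw = self.client.get(self._key(session_id))
        if raw is None:
            return None  # unknown session
        return json.loads(raw)  # json.loads accepts bytes as well as str

    def save(self, session_id: str, history: list[dict[str, Any]]) -> None:
        self.client.set(self._key(session_id), json.dumps(history))
```

With redis-py installed, `RedisSessionStore(redis.Redis(host="localhost", port=6379))` would be passed as `store=`, assuming `AgentSession` accepts any object with this interface. Because all workers read and write the same keys, a follow-up request can land on any gunicorn worker. This sketch has no locking, so two concurrent requests for the same session can overwrite each other's history.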
Why temperature=0.0?
Deterministic output means the model makes the same tool-call decision given the same history. With temperature > 0, sampling randomness can lead the model to skip a tool call it would otherwise make, which makes the agent unreliable.
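As an illustration, the request body below pins `temperature` next to the tool schemas. It is a sketch assuming that the constructor arguments map onto the standard OpenAI-compatible fields (`model`, `messages`, `tools`, `temperature`, `max_tokens`) that Ollama accepts at `http://localhost:11434/v1/chat/completions`; `build_chat_request` and `post_chat` are hypothetical helpers, not this library's `OllamaClient`.

```python
import json
import urllib.request
from typing import Any, Optional


def build_chat_request(
    messages: list[dict[str, Any]],
    tools: Optional[list[dict[str, Any]]] = None,
    model: str = "qwen3:8b",
    temperature: float = 0.0,   # 0.0 as recommended above
    max_tokens: int = 4096,     # per-response output limit
) -> dict[str, Any]:
    """Build an OpenAI-compatible chat-completions body (hypothetical helper)."""
    payload: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if tools:  # omit the key entirely when no tools are registered
        payload["tools"] = tools
    return payload


def post_chat(payload: dict[str, Any],
              base_url: str = "http://localhost:11434/v1") -> dict[str, Any]:
    """POST the body to Ollama's OpenAI-compatible endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```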
Why strip Qwen3 reasoning?
Qwen3:8b includes a reasoning field in every response (~150–300 tokens of chain-of-thought). If re-injected into history, a 10-turn conversation wastes 2,000+ tokens on reasoning the model never needs to see again. OllamaClient.extract_message() strips it before adding to session history.
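A minimal sketch of that stripping step, assuming the reasoning arrives either as a separate `reasoning` key on the assistant message (the field named above) or inline as `<think>...</think>` tags in `content`, which is how Qwen3 marks its thinking in raw text. `strip_reasoning` is a hypothetical stand-in for `OllamaClient.extract_message()`, and the `reasoning_content` key is an extra assumption for servers that use that name.

```python
import re
from typing import Any

# Qwen3 wraps chain-of-thought in <think>...</think> when it appears inline in content.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
# "reasoning" is the field named in this README; "reasoning_content" is an assumed alias.
_REASONING_KEYS = ("reasoning", "reasoning_content")


def strip_reasoning(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an assistant message without chain-of-thought, for history."""
    cleaned = {k: v for k, v in message.items() if k not in _REASONING_KEYS}
    content = cleaned.get("content")
    if isinstance(content, str):
        cleaned["content"] = _THINK_BLOCK.sub("", content).strip()
    return cleaned  # tool_calls and other keys are kept unchanged
```

Returning a copy rather than editing in place leaves the raw response available for logging while only the stripped version goes into session history.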
Why not LangChain / LlamaIndex?
Both add abstraction layers that hide the finish_reason state machine and make it harder to inspect and fix broken message history. This framework exposes the raw OpenAI message format directly — you always know exactly what is being sent to the model.
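Here is a sketch of the `finish_reason` state machine, written against raw OpenAI-format messages. `call_model` is an assumed callable that returns one choice dict (`{"finish_reason": ..., "message": ...}`), so the loop can be exercised without Ollama; `run_agent_loop` is hypothetical, not this library's `loop.py`, and it uses a bare turn limit instead of the guards.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]
ChatFn = Callable[[list[Message]], dict[str, Any]]


def run_agent_loop(history: list[Message], call_model: ChatFn,
                   tools: dict[str, Callable[..., Any]], max_turns: int = 20) -> str:
    """Drive the model until it stops asking for tools; mutates history in place."""
    for _ in range(max_turns):
        choice = call_model(history)  # one OpenAI "choice": finish_reason + message
        message = choice["message"]
        history.append(message)
        if choice["finish_reason"] != "tool_calls":
            # "stop" (or "length") ends the loop with the assistant's text.
            return message.get("content") or ""
        for call in message.get("tool_calls") or []:
            fn = call["function"]
            try:
                # Arguments arrive as a JSON string in the OpenAI format.
                result = str(tools[fn["name"]](**json.loads(fn["arguments"])))
            except Exception as exc:  # never leave a tool call without a result
                result = f"Error: {type(exc).__name__}: {exc}"
            history.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError(f"no final answer after {max_turns} model turns")
```

In a test, `call_model` can be a scripted fake that returns a `tool_calls` choice followed by a `stop` choice; against a real server it would wrap the chat-completions request and return `response["choices"][0]`.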
Observed Qwen3:8b Behaviour
| Metric | Value |
|---|---|
| Tokens for simple Q&A | ~150 total (15 prompt, 135 completion incl. reasoning) |
| Tokens for tool call (42*7) | ~666 total across 2 turns |
| Tool call format | Standard OpenAI — finish_reason: "tool_calls", arguments as JSON string |
| Reasoning field | Always present, ~100–200 tokens, stripped by this library |
| Temperature=0 consistency | Deterministic across repeated runs |
Known Limitations
- `requires_confirmation` flag in `ToolDef` is defined but not yet wired into the loop.
- `RedisSessionStore` is not yet implemented; it is needed for multi-worker Flask (gunicorn with >1 worker).
- No async support: `OllamaClient` uses `httpx.Client` (sync). For FastAPI or async Flask, replace it with `httpx.AsyncClient` and add `async`/`await` to `loop.py`.
- `BudgetGuard.max_tokens=32_000` is conservative; the Qwen3:8b context window is 128k. Raise this for long research tasks.
Running the Tests
Tests run against a live Ollama instance (not mocked — that is intentional).
```bash
pip install local-agent-core pytest
ollama serve &          # or run it in a separate terminal; it does not return on its own
ollama pull qwen3:8b
python -m pytest tests/ -v
```
License
MIT
Download files
File details
Details for the file local_agent_core-0.1.0.tar.gz (source distribution).
File metadata
- Size: 20.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4114db9de272d39f3eb73d20c7ac000a3fa413bef5e52c276d46217f732882ba` |
| MD5 | `65e9ae8bb1bf8a395fa308dd1bb7c966` |
| BLAKE2b-256 | `8d5295412e849ce8b733af92b88ae1faeb9c470e37000d007066e32a8c4518df` |
Provenance
The following attestation bundles were made for local_agent_core-0.1.0.tar.gz:
- Publisher: publish.yml on chibokocl/local-agent-core
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: local_agent_core-0.1.0.tar.gz
- Subject digest: `4114db9de272d39f3eb73d20c7ac000a3fa413bef5e52c276d46217f732882ba`
- Sigstore transparency entry: 1573672088
- Permalink: chibokocl/local-agent-core@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/chibokocl
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Trigger Event: push
File details
Details for the file local_agent_core-0.1.0-py3-none-any.whl (built distribution).
File metadata
- Size: 19.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `87dc019f3e06fde4267b398feb1ea2bb049a348367bcffe71f217f34cedde3bb` |
| MD5 | `707fd5f87466a1bbb8ce4bcd47e7b365` |
| BLAKE2b-256 | `94cdbd495f6dafc3e81eebbaa960ad82435368cd738847d595c7f929d7b81a53` |
Provenance
The following attestation bundles were made for local_agent_core-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on chibokocl/local-agent-core
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: local_agent_core-0.1.0-py3-none-any.whl
- Subject digest: `87dc019f3e06fde4267b398feb1ea2bb049a348367bcffe71f217f34cedde3bb`
- Sigstore transparency entry: 1573672098
- Permalink: chibokocl/local-agent-core@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/chibokocl
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Trigger Event: push