Standalone agentic framework for local LLMs via Ollama — reliable tool calling, session persistence, and loop guards

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

local-agent-core

A standalone Python framework that makes Qwen3-8B behave reliably as an agent when running via Ollama. It solves the three production problems you will hit within the first day of building a Qwen3 tool-calling loop:

Problem	Root Cause	Fix
Model gets stuck / loops forever	No termination conditions	`guards.py` — MaxTurns + Budget + Repetition guards
Tool call results not handled cleanly	No orphan injection on crash	`tool_result.py` — always injects a result, even on failure
No session persistence between queries	History reconstructed per request	`session_store.py` — `Session` holds history in-memory

Architecture is borrowed from Claude Code's QueryEngine.ts, query.ts, Task.ts, and Tool.ts.

Install

pip install local-agent-core

Prerequisites: Ollama must be running locally with Qwen3:8b pulled.

ollama serve          # terminal tab 1
ollama pull qwen3:8b  # first time only

Quick Start

from local_agent_core import AgentSession, GuardTripped

agent = AgentSession(system_prompt="You are a helpful assistant.")

@agent.tool(
    name="calculator",
    description="Evaluate a math expression",
    parameters={
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"]
    }
)
def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))

response = agent.chat("What is 42 * 7?")   # → "The result is 294."
response2 = agent.chat("Double that.")      # history preserved automatically

Flask Integration

from flask import Flask, request, jsonify
from local_agent_core import AgentSession, GuardTripped, DiskSessionStore

app = Flask(__name__)
store = DiskSessionStore("/var/data/sessions")

SYSTEM_PROMPT = "You are a helpful assistant."

@app.route("/chat", methods=["POST"])
def chat():
    body = request.json
    session_id = body.get("session_id")   # None = new session
    message    = body["message"]

    agent = AgentSession(
        system_prompt=SYSTEM_PROMPT,
        session_id=session_id,
        store=store,
        max_turns=15,
    )
    _register_tools(agent)   # attach your tools here

    try:
        response = agent.chat(message)
        return jsonify({
            "response": response,
            "session_id": agent.session_id,
            "guards": agent.guard_summary(),
        })
    except GuardTripped as g:
        return jsonify({
            "response": f"I had to stop: {g.reason}",
            "session_id": agent.session_id,
            "guard_tripped": g.guard_name,
        })

API Reference

`AgentSession`

The only class you need to import in application code.

AgentSession(
    system_prompt           = "",               # Injected as role=system at turn 0
    model                   = "qwen3:8b",
    base_url                = "http://localhost:11434/v1",
    max_turns               = 20,               # Hard loop ceiling
    max_tokens_budget       = 32_000,           # Cumulative token ceiling per session
    max_tokens_per_response = 4096,             # Per-response output limit
    temperature             = 0.0,              # 0.0 = deterministic tool calling
    session_id              = None,             # Provide to resume existing session
    store                   = None,             # InMemorySessionStore by default
)

Method	Purpose
`agent.chat(message)`	Send message, get response. Mutates session history.
`@agent.tool(name, description, parameters)`	Register a tool via decorator
`agent.register_tool(ToolDef)`	Register a pre-built ToolDef
`agent.guard_summary()`	Returns dict with turns/tokens/tool_calls used
`agent.history()`	Full raw message list
`agent.session_id`	The session's UUID string
`agent.reset_guards()`	Reset guard counters without clearing history

Loop Guards

Three independent loop-breakers. Any one raising GuardTripped terminates the loop.

LoopGuards(
    max_turns            = 20,      # Raise after this many model responses
    max_tokens           = 32_000,  # Raise when cumulative tokens exceed this
    repetition_threshold = 3,       # Raise when same tool+args called this many times
)

Tuning guide:

Task type	Recommended `max_turns`
Simple Q&A	5
Single tool lookup	8
Multi-step analysis	15
Complex coding / research	30–50

Catching guard trips:

from local_agent_core import GuardTripped

try:
    response = agent.chat(user_input)
except GuardTripped as g:
    print(g.guard_name)   # "MaxTurns" | "Budget" | "Repetition"
    print(g.reason)       # human-readable explanation

Session Stores

from local_agent_core import InMemorySessionStore, DiskSessionStore

# In-memory (default) — fast, resets on process restart
store = InMemorySessionStore()

# Disk-based — survives restarts, good for single-server deployments
store = DiskSessionStore("/var/data/sessions")

For multi-worker production (gunicorn with >1 worker), implement RedisSessionStore with the same get(), create(), save() interface as DiskSessionStore.

Why temperature=0.0?

Deterministic output means the model makes the same tool call decision given the same history. With temperature > 0, a model might randomly decide not to use a tool, making the agent unreliable.

Why strip Qwen3 reasoning?

Qwen3:8b includes a reasoning field in every response (~150–300 tokens of chain-of-thought). If re-injected into history, a 10-turn conversation wastes 2,000+ tokens on reasoning the model never needs to see again. OllamaClient.extract_message() strips it before adding to session history.

Why not LangChain / LlamaIndex?

Both add abstraction layers that hide the finish_reason state machine and make it harder to inspect and fix broken message history. This framework exposes the raw OpenAI message format directly — you always know exactly what is being sent to the model.

Observed Qwen3:8b Behaviour

Metric	Value
Tokens for simple Q&A	~150 total (15 prompt, 135 completion incl. reasoning)
Tokens for tool call (42*7)	~666 total across 2 turns
Tool call format	Standard OpenAI — `finish_reason: "tool_calls"`, `arguments` as JSON string
Reasoning field	Always present, ~100–200 tokens, stripped by this library
Temperature=0 consistency	Deterministic across repeated runs

Known Limitations

requires_confirmation flag in ToolDef is defined but not yet wired into the loop.
RedisSessionStore not yet implemented — needed for multi-worker Flask (gunicorn with >1 worker).
No async support — OllamaClient uses httpx.Client (sync). For FastAPI or async Flask, replace with httpx.AsyncClient and add async/await to loop.py.
BudgetGuard.max_tokens=32_000 is conservative — Qwen3:8b context window is 128k. Raise this for long research tasks.

Running the Tests

Tests run against a live Ollama instance (not mocked — that is intentional).

pip install local-agent-core
ollama serve && ollama pull qwen3:8b
python -m pytest tests/ -v

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

clement_chiboko

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

May 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

local_agent_core-0.1.0.tar.gz (20.1 kB view details)

Uploaded May 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

local_agent_core-0.1.0-py3-none-any.whl (19.9 kB view details)

Uploaded May 19, 2026 Python 3

File details

Details for the file local_agent_core-0.1.0.tar.gz.

File metadata

Download URL: local_agent_core-0.1.0.tar.gz
Upload date: May 19, 2026
Size: 20.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for local_agent_core-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4114db9de272d39f3eb73d20c7ac000a3fa413bef5e52c276d46217f732882ba`
MD5	`65e9ae8bb1bf8a395fa308dd1bb7c966`
BLAKE2b-256	`8d5295412e849ce8b733af92b88ae1faeb9c470e37000d007066e32a8c4518df`

See more details on using hashes here.

Provenance

The following attestation bundles were made for local_agent_core-0.1.0.tar.gz:

Publisher: publish.yml on chibokocl/local-agent-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: local_agent_core-0.1.0.tar.gz
- Subject digest: 4114db9de272d39f3eb73d20c7ac000a3fa413bef5e52c276d46217f732882ba
- Sigstore transparency entry: 1573672088
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: chibokocl/local-agent-core@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/chibokocl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Trigger Event: push

File details

Details for the file local_agent_core-0.1.0-py3-none-any.whl.

File metadata

Download URL: local_agent_core-0.1.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 19.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for local_agent_core-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`87dc019f3e06fde4267b398feb1ea2bb049a348367bcffe71f217f34cedde3bb`
MD5	`707fd5f87466a1bbb8ce4bcd47e7b365`
BLAKE2b-256	`94cdbd495f6dafc3e81eebbaa960ad82435368cd738847d595c7f929d7b81a53`

See more details on using hashes here.

Provenance

The following attestation bundles were made for local_agent_core-0.1.0-py3-none-any.whl:

Publisher: publish.yml on chibokocl/local-agent-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: local_agent_core-0.1.0-py3-none-any.whl
- Subject digest: 87dc019f3e06fde4267b398feb1ea2bb049a348367bcffe71f217f34cedde3bb
- Sigstore transparency entry: 1573672098
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: chibokocl/local-agent-core@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/chibokocl
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@343a5b6b1018c4756e3e46570320dddd3f4f6e84
- Trigger Event: push

local-agent-core 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

local-agent-core

Install

Quick Start

Flask Integration

API Reference

AgentSession

Loop Guards

Session Stores

Why temperature=0.0?

Why strip Qwen3 reasoning?

Why not LangChain / LlamaIndex?

Observed Qwen3:8b Behaviour

Known Limitations

Running the Tests

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`AgentSession`