# ollama-harness

Self-improving agentic research swarm with local LLM inference.

Local-first agentic research pipeline. An LLM talks to the web, your filesystem, a browser, and itself — search, synthesize, evaluate, revise, remember. No cloud API required.

```bash
pip install ollama-harness

oh                                                        # interactive REPL
oh research "RL from human feedback"
oh /lit-review "RAG reranking" save to review.md
oh /design https://stripe.com save to stripe-design.md
```
## What it does

A single `oh` command drives an agentic loop:

- **Plan** — identify what's known, what's missing, and which queries to run
- **Research** — multi-round web search with novelty gating and URL enrichment
- **Synthesize** — produce a structured markdown document from the merged context
- **Evaluate** — Wiggum scores the output across six dimensions (relevance, completeness, depth, groundedness, specificity, structure)
- **Revise** — if the score falls below threshold, the producer rewrites from evaluator feedback
- **Remember** — compress the run, store it in ChromaDB, and inject relevant observations into future runs
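The control flow of the loop above can be sketched as follows. This is a hedged illustration, not the harness's actual implementation: every stage is a stub standing in for an LLM call, and the threshold and revision limit are assumed values.

```python
THRESHOLD = 0.7      # assumed Wiggum pass threshold
MAX_REVISIONS = 2    # assumed cap on revise cycles

def plan(topic: str) -> list[str]:
    """Plan: decide which queries to run (stubbed)."""
    return [f"{topic} overview", f"{topic} recent work"]

def research(queries: list[str]) -> list[str]:
    """Research: multi-round web search (stubbed)."""
    return [f"result for: {q}" for q in queries]

def synthesize(context: list[str]) -> str:
    """Synthesize: produce a markdown document (stubbed)."""
    return "# Report\n" + "\n".join(f"- {c}" for c in context)

def evaluate(doc: str) -> float:
    """Evaluate: Wiggum-style score in [0, 1] (stubbed)."""
    return 0.9 if doc.count("- ") >= 2 else 0.5

def run(topic: str) -> str:
    context = research(plan(topic))
    doc = synthesize(context)
    for _ in range(MAX_REVISIONS):
        if evaluate(doc) >= THRESHOLD:
            break  # passed evaluation, no revision needed
        doc = synthesize(context + ["evaluator feedback"])  # revise
    return doc
```

In the real pipeline each stub is an LLM round-trip and the Remember stage compresses the finished run into ChromaDB; only the evaluate/revise loop structure is shown here.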
Skills extend the loop with specialised agents: browser navigation, literature review, YouTube transcription, design-system extraction, and multi-file HTML page generation.
## Install

```bash
pip install ollama-harness
```

Or from source with uv:

```bash
git clone https://github.com/upskilled-consulting/ollama-harness
cd ollama-harness
uv sync
uv sync --extra gpu    # CUDA torch
uv pip install -e .    # register the `oh` entry point
```
## Prerequisites

| Dependency | Purpose | Notes |
|---|---|---|
| Ollama | LLM inference (default) | `ollama serve` must be running |
| llama.cpp server | Alternative inference backend | Configure via `HARNESS_ENDPOINTS` |
| Node.js ≥ 18 | Dashboard UI | `start.py` builds it automatically; manual: `cd dashboard && npm install && npm run build` |
| Playwright | Browser skills | `playwright install chromium` |
| whisper.cpp | Audio transcription | Build the binary, place it at `whisper.cpp/` |
## Quick start

```bash
# Interactive REPL
oh

# One-shot task (no quotes needed)
oh research the latest work on speculative decoding

# Literature review
oh /lit-review "LLM calibration and uncertainty" save to calibration-review.md

# Browser navigation
oh /browser https://arxiv.org "find the most cited paper on RLHF this year"

# Transcribe a YouTube video
oh /transcribe https://youtube.com/watch?v=...

# Extract a design system from a live site
oh /design https://example.com save to design.md

# Generate a themed HTML page from .md content files
oh /build-page design.md from content/ save to index.html

# Full design-extract + page-build in one command
oh /site https://example.com from content/ save to index.html

# Generate a themed .pptx deck from a PDF paper
oh /deck --design https://example.com --content paper.pdf --out slides.pptx

# Deck from a URL content source with an existing design system
oh /deck --design brand.md --content https://example.com/article --out deck.pptx

# Deck from a folder of .md files styled to match a live site
oh /deck --design https://example.com --content ~/notes/ --title "Q2 Review" --out deck.pptx
```
## Skills reference

| Command | Description |
|---|---|
| `research <topic>` | Multi-round web search + synthesis |
| `summarize <url\|path>` | Fetch and compress a URL or local file |
| `/lit-review <topic>` | Fetch papers, annotate, synthesize into a review |
| `/annotate <url\|path>` | Annotate a paper or document (Wiggum eval) |
| `/browser <url> <goal>` | LLM-guided web navigation + content extraction |
| `/sitemap <url> [goal]` | Crawl a domain, rank pages by goal |
| `/design <url>` | Extract design-system tokens from a live URL |
| `/build-page <design.md> from <dir/>` | Generate a themed HTML page from .md content files |
| `/site <url> from <dir/>` | Design extraction + page build in one command |
| `/deck --design <url\|md> --content <url\|dir\|pdf>` | Generate a themed .pptx slide deck |
| `/transcribe <url\|path>` | Transcribe a YouTube video or local audio |
| `/recall <topic>` | Surface relevant observations from memory |
| `/introspect` | Generate a live capabilities doc from the skill registry |
| `/orientation` | Summarise project state + recent activity |
| `/re-orient` | Rebuild the orientation cache from GitHub state |
| `/suggest` | Recommend next research tasks |
| `/debug [filter]` | Diagnose recent FAIL/ERROR runs |
| `/email <contact> <goal>` | Draft and send emails via Gmail |
| `/sync-wiki` | Sync the lit-review corpus to the GitHub wiki |
| `/panel` | Enable the 3-persona Wiggum review panel |
## Flags

| Flag | Effect |
|---|---|
| `--no-wiggum` | Skip the quality-evaluation loop |
| `--headed` | Show the browser window (browser/design tasks) |
| `--keep-browser` | Leave the browser open after the task |
| `--reuse-browser` | Reconnect to an existing Chrome session |
## Configuration

Copy `.env.example` to `.env` and edit:

```bash
cp .env.example .env
```

Key variables:

```bash
# Model endpoints — llamacpp / vllm / openai-compatible
HARNESS_ENDPOINTS='{"qwen3-8b": {"url": "http://localhost:8082/v1", "model_id": "qwen3-8b", "backend": "llamacpp"}, "qwen3.6-35b": {"url": "http://localhost:8083/v1", "model_id": "Qwen3.6-35B-A3B-UD-IQ3_S.gguf", "backend": "llamacpp"}}'
HARNESS_PRODUCER_MODEL=qwen3.6-35b

# Pure Ollama (default — no HARNESS_ENDPOINTS needed)
# Just run: ollama pull qwen3:8b

# Semantic Scholar API key (optional — raises the rate limit)
S2_API_KEY=your_key_here

# Gmail (for the /email skill)
SENDER_NAME=Your Name
SENDER_EMAIL=you@example.com
```
## Multi-endpoint routing

`HARNESS_ENDPOINTS` maps a short tag to `{url, model_id, backend}`. Supported backends: `llamacpp`, `vllm`, `openai`. Models not listed fall through to Ollama. This lets you run a fast small model (8B) alongside a large one (35B) on separate ports and route to the right one per task.
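A minimal sketch of the fall-through behaviour, assuming Ollama's default port (11434); this is illustrative, not the harness's actual routing code:

```python
import json

# A HARNESS_ENDPOINTS value as it would appear in .env (one entry shown).
HARNESS_ENDPOINTS = json.loads(
    '{"qwen3-8b": {"url": "http://localhost:8082/v1",'
    ' "model_id": "qwen3-8b", "backend": "llamacpp"}}'
)

OLLAMA_URL = "http://localhost:11434/v1"  # Ollama's default endpoint

def resolve(tag: str, endpoints: dict = HARNESS_ENDPOINTS) -> tuple[str, str, str]:
    """Return (url, model_id, backend) for a model tag.

    Tags present in the mapping route to their configured server;
    anything else falls through to Ollama.
    """
    if tag in endpoints:
        ep = endpoints[tag]
        return ep["url"], ep["model_id"], ep["backend"]
    return OLLAMA_URL, tag, "ollama"
```

For example, `resolve("qwen3-8b")` routes to the llama.cpp server on port 8082, while an unlisted tag like `qwen3:8b` goes to Ollama.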
## Deck generation

`/deck` extracts a design system from any URL (or reads an existing `.md` design file), loads content from a URL, a folder of `.md` files, or a PDF (local or remote), and renders a fully themed `.pptx` using python-pptx.

```bash
oh /deck --design https://stripe.com --content research.pdf --out deck.pptx
oh /deck --design brand.md --content ~/notes/ --title "Q2 Review" --out deck.pptx
oh /deck --design https://notion.so --content https://example.com/paper.pdf
```
Content sources are auto-detected:

| Source | Handling |
|---|---|
| `https://...` (web page) | Playwright scrape → structured markdown |
| `https://....pdf` | MarkItDown converts directly from the URL |
| `/path/to/file.pdf` | MarkItDown converts the local PDF |
| `/path/to/folder/` | All `.md` / `.txt` files in the directory |
| `/path/to/file.md` | Single markdown file |
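The detection rules in the table above amount to a small classifier. This is a hypothetical sketch (the function name and labels are illustrative, not the harness's API):

```python
from pathlib import Path

def detect_source(src: str) -> str:
    """Classify a --content argument, mirroring the table above."""
    if src.startswith(("http://", "https://")):
        # Remote: a .pdf URL goes to MarkItDown, anything else is scraped.
        return "pdf-url" if src.lower().endswith(".pdf") else "web-page"
    p = Path(src)
    if p.suffix.lower() == ".pdf":
        return "local-pdf"
    if p.suffix.lower() in {".md", ".txt"}:
        return "markdown-file"
    # No recognised suffix: treat as a directory of .md / .txt files.
    return "directory"
```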
Slide types are inferred from markdown structure: `#` → title slide, `##` → section divider, bullet lists → content slides (auto-split at 6 bullets), `>` blockquote → callout, markdown tables → table slides.
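Those mapping rules can be sketched as a pair of helpers. A hedged illustration, assuming one markdown block per slide; the real inference is likely more nuanced:

```python
def slide_type(block: str) -> str:
    """Map a markdown block to a slide type per the rules above."""
    first = block.lstrip().splitlines()[0]
    if first.startswith("## "):       # check '##' before '#'
        return "section-divider"
    if first.startswith("# "):
        return "title"
    if first.startswith(">"):
        return "callout"
    if first.startswith("|"):
        return "table"
    return "content"                   # bullet lists and plain text

def split_bullets(bullets: list, limit: int = 6) -> list:
    """Auto-split a long bullet list into content slides of <= limit items."""
    return [bullets[i:i + limit] for i in range(0, len(bullets), limit)]
```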
## Page generation

`/build-page` uses a three-pass decomposed strategy that handles any number of content files without context overflow:

- **Analysis** — the LLM reads the title + abstract of every file, clusters them by topic, and assigns display roles (`featured`/`card`/`compact`)
- **Shell** — generates the HTML structure (nav, hero, cluster sections) with `<!-- SECTION:filename.md -->` placeholders
- **Sections** — one LLM call per file; role-aware card HTML is injected into the shell

Result: a complete, themed, clustered page regardless of how many files are in the content directory.
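The injection step of the third pass can be sketched in a few lines. The placeholder format is taken from the description above; the function itself is an illustration, not the harness's code:

```python
import re

def inject_sections(shell_html: str, sections: dict) -> str:
    """Replace each <!-- SECTION:filename --> placeholder in the shell
    with that file's generated card HTML (missing files become empty)."""
    return re.sub(
        r"<!--\s*SECTION:(\S+)\s*-->",
        lambda m: sections.get(m.group(1), ""),
        shell_html,
    )
```

Because the shell is generated once and each section is filled by its own LLM call, no single call ever has to hold every content file in context.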
## Dashboard

A React/TypeScript UI (Vite + TanStack Query) provides live visibility into every run.
| View | Description |
|---|---|
| Dashboard | KPI cards (total runs, pass rate, avg score, token spend) + recent activity feed |
| Runs | Master-detail split: compact run list on the left, full DAG inspector on the right. Click any run to see the pipeline graph, per-stage token counts, output preview, Wiggum scores with dimension bars, evaluator feedback, and an RLHF thumbs-up/down panel per node. |
| Submit | Fire a task directly from the browser; result appears live in Runs. |
| Analytics | Time-series charts for run volume, pass rate, and token usage. |
| Sessions | Group runs by session for multi-turn task tracking. |
| Artifacts | Browse output files written by runs. |
| Fine-tune | Training metrics (loss, accuracy curves) and RL dataset browser — preference pairs, reward feedback, GRPO rollouts, and DPO examples with Wiggum evaluator annotations. |
| MCP | Inspect registered MCP tool servers. |
Two floating action buttons in the lower-right corner provide quick access without cluttering the sidebar:

- **Terminal** — a harness shell with `cd` navigation, command history (↑/↓), `clear`/`help`, and live run-status badges for any submitted task.
- **Voice** — the voice input panel for hands-free task submission.
## Starting the full stack

```bash
python start.py    # starts inference servers, FastAPI, React dashboard
```

Or individually:

```bash
uvicorn harness.api.main:app --reload    # API server (port 8000)
cd dashboard && npm run dev              # React dashboard (port 5173)
```
## whisper.cpp setup

The `/transcribe` skill uses the whisper.cpp binary for fast CPU/CUDA inference:

```bash
git clone https://github.com/ggerganov/whisper.cpp whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
sh ./models/download-ggml-model.sh base.en
```

Place the built directory at `whisper.cpp/` in the repo root.
## Development

```bash
uv sync --extra dev
pytest tests/
ruff check harness/
```

## License

MIT