
Self-improving agentic research swarm with local LLM inference


ollama-harness

Local-first agentic research pipeline. An LLM talks to the web, your filesystem, a browser, and itself — search, synthesize, evaluate, revise, remember. No cloud API required.

pip install ollama-harness
oh                        # interactive REPL
oh research "RL from human feedback"
oh /lit-review "RAG reranking" save to review.md
oh /design https://stripe.com save to stripe-design.md

What it does

A single oh command drives an agentic loop (sketched in code after the list):

  1. Plan — identify what's known, what's missing, what queries to run
  2. Research — multi-round web search with novelty gating and URL enrichment
  3. Synthesize — produce a structured markdown document from the merged context
  4. Evaluate — Wiggum scores output across 6 dimensions (relevance, completeness, depth, groundedness, specificity, structure)
  5. Revise — if below threshold, the producer rewrites from evaluator feedback
  6. Remember — compress the run, store in ChromaDB, inject relevant observations into future runs
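
For orientation, here is a minimal sketch of the loop's shape. Every name in it (plan, search, synthesize, wiggum_score, revise, memory) is illustrative, not the actual harness API:

# Illustrative shape of the oh pipeline; not the real internals.
def run(task: str, threshold: float = 0.7, max_rounds: int = 3) -> str:
    context = memory.recall(task)                 # 6. inject past observations
    for query in plan(task, context):             # 1. decide what to ask
        for result in search(query):              # 2. multi-round web search
            if is_novel(result, context):         #    novelty gating
                context.append(result)
    draft = synthesize(task, context)             # 3. structured markdown
    for _ in range(max_rounds):
        scores, feedback = wiggum_score(draft)    # 4. six-dimension evaluation
        if min(scores.values()) >= threshold:
            break
        draft = revise(draft, feedback)           # 5. rewrite from feedback
    memory.store(task, draft)                     # 6. compress + remember
    return draft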

Skills extend the loop with specialised agents: browser navigation, literature review, YouTube transcription, design-system extraction, and multi-file HTML page generation.


Install

pip install ollama-harness

Or from source with uv:

git clone https://github.com/upskilled-consulting/ollama-harness
cd ollama-harness
uv sync
uv sync --extra gpu   # CUDA torch
uv pip install -e .   # register the `oh` entry point

Prerequisites

| Dependency | Purpose | Notes |
| --- | --- | --- |
| Ollama | LLM inference (default) | ollama serve must be running |
| llama.cpp server | Alternative inference backend | Configure via HARNESS_ENDPOINTS |
| Node.js ≥ 18 | Dashboard UI | start.py builds it automatically; manual: cd dashboard && npm install && npm run build |
| Playwright | Browser skills | playwright install chromium |
| whisper.cpp | Audio transcription | Build binary, place at whisper.cpp/ |

Quick start

# Interactive REPL
oh

# One-shot task (no quotes needed)
oh research the latest work on speculative decoding

# Literature review
oh /lit-review "LLM calibration and uncertainty" save to calibration-review.md

# Browser navigation
oh /browser https://arxiv.org "find the most cited paper on RLHF this year"

# Transcribe a YouTube video
oh /transcribe https://youtube.com/watch?v=...

# Extract a design system from a live site
oh /design https://example.com save to design.md

# Generate a themed HTML page from .md content files
oh /build-page design.md from content/ save to index.html

# Full design-extract + page-build in one command
oh /site https://example.com from content/ save to index.html

# Generate a themed .pptx deck from a PDF paper
oh /deck --design https://example.com --content paper.pdf --out slides.pptx

# Deck from a URL content source with an existing design system
oh /deck --design brand.md --content https://example.com/article --out deck.pptx

# Deck from a folder of .md files styled to match a live site
oh /deck --design https://example.com --content ~/notes/ --title "Q2 Review" --out deck.pptx

Skills reference

| Command | Description |
| --- | --- |
| research <topic> | Multi-round web search + synthesis |
| summarize <url\|path> | Fetch and compress a URL or local file |
| /lit-review <topic> | Fetch papers, annotate, synthesize into a review |
| /annotate <url\|path> | Annotate a paper or document (Wiggum eval) |
| /browser <url> <goal> | LLM-guided web navigation + content extraction |
| /sitemap <url> [goal] | Crawl a domain, rank pages by goal |
| /design <url> | Extract design-system tokens from a live URL |
| /build-page <design.md> from <dir/> | Generate a themed HTML page from .md content files |
| /site <url> from <dir/> | Design extraction + page build in one command |
| /deck --design <url\|md> --content <url\|dir\|pdf> | Generate a themed .pptx slide deck |
| /transcribe <url\|path> | Transcribe a YouTube video or local audio file |
| /recall <topic> | Surface relevant observations from memory |
| /introspect | Generate a live capabilities doc from the skill registry |
| /orientation | Summarise project state + recent activity |
| /re-orient | Rebuild the orientation cache from GitHub state |
| /suggest | Recommend next research tasks |
| /debug [filter] | Diagnose recent FAIL/ERROR runs |
| /email <contact> <goal> | Draft and send emails via Gmail |
| /sync-wiki | Sync the lit-review corpus to the GitHub wiki |
| /panel | Enable the 3-persona Wiggum review panel |

Flags

| Flag | Effect |
| --- | --- |
| --no-wiggum | Skip quality evaluation loop |
| --headed | Show browser window (browser/design tasks) |
| --keep-browser | Leave browser open after task |
| --reuse-browser | Reconnect to existing Chrome session |

Configuration

Copy .env.example to .env and edit:

cp .env.example .env

Key variables:

# Model endpoints — llamacpp / vllm / openai-compatible
HARNESS_ENDPOINTS='{"qwen3-8b": {"url": "http://localhost:8082/v1", "model_id": "qwen3-8b", "backend": "llamacpp"}, "qwen3.6-35b": {"url": "http://localhost:8083/v1", "model_id": "Qwen3.6-35B-A3B-UD-IQ3_S.gguf", "backend": "llamacpp"}}'
HARNESS_PRODUCER_MODEL=qwen3.6-35b

# Pure Ollama (default — no HARNESS_ENDPOINTS needed)
# Just run: ollama pull qwen3:8b

# Semantic Scholar API key (optional — increases rate limit)
S2_API_KEY=your_key_here

# Gmail (for /email skill)
SENDER_NAME=Your Name
SENDER_EMAIL=you@example.com

Multi-endpoint routing

HARNESS_ENDPOINTS maps a short tag to {url, model_id, backend}. Supported backends: llamacpp, vllm, openai. Models not listed fall through to Ollama. This lets you run a fast small model (8B) alongside a large one (35B) on separate ports and route each task to the appropriate endpoint.
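
As a sketch of how that fall-through might work (the resolution function and the Ollama default below are assumptions, not the harness's real internals):

import json
import os

# Hypothetical resolution: map a model tag to an endpoint dict,
# falling through to the local Ollama server for unlisted tags.
def resolve_endpoint(tag: str) -> dict:
    endpoints = json.loads(os.environ.get("HARNESS_ENDPOINTS", "{}"))
    if tag in endpoints:
        return endpoints[tag]  # {"url": ..., "model_id": ..., "backend": ...}
    return {"url": "http://localhost:11434", "model_id": tag, "backend": "ollama"}

resolve_endpoint("qwen3-8b")   # -> llama.cpp server on port 8082 (from the example above)
resolve_endpoint("llama3:8b")  # -> Ollama fallback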


Deck generation

/deck extracts a design system from any URL (or reads an existing .md design file), loads content from a URL, folder of .md files, or PDF (local or remote), and renders a fully themed .pptx using python-pptx.

oh /deck --design https://stripe.com --content research.pdf --out deck.pptx
oh /deck --design brand.md --content ~/notes/ --title "Q2 Review" --out deck.pptx
oh /deck --design https://notion.so --content https://example.com/paper.pdf

Content sources are auto-detected (a detection sketch follows the table):

| Source | Handling |
| --- | --- |
| https://... (web page) | Playwright scrape → structured markdown |
| https://....pdf | MarkItDown converts directly from URL |
| /path/to/file.pdf | MarkItDown converts local PDF |
| /path/to/folder/ | All .md / .txt files in directory |
| /path/to/file.md | Single markdown file |
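
A minimal sketch of that detection order, inferred from the table above (the function and its labels are illustrative):

from pathlib import Path

# Illustrative dispatch; the harness's real detection may differ.
def detect_source(src: str) -> str:
    if src.startswith(("http://", "https://")):
        return "pdf-url" if src.lower().endswith(".pdf") else "web-page"
    path = Path(src)
    if path.is_dir():
        return "markdown-dir"    # all .md / .txt files inside
    if path.suffix.lower() == ".pdf":
        return "local-pdf"       # MarkItDown conversion
    return "markdown-file"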

Slide types are inferred from markdown structure: # → title slide, ## → section divider, bullet lists → content slides (auto-split at 6 bullets), > blockquote → callout, markdown tables → table slides.
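
To make the mapping concrete, here is a minimal python-pptx sketch of how a # heading and a bullet list might become slides. The layout indices assume the stock template; the harness's own theming is richer than this:

from pptx import Presentation

prs = Presentation()

title = prs.slides.add_slide(prs.slide_layouts[0])    # "# ..." -> title slide
title.shapes.title.text = "Q2 Review"

content = prs.slides.add_slide(prs.slide_layouts[1])  # bullet list -> content slide
content.shapes.title.text = "Findings"                # from the nearest "## ..."
body = content.placeholders[1].text_frame
bullets = ["Latency down 40%", "Pass rate up 12%"]    # auto-split past 6 bullets
body.text = bullets[0]
for bullet in bullets[1:]:
    body.add_paragraph().text = bullet

prs.save("deck.pptx")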


Page generation

/build-page uses a three-pass decomposed strategy that handles any number of content files without context overflow:

  1. Analysis — LLM reads title + abstract of every file, clusters by topic, assigns display roles (featured / card / compact)
  2. Shell — generates HTML structure (nav, hero, cluster sections) with <!-- SECTION:filename.md --> placeholders
  3. Sections — one LLM call per file, role-aware card HTML injected into the shell

Result: a complete, themed, clustered page regardless of how many files are in the content directory.
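
A sketch of the pass-3 injection step, assuming the placeholder format above; render_section stands in for the per-file LLM call and is not a real harness function:

import re
from pathlib import Path

# Swap each <!-- SECTION:filename.md --> placeholder for that file's
# rendered card HTML.
def assemble(shell_html: str, content_dir: Path, render_section) -> str:
    def fill(match: re.Match) -> str:
        return render_section(content_dir / match.group(1))
    return re.sub(r"<!--\s*SECTION:(\S+?)\s*-->", fill, shell_html)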


Dashboard

A React/TypeScript UI (Vite + TanStack Query) provides live visibility into every run.

| View | Description |
| --- | --- |
| Dashboard | KPI cards (total runs, pass rate, avg score, token spend) + recent activity feed |
| Runs | Master-detail split: compact run list on the left, full DAG inspector on the right. Click any run to see the pipeline graph, per-stage token counts, output preview, Wiggum scores with dimension bars, evaluator feedback, and an RLHF thumbs-up/down panel per node. |
| Submit | Fire a task directly from the browser; the result appears live in Runs. |
| Analytics | Time-series charts for run volume, pass rate, and token usage. |
| Sessions | Group runs by session for multi-turn task tracking. |
| Artifacts | Browse output files written by runs. |
| Fine-tune | Training metrics (loss, accuracy curves) and an RL dataset browser: preference pairs, reward feedback, GRPO rollouts, and DPO examples with Wiggum evaluator annotations (a sample record follows this table). |
| MCP | Inspect registered MCP tool servers. |
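
For illustration, one preference pair might look like the record below; the field names are hypothetical, not the dataset's actual schema:

# Hypothetical preference-pair record as the Fine-tune view might show it.
preference_pair = {
    "prompt": "research the latest work on speculative decoding",
    "chosen": "...revised draft that passed Wiggum...",
    "rejected": "...first draft that scored below threshold...",
    "wiggum_scores": {"chosen": 0.84, "rejected": 0.55},
    "evaluator_feedback": "Weak groundedness: several claims lack sources.",
}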

Two floating action buttons in the lower-right corner provide quick access without cluttering the sidebar:

  • Terminal — a harness shell with cd navigation, command history (↑/↓), clear/help, and live run-status badges for any submitted task.
  • Voice — the voice input panel for hands-free task submission.

Starting the full stack

python start.py          # starts inference servers, FastAPI, React dashboard

Or individually:

uvicorn harness.api.main:app --reload    # API server (port 8000)
cd dashboard && npm run dev              # React dashboard (port 5173)

whisper.cpp setup

The /transcribe skill uses the whisper.cpp binary for fast CPU/CUDA inference:

git clone https://github.com/ggerganov/whisper.cpp whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_CUDA=ON          # omit -DGGML_CUDA=ON for a CPU-only build
cmake --build build --config Release
bash ./models/download-ggml-model.sh base.en

Place the built directory at whisper.cpp/ in the repo root.
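
A sketch of how the skill might shell out to the binary. The -m, -f, and --output-txt flags are standard whisper.cpp options, but this exact invocation is an assumption:

import subprocess

# Transcribe a WAV file with the locally built whisper-cli.
subprocess.run([
    "whisper.cpp/build/bin/whisper-cli",
    "-m", "whisper.cpp/models/ggml-base.en.bin",
    "-f", "audio.wav",
    "--output-txt",
], check=True)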


Development

uv sync --extra dev
pytest tests/
ruff check harness/

License

MIT
