cogwright-rag

Local-first retrieval engine for technical documentation.


Cogwright: grounded, cited answers from your documents

Cogwright answers questions about your technical documentation. Point it at a folder of manuals, specs, runbooks, or service bulletins, ask in plain language, and get a step-by-step answer grounded only in those documents, with citations to the source page. Bare identifiers resolve too: an alarm code, an error code, or a part number jumps straight to the passage that defines it.

It runs locally against an OpenAI-compatible model endpoint you choose, so your documents and the index stay on your machine and the only network calls go to that one endpoint.

Install it, point --base-url at your model server (local or hosted), and set the model names to whatever it serves:

pip install cogwright-rag

cogwright ingest ./manuals \
  --base-url http://localhost:8000/v1 --embedding-model nomic-embed-text

cogwright ask "How do I clear alarm 204?" \
  --base-url http://localhost:8000/v1 \
  --llm-model llama3.2:1b --embedding-model nomic-embed-text --min-score 0.55

To clear alarm 204, follow these steps:
1. Stop the unit and allow the gearbox to cool for ten minutes.
2. Check the coolant level and refill to the cold mark if it is low.
3. Clear alarm 204 from the panel and restart the unit.

Referenced identifiers: AL-204

Sources:
  - manuals/series7_conveyor_manual.txt (page 3, section: ALARM AND STOP CODE REFERENCE)

When the corpus does not contain the answer, Cogwright says so instead of inventing one. Prefer a browser? cogwright serve opens the same answers in a local web interface, with a corpus manager for adding and removing documents.

How it works

Ingest parses each document into structure-aware chunks (a table stays whole and a numbered procedure stays together), embeds them, and writes a single JSON index to disk. A dedicated pass indexes every alarm, stop, fault, error, diagnostic, warning, and part identifier as an exact lookup key.
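The chunking rule can be pictured with a short sketch. This is a minimal illustration, not Cogwright's parser: the function names and the line classes (a line starting with `|` is a table row, a line starting with `1.` or `1)` is a step) are assumptions. Consecutive lines of the same kind form one chunk, and a blank line closes a prose paragraph but not a procedure or a table.

```python
import re

# Assumed line classes: pipe-table rows and numbered procedure steps.
TABLE_ROW = re.compile(r"^\s*\|")
NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s")


def classify(line: str) -> str:
    """Return 'table', 'steps', or 'prose' for one non-blank line."""
    if TABLE_ROW.match(line):
        return "table"
    if NUMBERED_STEP.match(line):
        return "steps"
    return "prose"


def structure_chunks(text: str) -> list[str]:
    """Group lines so a table stays whole and a numbered procedure stays together."""
    chunks: list[str] = []
    block: list[str] = []
    block_kind: str | None = None

    def flush() -> None:
        nonlocal block, block_kind
        if block:
            chunks.append("\n".join(block))
        block, block_kind = [], None

    for line in text.splitlines():
        if not line.strip():
            # A blank line ends a prose paragraph, but steps or table rows
            # separated by blank lines still belong to the same block.
            if block_kind == "prose":
                flush()
            continue
        kind = classify(line)
        if block_kind is not None and kind != block_kind:
            flush()  # a change of kind starts a new chunk
        block.append(line)
        block_kind = kind
    flush()
    return chunks
```

A real parser would also attach a heading to the procedure it introduces; the sketch keeps only the grouping rule.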

Ask embeds the question and retrieves by two signals at once: semantic similarity and exact identifier lookup, with exact matches ranked above fuzzy ones. Those passages, and only those, go to the model with instructions to answer from them, number any steps, surface the identifiers, and cite each passage. If nothing clears the relevance bar, the model is never called.

ingest:  documents -> parse -> chunk -> embed ------+
                          \--> identifier index ----+--> index.json

ask:     question -> embed + detect identifiers -> hybrid retrieval
                  -> prompt (retrieved context only) -> model -> cited answer
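The hybrid retrieval step can be sketched in plain Python, in keeping with the standard-library vector math. The names `Chunk` and `hybrid_retrieve` are hypothetical, and keeping an exact identifier match even when its cosine score falls below the cutoff is an assumption about how the two signals combine. An empty result is the not-found case, in which the model would not be called.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    identifiers: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_retrieve(
    query_vector: list[float],
    query_ids: set[str],
    chunks: list[Chunk],
    top_k: int = 5,
    min_score: float = 0.45,
) -> list[Chunk]:
    """Exact identifier hits first, then semantic hits at or above min_score."""
    candidates: list[tuple[bool, float, Chunk]] = []
    for chunk in chunks:
        exact = bool(query_ids & chunk.identifiers)
        score = cosine(query_vector, chunk.vector)
        # Assumption: an exact identifier match is kept regardless of score.
        if exact or score >= min_score:
            candidates.append((exact, score, chunk))
    # Sort by (exact, score) descending, so True ranks above False.
    candidates.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [chunk for _, _, chunk in candidates[:top_k]]
```

A caller would check `if not passages:` and report not-found before building any prompt.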

Install

Cogwright targets Python 3.12. It is published on PyPI as cogwright-rag (the import package and the cogwright command keep the shorter name):

pip install cogwright-rag          # core install
pip install "cogwright-rag[ocr]"   # add scanned-page recognition

For development, use uv:

uv sync                  # core install
uv sync --extra ocr      # add scanned-page recognition

The core install has a single third-party runtime dependency, a PDF toolkit. The model client, vector math, and CLI are built on the standard library.

Configure the endpoint

Cogwright talks to any OpenAI-compatible endpoint through its chat/completions and embeddings routes under the base URL; with the default base URL these are /v1/chat/completions and /v1/embeddings. The endpoint can be a model server on the same machine or a hosted API. The implementation is not tied to one provider, and it has been validated against a local server running a small chat model and an embedding model. Configure it with flags or environment variables:

Setting	Flag	Environment variable	Default
Base URL	`--base-url`	`COGWRIGHT_BASE_URL`	`http://localhost:8000/v1`
API key	`--api-key`	`COGWRIGHT_API_KEY`	none
Chat model	`--llm-model`	`COGWRIGHT_LLM_MODEL`	`local-chat-model`
Embedding model	`--embedding-model`	`COGWRIGHT_EMBEDDING_MODEL`	`local-embedding-model`
Vision model	`--vision-model`	`COGWRIGHT_VISION_MODEL`	`local-vision-model`
Index path	`--index`	`COGWRIGHT_INDEX`	`.cogwright/index.json`

The model names are placeholders; set them to whatever your endpoint serves. An unreachable endpoint is reported with a non-zero exit, never a crash.

Commands

ingest

cogwright ingest <paths...> [--ocr] [--diagrams]

Paths are files or folders. --ocr recognizes scanned PDF pages and needs the ocr extra. --diagrams transcribes figure callouts with a multimodal model set by --vision-model. ingest builds a fresh index, recording which documents it holds, when it was built, and the embedding model that produced the vectors.

update, remove, info

Maintain an index without rebuilding it from scratch:

cogwright update <paths...>     # add new documents, refresh changed ones, skip unchanged
cogwright remove <paths...>     # drop documents, matched by path or file name
cogwright info                  # show the documents, counts, model, and timestamps

update compares a content hash per document, so re-running it re-embeds only what actually changed. Because vectors from different embedding models are not comparable, update refuses to run with a model other than the one the index was built with, and a query run with a mismatched model prints a warning.
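The incremental rule can be sketched as a comparison of content hashes. The index layout used here (an `embedding_model` field and a `documents` map from path to SHA-256 hex digest) and the name `plan_update` are assumptions for illustration, not Cogwright's on-disk format.

```python
import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_update(index: dict, documents: dict[str, bytes], embedding_model: str) -> dict[str, list[str]]:
    """Classify documents as new, changed, or unchanged against a stored index."""
    if index["embedding_model"] != embedding_model:
        # Vectors from different models are not comparable, so refuse.
        raise ValueError(
            f"index was built with {index['embedding_model']!r}, not {embedding_model!r}"
        )
    stored: dict[str, str] = index["documents"]
    plan: dict[str, list[str]] = {"add": [], "refresh": [], "skip": []}
    for path, data in sorted(documents.items()):
        digest = content_hash(data)
        if path not in stored:
            plan["add"].append(path)
        elif stored[path] != digest:
            plan["refresh"].append(path)  # only changed documents are re-embedded
        else:
            plan["skip"].append(path)
    return plan
```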

ask

cogwright ask "<question>" [--top-k N] [--min-score S] [--no-stream] \
  [--show-retrieved] [--json]

The answer streams as it is generated. --show-retrieved first prints the ranked passages and their scores, which shows what retrieval is doing. --json asks the model for a structured reply (the steps as a list and the passages it used named explicitly). With a capable model this gives reliable numbered steps and precise citations; otherwise Cogwright falls back to the prose path.

eval

cogwright eval <dataset.json> [--min-score S]

Scores retrieval against a graded dataset without calling the chat model. See Calibrating relevance below.

serve

cogwright serve [--host 127.0.0.1] [--port 8765]

Runs a local web interface in the browser. It has a search box with streaming cited answers and a retrieval inspector; a corpus view for adding documents (by path or drag-and-drop) and removing them; and a settings view that switches the chat model and the relevance cutoff live. The settings view can also re-embed the corpus with a new embedding model, streaming its progress because that regenerates every vector. The server is built on the standard library with a bundled offline page, so it pulls in no web framework and listens only on the host you bind. Your documents and the index never leave the machine.

Documents it understands

Text and Markdown (.txt, .text, .md, .markdown). Headings, numbered steps, and pipe tables are recovered.
Born-digital PDFs with a real text layer, including tables, which are lifted out as structured blocks. Real page numbers are kept for citations.
Scanned PDF pages, with the ocr extra. A page with little text and a dominant image is rendered and recognized, then structured like any other page.
Diagram callouts, with --diagrams. A figure is sent to a vision model and the printed labels and part numbers become searchable.
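The scanned-page test above ("little text and a dominant image") can be sketched as a pure decision function. The thresholds (fewer than 50 extracted characters, one image covering at least 60% of the page) and the `PageStats` shape are assumptions for illustration, not Cogwright's tuned values.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    text_chars: int            # characters found in the PDF text layer
    page_area: float           # page width * height, in points squared
    image_areas: list[float]   # area of each placed image, same units


def looks_scanned(page: PageStats, max_chars: int = 50, min_image_fraction: float = 0.6) -> bool:
    """True when a page has little text and one image dominates it."""
    if page.text_chars >= max_chars or page.page_area <= 0:
        return False
    largest = max(page.image_areas, default=0.0)
    return largest / page.page_area >= min_image_fraction
```

A page flagged this way would be rendered and passed to the OCR engine; any other page keeps its text layer.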

Identifier lookup

A query that is a bare code resolves to the exact passage that documents it. The built-in patterns detect and normalize:

You type	Resolves to
`alarm 204`, `AL-204`, `AL204`	`AL-204`
`STOP CODE 12`, `SC-12`	`SC-12`
`fault 09`	`F-09`
`error 30`	`E-30`
`DTC P0420`	`DTC-P0420`
`warning 18`	`W-18`
`PN 44-19A`, `P/N 44-19A`	`PN-44-19A`

The patterns are configuration rather than hardcoded, so a deployment can add its own identifier schemes.
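Because the patterns are configuration, they can be pictured as a table of regular expressions paired with canonical prefixes. The sketch below reproduces the normalizations in the table above; the `PATTERNS` list, the `detect_identifiers` name, and the exact expressions are assumptions, not Cogwright's built-in configuration.

```python
import re

# Hypothetical pattern table: (regex with an "id" group, canonical prefix).
PATTERNS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(?:alarm\s+|AL-?)(?P<id>\d+)\b", re.IGNORECASE), "AL"),
    (re.compile(r"\b(?:stop\s+code\s+|SC-)(?P<id>\d+)\b", re.IGNORECASE), "SC"),
    (re.compile(r"\bfault\s+(?P<id>\d+)\b", re.IGNORECASE), "F"),
    (re.compile(r"\berror\s+(?P<id>\d+)\b", re.IGNORECASE), "E"),
    (re.compile(r"\bDTC\s+(?P<id>[PBCU]\d{4})\b", re.IGNORECASE), "DTC"),
    (re.compile(r"\bwarning\s+(?P<id>\d+)\b", re.IGNORECASE), "W"),
    (re.compile(r"\bP/?N\s+(?P<id>[0-9][0-9A-Z]*(?:-[0-9A-Z]+)*)\b", re.IGNORECASE), "PN"),
]


def detect_identifiers(text: str) -> list[str]:
    """Return canonical identifiers found in text, without duplicates."""
    found: list[str] = []
    for pattern, prefix in PATTERNS:
        for match in pattern.finditer(text):
            canonical = f"{prefix}-{match.group('id').upper()}"
            if canonical not in found:
                found.append(canonical)
    return found
```

Under this design, a deployment-specific scheme is one more tuple in `PATTERNS`; the detection function does not change.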

Calibrating relevance

The not-found decision rests on one cosine cutoff, --min-score. It is embedding-model dependent: different models place related and unrelated text at different similarity ranges, so there is no universal value (the default, 0.45, suits typical normalized models). Calibrate it with eval. A dataset pairs each question with the pages it should surface, the identifiers it should resolve, and whether it is answerable at all:

{ "question": "How do I clear alarm 204?", "expected_pages": [3],
  "expected_codes": ["AL-204"], "should_find": true }

Raise --min-score until the unanswerable cases report not-found while the real ones still resolve, then pass that value to both eval and ask. The harness reports found accuracy, page hit rate, code-resolution accuracy, and not-found accuracy, each with its counts.

Architecture

A pure core holds all retrieval and decision logic; a thin adapter and CLI layer does the real input and output. The core depends only on protocols, never on a concrete model, store, or framework:

core seams: FileSystem, DocumentParser, Embedder, LLMClient, VectorStore
ingestion seams: OcrEngine, DiagramAnalyzer

Adapters supply the real implementations (disk, text and PDF parsers, the HTTP client, an in-memory cosine store, an OCR engine, a vision analyzer); the tests supply fakes. To add a vector database, a different model API, or a new document type, implement the seam and the core does not change.
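In Python these seams map naturally onto `typing.Protocol`. A minimal sketch, assuming hypothetical method names (`embed`, `add`, `search`) rather than Cogwright's actual signatures: the core function sees only the protocols, an in-memory store plays the adapter, and a deterministic fake embedder stands in for the HTTP client the way a test fake would.

```python
import math
from typing import Protocol


class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    def add(self, texts: list[str], vectors: list[list[float]]) -> None: ...
    def search(self, vector: list[float], top_k: int) -> list[tuple[str, float]]: ...


class InMemoryCosineStore:
    """Adapter: brute-force cosine search over vectors held in memory."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, texts: list[str], vectors: list[list[float]]) -> None:
        self._rows.extend(zip(texts, vectors))

    def search(self, vector: list[float], top_k: int) -> list[tuple[str, float]]:
        def cos(a: list[float], b: list[float]) -> float:
            na, nb = math.hypot(*a), math.hypot(*b)
            return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

        scored = [(text, cos(vector, v)) for text, v in self._rows]
        return sorted(scored, key=lambda row: row[1], reverse=True)[:top_k]


class FakeEmbedder:
    """Deterministic fake: counts a few keywords, so no live model is needed."""

    KEYWORDS = ("alarm", "coolant", "belt")

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(t.lower().count(k)) for k in self.KEYWORDS] for t in texts]


def index_and_query(embedder: Embedder, store: VectorStore, docs: list[str], question: str) -> str:
    """Core logic: depends on the protocols, never on a concrete class."""
    store.add(docs, embedder.embed(docs))
    [query_vector] = embedder.embed([question])
    return store.search(query_vector, top_k=1)[0][0]
```

Swapping in a vector database would mean writing another class with `add` and `search`; `index_and_query` stays as it is.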

Testing

uv run pytest          # tests, including end-to-end ingest and ask on a sample manual
uv run mypy            # strict type checking
uv run ruff check .    # lint

Every seam has a deterministic fake, so the suite needs no live model. Tests for the real OCR engine run only where it is installed and are skipped otherwise. All three checks run in CI on every push and pull request, alongside a check that fails when a source file is missing its license header.

Privacy

No telemetry and no required cloud service. The only outbound traffic is to the endpoint you configure, which can run on the same machine, so a fully air-gapped deployment is possible. Documents stay where you put them, and the index is a plain file you control.

License

MIT, with an SPDX header on every source file. Every dependency is permissively licensed; copyleft toolkits are avoided, including in the PDF path, to keep the tree MIT-compatible. See CONTRIBUTING.md to work on it.

Status

Released and validated against a live local model: text and born-digital PDF ingestion, hybrid retrieval with identifier lookup, grounded cited answers, the structured-JSON answer mode, incremental index updates, the evaluation harness, and the local web interface. Scanned-page OCR and diagram transcription are available behind their seams. Region-level diagram cropping, tuning for low-quality scans, and more corpus formats are future work, each fitting an existing seam.


Release history

Version	Released
0.2.0 (this version)	Jun 29, 2026
0.1.1	Jun 29, 2026
0.1.0	Jun 29, 2026

Download files

File	Type	Size	Uploaded
`cogwright_rag-0.2.0.tar.gz`	Source distribution	3.1 MB	Jun 29, 2026
`cogwright_rag-0.2.0-py3-none-any.whl`	Built distribution (Python 3)	67.3 kB	Jun 29, 2026

Both files were uploaded with uv 0.11.25 on macOS, without Trusted Publishing.

Hashes for cogwright_rag-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`96ed2c90ae95ae4d47e974b355f27774f0cb874a526b43d419cd037e21824649`
MD5	`61427a2823360ac945267e9457f5a51c`
BLAKE2b-256	`0605f377e9824d5e1a035296b239af87acc46f3db408deab5e964448a7365d6e`

Hashes for cogwright_rag-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b4c982e1a53af176724697990fa8e4f147310f40b238e3a461341936fca6f18`
MD5	`cf3ac50a0d82e42750434d42f712c8b9`
BLAKE2b-256	`45c477a88f07b346231c786165f37171e64edbf8bea816f4c2d539c31988d28b`
