Local-first retrieval engine for technical documentation.

Cogwright: grounded, cited answers from your documents

Cogwright answers questions about your technical documentation. Point it at a folder of manuals, specs, runbooks, or service bulletins, ask in plain language, and get a step-by-step answer grounded only in those documents, with citations to the source page. Bare identifiers resolve too: an alarm code, an error code, or a part number jumps straight to the passage that defines it.

It runs locally against a model endpoint you choose. Your documents and the index stay on your machine, and the only network calls go to that one endpoint.

uv sync

uv run cogwright ingest ./manuals \
  --base-url http://localhost:8000/v1 --embedding-model nomic-embed-text

uv run cogwright ask "How do I clear alarm 204?" \
  --base-url http://localhost:8000/v1 \
  --llm-model llama3.2:1b --embedding-model nomic-embed-text --min-score 0.55

To clear alarm 204, follow these steps:
1. Stop the unit and allow the gearbox to cool for ten minutes.
2. Check the coolant level and refill to the cold mark if it is low.
3. Clear alarm 204 from the panel and restart the unit.

Referenced identifiers: AL-204

Sources:
  - manuals/series7_conveyor_manual.txt (page 3, section: ALARM AND STOP CODE REFERENCE)

When the corpus does not contain the answer, Cogwright says so instead of inventing one.

How it works

Ingest parses each document into structure-aware chunks, where a table stays whole and a numbered procedure stays together, embeds them, and writes a single JSON index to disk. A dedicated pass indexes every alarm, stop, fault, error, diagnostic, warning, and part identifier as an exact lookup key.
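As a rough illustration of structure-aware chunking, the sketch below treats each blank-line-separated block (a paragraph, a numbered procedure, a pipe table) as indivisible and packs whole blocks into chunks. The function names and the max_chars budget are hypothetical, and Cogwright's real parser is certainly more involved.

```python
import re


def split_blocks(text):
    """Split text on blank lines; each block is treated as one indivisible unit."""
    return [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]


def pack_blocks(blocks, max_chars=800):
    """Pack whole blocks into chunks of roughly max_chars (hypothetical budget).

    A block is never split, so a table or a numbered procedure stays together
    even when it is larger than the budget on its own.
    """
    chunks, current, size = [], [], 0
    for block in blocks:
        if current and size + len(block) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The design choice is to let a chunk overflow the budget rather than cut a procedure in half, since half a procedure is a misleading retrieval result.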

Ask embeds the question and retrieves by two signals at once: semantic similarity and exact identifier lookup, with exact matches ranked above fuzzy ones. Those passages, and only those, go to the model with instructions to answer from them, number any steps, surface the identifiers, and cite each passage. If nothing clears the relevance bar, the model is never called.

ingest:  documents -> parse -> chunk -> embed ------+
                          \--> identifier index ----+--> index.json

ask:     question -> embed + detect identifiers -> hybrid retrieval
                  -> prompt (retrieved context only) -> model -> cited answer
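The ranking rule can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not Cogwright's code: the Chunk type and the hybrid_retrieve function are hypothetical, and letting an exact identifier hit bypass the cosine cutoff is a guess at how "exact matches ranked above fuzzy ones" might be realized.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    vector: tuple[float, ...]
    identifiers: frozenset[str]  # normalized keys such as "AL-204"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vector, query_ids, chunks, top_k=3, min_score=0.45):
    """Return (chunk, score, is_exact) tuples with exact identifier hits first."""
    exact, fuzzy = [], []
    for chunk in chunks:
        score = cosine(query_vector, chunk.vector)
        if query_ids & chunk.identifiers:
            exact.append((chunk, score, True))   # assumption: exact hits skip the cutoff
        elif score >= min_score:
            fuzzy.append((chunk, score, False))
    exact.sort(key=lambda hit: hit[1], reverse=True)
    fuzzy.sort(key=lambda hit: hit[1], reverse=True)
    return (exact + fuzzy)[:top_k]
```

An empty result is the signal to report not-found without calling the model at all.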

Install

Cogwright targets Python 3.12. It is published on PyPI as cogwright-rag (the import package and the cogwright command keep the shorter name):

pip install cogwright-rag          # core install
pip install "cogwright-rag[ocr]"   # add scanned-page recognition

For development, use uv:

uv sync                  # core install
uv sync --extra ocr      # add scanned-page recognition

The only third-party runtime dependency is a PDF toolkit. The model client, vector math, and CLI are all standard library.
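To show what a standard-library model client can look like, here is a minimal sketch that posts to an OpenAI-compatible embeddings route with urllib. The function names are hypothetical and this is not Cogwright's client; the request and response shapes follow the usual OpenAI embeddings convention (a "model" and "input" body, a "data" list of objects carrying "embedding" and "index").

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, texts, api_key=None):
    """Build, but do not send, a POST to <base_url>/embeddings."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers=headers,
        method="POST",
    )


def embed(base_url, model, texts, api_key=None, timeout=30.0):
    """Send the request and return one vector per input text, in input order."""
    request = build_embeddings_request(base_url, model, texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    ordered = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

Because the base URL already ends in /v1, the sketch appends only the route name.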

Configure the endpoint

Cogwright talks to any OpenAI-compatible endpoint, using the routes /v1/chat/completions and /v1/embeddings. That can be a model server on the same machine or a hosted API; the implementation is not tied to one provider, and it has been validated against a local server running a small chat model and an embedding model. Configure it with flags or environment variables:

Setting          Flag                Environment variable        Default
Base URL         --base-url          COGWRIGHT_BASE_URL          http://localhost:8000/v1
API key          --api-key           COGWRIGHT_API_KEY           none
Chat model       --llm-model         COGWRIGHT_LLM_MODEL         local-chat-model
Embedding model  --embedding-model   COGWRIGHT_EMBEDDING_MODEL   local-embedding-model
Vision model     --vision-model      COGWRIGHT_VISION_MODEL      local-vision-model
Index path       --index             COGWRIGHT_INDEX             .cogwright/index.json

The model names are placeholders; set them to whatever your endpoint serves. An unreachable endpoint is reported with a non-zero exit, never a crash.
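The settings table can be read as a lookup with fallbacks. The sketch below resolves each setting from a flag, then the environment variable, then the default. That precedence order is an assumption (the text only says flags or environment variables), and resolve_settings is a hypothetical helper, not part of the package.

```python
import os

# Setting name -> (environment variable, default), copied from the table above.
DEFAULTS = {
    "base_url": ("COGWRIGHT_BASE_URL", "http://localhost:8000/v1"),
    "api_key": ("COGWRIGHT_API_KEY", None),
    "llm_model": ("COGWRIGHT_LLM_MODEL", "local-chat-model"),
    "embedding_model": ("COGWRIGHT_EMBEDDING_MODEL", "local-embedding-model"),
    "vision_model": ("COGWRIGHT_VISION_MODEL", "local-vision-model"),
    "index": ("COGWRIGHT_INDEX", ".cogwright/index.json"),
}


def resolve_settings(flags, environ=None):
    """Resolve every setting; assumed precedence is flag, then env var, then default."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for name, (env_var, default) in DEFAULTS.items():
        value = flags.get(name)
        if value is None:
            value = environ.get(env_var, default)
        resolved[name] = value
    return resolved
```

For example, resolve_settings({"llm_model": "llama3.2:1b"}) keeps the flag value for the chat model and fills everything else from the environment or the defaults.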

Commands

ingest

cogwright ingest <paths...> [--ocr] [--diagrams]

Paths are files or folders. --ocr recognizes scanned PDF pages and needs the ocr extra. --diagrams transcribes figure callouts with a multimodal model set by --vision-model. ingest builds a fresh index, recording which documents it holds, when it was built, and the embedding model that produced the vectors.

update, remove, info

Maintain an index without rebuilding it from scratch:

cogwright update <paths...>     # add new documents, refresh changed ones, skip unchanged
cogwright remove <paths...>     # drop documents, matched by path or file name
cogwright info                  # show the documents, counts, model, and timestamps

update compares a content hash per document, so re-running it re-embeds only what actually changed. Updating with an embedding model other than the one the index was built with is refused, and querying with a mismatched model prints a warning, because vectors from different models are not comparable.
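A minimal sketch of hash-based change detection follows. The choice of SHA-256, the function names, and the shape of the stored index (a path-to-hash mapping) are all assumptions made for illustration; the source states only that a content hash per document is compared.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of a document's bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_update(indexed, current):
    """Classify documents as add, refresh, or skip.

    indexed: path -> hash stored in the index (hypothetical layout)
    current: path -> file bytes as read from disk
    """
    plan = {"add": [], "refresh": [], "skip": []}
    for path, data in sorted(current.items()):
        digest = content_hash(data)
        if path not in indexed:
            plan["add"].append(path)
        elif indexed[path] != digest:
            plan["refresh"].append(path)
        else:
            plan["skip"].append(path)
    return plan


def check_embedding_model(index_model, requested_model):
    """Refuse an update whose vectors would not be comparable with the index."""
    if index_model != requested_model:
        raise ValueError(
            f"index was built with {index_model!r}, not {requested_model!r}"
        )
```

Only the paths under "add" and "refresh" would need new embeddings.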

ask

cogwright ask "<question>" [--top-k N] [--min-score S] [--no-stream] \
  [--show-retrieved] [--json]

The answer streams as it is generated. --show-retrieved prints the ranked passages and their scores first, so you can see what retrieval is doing. --json asks the model for a structured reply, with steps as a list and the passages it used named explicitly. With a capable model this gives reliable numbered steps and precise citations; otherwise it falls back to the prose path.
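The fallback behaviour of the structured path might look something like the sketch below. The reply schema (a "steps" list and a "sources" list) and the render_reply helper are hypothetical; the actual field names Cogwright requests from the model are not documented here.

```python
import json


def render_reply(raw):
    """Render a structured reply as numbered steps, or return the prose unchanged.

    Assumes a hypothetical schema: {"steps": [...], "sources": [...]}.
    """
    try:
        data = json.loads(raw)
        steps, sources = data["steps"], data["sources"]
        if not (isinstance(steps, list) and isinstance(sources, list)):
            raise TypeError("steps and sources must be lists")
    except (ValueError, KeyError, TypeError):
        # Not valid JSON, or not the expected shape: fall back to the prose path.
        return raw.strip()
    lines = [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    lines += ["", "Sources:"] + [f"  - {source}" for source in sources]
    return "\n".join(lines)
```

Numbering the steps in code rather than trusting the model to number them is what makes the structured path more reliable when the model cooperates.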

eval

cogwright eval <dataset.json> [--min-score S]

Scores retrieval against a graded dataset without calling the chat model. See the Calibrating relevance section for the dataset format and how to choose a cutoff.

serve

cogwright serve [--host 127.0.0.1] [--port 8765]

Runs a local web interface in the browser: a search box with streaming cited answers and a retrieval inspector, plus a corpus view for adding documents (by path or drag-and-drop) and removing them. It is a standard-library server with a bundled offline page, so it pulls in no web framework and serves only on the host you bind. Your documents and the index never leave the machine.

Documents it understands

  • Text and Markdown (.txt, .text, .md, .markdown). Headings, numbered steps, and pipe tables are recovered.
  • Born-digital PDFs with a real text layer, including tables, which are lifted out as structured blocks. Real page numbers are kept for citations.
  • Scanned PDF pages, with the ocr extra. A page with little text and a dominant image is rendered and recognized, then structured like any other page.
  • Diagram callouts, with --diagrams. A figure is sent to a vision model and the printed labels and part numbers become searchable.

Identifier lookup

A query that is a bare code resolves to the exact passage that documents it. The built-in patterns detect and normalize:

You type                    Resolves to
alarm 204, AL-204, AL204    AL-204
STOP CODE 12, SC-12         SC-12
fault 09                    F-09
error 30                    E-30
DTC P0420                   DTC-P0420
warning 18                  W-18
PN 44-19A, P/N 44-19A       PN-44-19A

The patterns are configuration rather than hardcoded, so a deployment can add its own identifier schemes.
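One way to express such patterns as configuration is a list of regex and template pairs. The regular expressions below are written for this example to reproduce the table above; they are not Cogwright's built-in patterns, and detect_identifiers is a hypothetical name.

```python
import re

# (pattern, template) pairs; group 1 is substituted into the template.
# Adding a deployment-specific scheme means appending one more pair.
IDENTIFIER_PATTERNS = [
    (r"\b(?:alarm\s+|AL-?)(\d+)\b", "AL-{}"),
    (r"\b(?:stop\s+code\s+|SC-?)(\d+)\b", "SC-{}"),
    (r"\b(?:fault\s+|F-)(\d+)\b", "F-{}"),
    (r"\b(?:error\s+|E-)(\d+)\b", "E-{}"),
    (r"\bDTC[\s-]?([A-Z]\d{4})\b", "DTC-{}"),
    (r"\b(?:warning\s+|W-)(\d+)\b", "W-{}"),
    (r"\bP/?N[\s-]+([A-Z0-9]+(?:-[A-Z0-9]+)*)\b", "PN-{}"),
]

_COMPILED = [(re.compile(p, re.IGNORECASE), t) for p, t in IDENTIFIER_PATTERNS]


def detect_identifiers(text):
    """Return the normalized identifiers found in text, without duplicates."""
    found = []
    for regex, template in _COMPILED:
        for match in regex.finditer(text):
            key = template.format(match.group(1).upper())
            if key not in found:
                found.append(key)
    return found
```

Normalizing at both ingest and query time is what lets "alarm 204" in a question hit a passage that prints "AL-204".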

Calibrating relevance

The not-found decision rests on one cosine cutoff, --min-score. It is embedding-model dependent: different models place related and unrelated text at different similarity ranges, so there is no universal value (the default, 0.45, suits typical normalized models). Calibrate it with eval. A dataset pairs each question with the pages it should surface, the identifiers it should resolve, and whether it is answerable at all:

{ "question": "How do I clear alarm 204?", "expected_pages": [3],
  "expected_codes": ["AL-204"], "should_find": true }

Raise --min-score until the unanswerable cases report not-found while the real ones still resolve, then pass that value to both eval and ask. The harness reports found accuracy, page hit rate, code-resolution accuracy, and not-found accuracy, each with its counts.
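The calibration loop can be sketched as a scoring function run at several cutoffs. The Outcome record and the exact metric definitions below are assumptions chosen for illustration (for instance, counting a page hit when any expected page is retrieved); the real harness may define them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One dataset case plus what retrieval produced for it (hypothetical record)."""
    should_find: bool
    expected_pages: frozenset
    expected_codes: frozenset
    best_score: float          # highest cosine score among retrieved passages
    retrieved_pages: frozenset
    resolved_codes: frozenset


def _tally(hits, total):
    return {"hits": hits, "total": total, "rate": hits / total if total else None}


def evaluate(outcomes, min_score):
    """Compute the four reported metrics, each with its counts, at one cutoff."""
    answerable = [o for o in outcomes if o.should_find]
    unanswerable = [o for o in outcomes if not o.should_find]
    found = [o for o in answerable if o.best_score >= min_score]
    return {
        "found": _tally(len(found), len(answerable)),
        "page_hit": _tally(
            sum(bool(o.expected_pages & o.retrieved_pages) for o in found),
            len(answerable),
        ),
        "code_resolution": _tally(
            sum(o.expected_codes <= o.resolved_codes for o in found),
            len(answerable),
        ),
        "not_found": _tally(
            sum(o.best_score < min_score for o in unanswerable),
            len(unanswerable),
        ),
    }
```

Running evaluate at increasing cutoffs and picking the lowest one where the not-found rate is acceptable and the found rate has not dropped mirrors the manual procedure described above.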

Architecture

A pure core holds all retrieval and decision logic; a thin adapter and CLI layer does the real input and output. The core depends only on protocols, never on a concrete model, store, or framework:

  • core seams: FileSystem, DocumentParser, Embedder, LLMClient, VectorStore
  • ingestion seams: OcrEngine, DiagramAnalyzer

Adapters supply the real implementations (disk, text and PDF parsers, the HTTP client, an in-memory cosine store, an OCR engine, a vision analyzer); the tests supply fakes. To add a vector database, a different model API, or a new document type, implement the seam and the core does not change.
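A seam of this kind can be written as a typing.Protocol with a deterministic fake beside it. The seam name Embedder comes from the list above, but the embed method signature, FakeEmbedder, and embed_question are assumptions made for this sketch.

```python
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Seam: anything that turns texts into vectors (method name is assumed)."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class FakeEmbedder:
    """Deterministic test double: a tiny vector derived from the text itself."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


def embed_question(embedder: Embedder, question: str) -> list[float]:
    """Core-side logic depends only on the protocol, never on a concrete class."""
    return embedder.embed([question])[0]
```

FakeEmbedder never inherits from Embedder; structural typing is what lets an HTTP-backed adapter and a fake satisfy the same seam without the core importing either.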

Testing

uv run pytest          # tests, including end-to-end ingest and ask on a sample manual
uv run mypy            # strict type checking
uv run ruff check .    # lint

Every seam has a deterministic fake, so the suite needs no live model. Tests for the real OCR engine run only where it is installed and skip otherwise. All three checks run in CI on every push and pull request, alongside a guard that fails on a missing license header.

Privacy

No telemetry and no required cloud service. The only outbound traffic is to the endpoint you configure, which can run on the same machine, so a fully air-gapped deployment is possible. Documents stay where you put them, and the index is a plain file you control.

License

MIT, with an SPDX header on every source file. Every dependency is permissively licensed; copyleft toolkits are avoided, including in the PDF path, to keep the tree MIT-compatible. See CONTRIBUTING.md to work on it.

Status

Text and born-digital PDF ingestion, hybrid retrieval, and grounded cited answers are complete and validated against a live local model. Scanned-page OCR, diagram transcription, and the evaluation harness have since landed. Region-level diagram cropping, tuning for low-quality scans, and more corpus formats are future work, and each fits an existing seam.
