
Intelligent, honest knowledge retrieval in 5 minutes. No infrastructure. No boilerplate.


fitz-sage

The RAG library that says "I don't know" instead of hallucinating.


Why Fitz? · Retrieval Intelligence · Governance · Documentation · GitHub



Q: "Who won the 2024 FIFA World Cup?"
(There was no World Cup in 2024.)
❌ Uncalibrated RAG systems
A: "Germany won the 2024 FIFA World Cup,
    defeating Argentina 1-0 in the final."
🛡️ fitz-sage
A: "I don't have enough information
    to answer this question.
    Related topics in the knowledge base:
      - FIFA tournament history (4 mentions)
      - 2022 World Cup coverage (7 mentions)
    To answer this, consider adding:
      - Documents covering 2024 FIFA events."
→ Uncalibrated RAG hallucinates confidently when the answer isn't in your documents.

Fitz refuses, explains why, and tells you what to add.


Where to start 🚀

[!IMPORTANT] Requires Ollama or a Cohere/OpenAI API key. Fitz auto-detects your setup on first run.

pip install fitz-sage

fitz query "What is our refund policy?" --source ./docs

That's it. Your documents are now searchable with AI.

Figure 1: Example of the user experience when querying documents with fitz-sage.


About

Existing RAG tools hallucinate. When the answer isn't in your documents, they invent one — confidently, fluently, wrongly. In production, that's not a minor inconvenience. It's the reason you can't trust the system. I built fitz-sage to solve that problem directly, while working as a Data Engineer in the automotive industry. No LangChain. No LlamaIndex. Every layer written from scratch.

The retrieval architecture is KRAG (Knowledge Routing Augmented Generation) — documents are parsed into typed units (code symbols, sections, tables) and each query is routed to the right search strategy, rather than searching flat chunks uniformly.

Honesty is enforced by an ML governance classifier that decides when to answer, hedge, or refuse — validated against fitz-gov, a purpose-built benchmark of 2,920 adversarial test cases (5.7% false-trustworthy rate).

It runs in production today and powers fitz-graveyard.

~55k lines of Python. 2,000+ tests. 99% coverage.

Yan Fitzner — (LinkedIn, GitHub).



📦 What is RAG?

RAG is how ChatGPT's "file search," Notion AI, and enterprise knowledge tools actually work under the hood. Instead of sending all your documents to an AI, RAG:

  1. Indexes your documents — Splits them into chunks, converts to vectors, stores in a database
  2. Retrieves only what's relevant — When you ask a question, finds the 5-10 most relevant chunks
  3. Sends just those chunks to the LLM — The AI answers based on focused, relevant context

Traditional approach:

  [All 10,000 documents] → LLM → Answer
  ❌ Impossible (too large)
  ❌ Expensive (if possible)
  ❌ Unfocused

RAG approach:

  Question → [Search index] → [5 relevant chunks] → LLM → Answer
  ✅ Works at any scale
  ✅ Costs pennies per query
  ✅ Focused context = better answers
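The index → retrieve → answer loop above can be sketched in a few lines. This is a toy illustration, not fitz-sage code: bag-of-words vectors stand in for real embeddings, and `build_prompt` is a hypothetical helper.

```python
# Minimal RAG sketch: index chunks, retrieve top-k by similarity,
# send only those chunks to the LLM. Toy vectors, illustrative only.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "The refund policy: full refund within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Refund requests go through the billing portal.",
]
top = retrieve("What is the refund policy?", chunks)
prompt = build_prompt("What is the refund policy?", top)
```

Only the two refund-related chunks reach the LLM; the irrelevant one is filtered out before any tokens are spent.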

📦 Why Can't I Just Send My Documents to ChatGPT Directly?

You can—but you'll hit walls fast.

Context window limits 🚨

GPT-4 accepts ~128k tokens. That's roughly 300 pages. Your company wiki, codebase, or document archive is likely 10x-100x larger. You physically cannot paste it all.

Cost explosion 💥

Even if you could fit everything, you'd pay for every token on every query. Sending 100k tokens costs ~$1-3 per question. Ask 50 questions a day? That's $50-150 daily—for one user.
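The back-of-envelope arithmetic behind those numbers is simple. The per-token price below is an illustrative assumption, not a quoted rate for any provider:

```python
# Rough cost comparison: pasting everything vs. RAG-style retrieval.
# PRICE_PER_1K_TOKENS is a hypothetical input-token price (USD).
PRICE_PER_1K_TOKENS = 0.01

def query_cost(context_tokens: int) -> float:
    return context_tokens / 1000 * PRICE_PER_1K_TOKENS

full_paste = query_cost(100_000)   # paste the whole archive every query
rag = query_cost(5 * 500)          # ~5 retrieved chunks of ~500 tokens

daily_full = 50 * full_paste       # 50 questions/day, one user
daily_rag = 50 * rag
```

At this assumed rate, full-context pasting costs $1 per question ($50/day for one user), while retrieval of five focused chunks costs a few cents.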

No selective retrieval ❌

When you paste documents, the model reads everything equally. It can't focus on what's relevant. Ask about refund policies and it's also processing your hiring guidelines, engineering specs, and meeting notes—wasting context and degrading answers.

No persistence 💢

Every conversation starts fresh. You re-upload, re-paste, re-explain. There's no knowledge base that accumulates and improves.


📦 How is this different from LangChain / LlamaIndex?

They're frameworks — you assemble the chunker, embedder, vector store, retriever, and prompt chain yourself. fitz-sage is a library — one function call that handles all of it with built-in intelligence.

You trade flexibility for a pipeline that handles temporal queries, comparison queries, code symbol extraction, tabular SQL, and epistemic honesty out of the box — without configuration.


Why Fitz?

Asymmetric indexing 🗂️ · KRAG (Knowledge Routing Augmented Generation)

Documents are parsed into typed retrieval units (symbols, sections, tables) with structural metadata, not flat chunks. Queries are routed to the right strategy per content type.

Zero-wait querying 🐆 · Progressive KRAG

Ask a question immediately — no ingestion step required. Fitz serves answers instantly via agentic search while a background worker indexes your files. Queries get faster over time as indexing completes, but they work from second one.

Honest answers ✅ · Governance Benchmark

Most RAG tools confidently answer even when the answer isn't in your documents. Ask "What was our Q4 revenue?" when your docs only cover Q1-Q3, and typical RAG hallucinates a number. Fitz says: "I cannot find Q4 revenue figures in the provided documents."

→ Fitz detects when to abstain at 86.5% recall on fitz-gov, a 2,920-case benchmark for epistemic honesty (62.7% hard difficulty). False-trustworthy rate: 5.7%.

Actionable failures 🔍

When Fitz can't answer, it doesn't just refuse — it explains what it searched for, shows related topics that do exist, and suggests what documents to add. When sources conflict, Fitz tells you exactly which sources disagree and what the disagreement is about. Every failure mode is a feedback signal, not a dead end.
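The shape of such an "actionable refusal" is easy to illustrate. This sketch is hypothetical — it mirrors the example message at the top of this page, not fitz-sage's actual output format:

```python
# Hypothetical composer for an actionable refusal message:
# refuse, show what related material exists, suggest what to add.
def refusal_message(related: dict[str, int], suggestions: list[str]) -> str:
    lines = ["I don't have enough information to answer this question."]
    if related:
        lines.append("Related topics in the knowledge base:")
        lines += [f"  - {topic} ({n} mentions)" for topic, n in related.items()]
    if suggestions:
        lines.append("To answer this, consider adding:")
        lines += [f"  - {s}" for s in suggestions]
    return "\n".join(lines)

msg = refusal_message(
    {"FIFA tournament history": 4, "2022 World Cup coverage": 7},
    ["Documents covering 2024 FIFA events."],
)
```

The point is the contract: a failure always carries enough structure (related topics, missing-document hints) to act on.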

Queries that actually work 📊

Standard RAG fails silently on real queries. Fitz has built-in intelligence: hierarchical summaries for "What are the trends?", exact keyword matching for "Find TC-1000", multi-query decomposition for complex questions, address-based code retrieval with import graph traversal, and SQL execution for tabular data. No configuration—it just works.

Tabular data that is actually searchable 📈 · Unified Storage

CSV and table data is a nightmare in most RAG systems—chunked arbitrarily, structure lost, queries fail. Fitz stores tables natively in PostgreSQL alongside your vectors—same database, no sync issues. Auto-detects schema and runs real SQL. Ask "What's the average price by region?" and get an actual computed answer, not fragmented rows.
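The "real SQL instead of chunked text" idea can be demonstrated with the standard library. This sketch uses in-memory SQLite in place of fitz-sage's PostgreSQL backend, purely for illustration:

```python
# Tables stored natively and queried with real SQL: the natural-language
# question "What's the average price by region?" becomes a computed
# answer, not fragmented rows of chunked text. (sqlite3 stands in for
# fitz-sage's PostgreSQL storage here.)
import sqlite3

rows = [
    ("north", 10.0), ("north", 20.0),
    ("south", 30.0), ("south", 50.0),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, price REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

result = dict(con.execute(
    "SELECT region, AVG(price) FROM sales GROUP BY region"
))
```

A chunked-text RAG system could only return fragments of the CSV; SQL execution returns the actual aggregate.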

Fully local execution possible 🏠

Embedded PostgreSQL + Ollama/LM Studio. No API keys required to start.

[!TIP] Any questions left? Try fitz on itself:

fitz query "How does the retrieval pipeline work?" --source ./fitz_sage

The codebase speaks for itself.


What You Can Search

Traditional RAG chops every document into flat text blocks and searches them the same way. FitzKRAG parses each document by type — tree-sitter for code, heading hierarchy for docs, schema detection for CSVs — and produces typed retrieval units, each with its own storage format and search strategy.


| Retrieval Unit | Extracted From | How It Works |
| --- | --- | --- |
| Symbols 🖌️ | Code files | Tree-sitter parses functions, classes, and methods into addressable units with qualified names, references, and import graphs. Cross-file dependencies are graph traversals, not text searches. |
| Sections 📑 | Documents (PDF, markdown, text) | Headings and paragraphs are extracted with parent/child hierarchy. Deeply nested sections include parent context; top-level headings include child summaries. |
| Tables 📅 | CSV files or tables within documents | Native PostgreSQL storage with auto-detected schema. Real SQL execution from natural language — not chunked text. |
| Images 🖼️ | Figures and diagrams within documents | VLM-powered figure extraction and visual understanding. (Coming soon) |
| Chunks 🧩 | Any content as fallback | Traditional chunk-based retrieval when structured extraction doesn't apply. Automatic fallback — no configuration needed. |
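One way to picture "typed retrieval units" is a small tagged record per unit. The field names below are illustrative, not fitz-sage's internal schema:

```python
# Hypothetical model of a typed retrieval unit: each unit carries its
# kind, a stable address, content, and structural metadata, so search
# strategy can be chosen per kind instead of treating everything as
# a flat chunk.
from dataclasses import dataclass, field

@dataclass
class RetrievalUnit:
    kind: str                  # "symbol" | "section" | "table" | "chunk"
    address: str               # qualified name, heading path, or table name
    content: str
    metadata: dict = field(default_factory=dict)

units = [
    RetrievalUnit("symbol", "auth.login.verify_password",
                  "def verify_password(...): ...",
                  {"references": ["auth.session.create"]}),
    RetrievalUnit("section", "Handbook > Leave > PTO",
                  "PTO requests go through the HR portal.",
                  {"parent": "Handbook > Leave"}),
    RetrievalUnit("table", "sales_2024", "(stored natively, queried via SQL)"),
]
kinds = {u.kind for u in units}
```

A router can then dispatch on `kind`: graph traversal for symbols, hierarchy-aware search for sections, SQL for tables.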

[!NOTE] All retrieval units share the same retrieval intelligence (temporal handling, comparison queries, multi-hop reasoning, etc.) and the same enrichment pipeline (summaries, keywords, entities, hierarchical summaries).


Retrieval Intelligence

Most RAG implementations are naive vector search—they fail silently on real-world queries. Fitz has built-in intelligence that handles edge cases automatically:


| Feature | Query | Naive RAG Problem | Fitz Solution |
| --- | --- | --- | --- |
| epistemic-honesty | "What was our Q4 revenue?" | ❌ Hallucinated number — info doesn't exist, but the LLM won't admit it | ✅ "I don't know" |
| keyword-vocabulary | "Find TC_1000" | ❌ Wrong test case — embeddings see TC_1000 ≈ TC_2000 (semantically similar) | ✅ Exact keyword matching |
| hybrid-search | "X100 battery specs" | ❌ Returns Y200 docs — semantic search misses exact model numbers | ✅ Hybrid search (dense + sparse) |
| sparse-search | "error code E_AUTH_401" | ❌ No exact match — embeddings miss precise error codes | ✅ PostgreSQL full-text search |
| multi-hop | "Who wrote the paper cited by the 2023 review?" | ❌ Returns the review only — single-step search can't traverse references | ✅ Iterative retrieval |
| hierarchical-rag | "What are the design principles?" | ❌ Random fragments — answer is spread across docs; no single chunk contains it | ✅ Hierarchical summaries |
| multi-query | [User pastes 500-char test report] "What failed and why?" | ❌ Vaguely related chunks — long input → averaged embedding → matches nothing specifically | ✅ Multi-query decomposition |
| comparison-queries | "Compare React vs Vue performance" | ❌ Incomplete comparison — only retrieves one entity, missing the other | ✅ Multi-entity retrieval |
| entity-graph | "What else mentions AuthService?" | ❌ Isolated chunks — no awareness of shared entities across docs | ✅ Entity-based linking across sources |
| temporal-queries | "What changed between Q1 and Q2?" | ❌ Random chunks — no awareness of time periods in query | ✅ Temporal query handling |
| aggregation-queries | "List all the test cases that failed" | ❌ Partial list — no mechanism for comprehensive retrieval | ✅ Aggregation query handling |
| freshness-authority | "What does the official spec say?" | ❌ Returns notes — can't distinguish authoritative vs informal sources | ✅ Freshness/authority boosting |
| query-expansion | "How do I fetch the db config?" | ❌ No matches — user says "fetch", docs say "retrieve"; "db" vs "database" | ✅ Query expansion |
| query-rewriting | "Tell me more about it" (after discussing TechCorp) | ❌ Lost context — pronouns like "it" reference nothing, retrieval fails | ✅ Conversational context resolution |
| hyde | "What's TechCorp's approach to sustainability?" | ❌ Poor recall — abstract queries don't embed close to concrete documents | ✅ Hypothetical document generation |
| contextual-embeddings | "When does it expire?" | ❌ Ambiguous chunk — "It expires in 24h" embedded without context; "it" = ? | ✅ Summary-prefixed symbol/section embeddings |
| reranking | "What's the battery warranty?" | ❌ Imprecise ranking — vector similarity ≠ true relevance; best answer buried | ✅ Cross-encoder precision |
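To make the hybrid-search row concrete: a standard way to fuse a dense (semantic) ranking with a sparse (keyword) ranking is reciprocal rank fusion. Fitz-sage's exact fusion method is not specified here, so treat this as a generic sketch:

```python
# Reciprocal rank fusion (RRF): each document scores 1/(k + rank + 1)
# in every ranking it appears in; summed scores give the fused order.
# A doc ranked well by EITHER the dense or the sparse ranker surfaces.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "X100 battery specs": semantic search prefers the (wrong) Y200 doc,
# keyword search nails the exact model number.
dense = ["y200_specs", "x100_specs", "manual"]   # semantic ranking
sparse = ["x100_specs", "manual", "y200_specs"]  # exact keyword ranking
fused = rrf([dense, sparse])
```

The exact-match signal from the sparse ranker pulls `x100_specs` to the top even though the dense ranker preferred the wrong model.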

[!IMPORTANT] These features are always on—no configuration needed. Fitz automatically detects when to use each capability.


Governance — Know What You Don't Know

Feature docs · fitz-gov benchmark

Most RAG systems hallucinate confidently. Fitz measures and enforces epistemic honesty using a 5-question cascade ML classifier trained on 2,920 labeled cases from fitz-gov, a benchmark for epistemic honesty.


  Query + Retrieved Context
            │
            ▼
  ┌─────────────────────┐
  │ 5 Constraints       │     Contradiction detection, evidence sufficiency,
  │ (epistemic sensors) │     causal attribution, answer verification, specific info type
  └──────────┬──────────┘
             │ 108 features extracted
             ▼
  ┌─────────────────────┐
  │ Q1: Evidence        │     Is the evidence sufficient?
  │ sufficient? (ML)    ├───► NO ──► ABSTAIN
  └──────────┬──────────┘
             │ YES
             ▼
  ┌─────────────────────┐
  │ Q2: Conflict?       │     Is there material conflict? (ML router)
  │ (ML)                ├───► YES ──┐
  └──────────┬──────────┘           │
             │ NO                   ▼
             │            ┌─────────────────────┐
             │            │ Q3: Conflict        │
             │            │ resolved? (ML)      ├───► NO ──► DISPUTED
             │            └──────────┬──────────┘
             │                       └ YES ────────────────► TRUSTWORTHY
             ▼
  ┌─────────────────────┐
  │ Q4: Evidence truly  │     Is the evidence solid enough?
  │ solid? (ML)         ├───► NO ──► ABSTAIN
  └──────────┬──────────┘
             └ YES ────────────────► TRUSTWORTHY

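As straight-line control flow, the cascade above looks like this. The boolean inputs stand in for the ML classifiers' verdicts; the real system derives them from the 108 extracted features:

```python
# The 5-question governance cascade as plain control flow.
# Each boolean is a stand-in for an ML classifier's verdict.
def govern(evidence_sufficient: bool, conflict: bool,
           conflict_resolved: bool, evidence_solid: bool) -> str:
    if not evidence_sufficient:                                     # Q1
        return "ABSTAIN"
    if conflict:                                                    # Q2
        return "TRUSTWORTHY" if conflict_resolved else "DISPUTED"   # Q3
    return "TRUSTWORTHY" if evidence_solid else "ABSTAIN"           # Q4
```

Note the fail-safe shape: "TRUSTWORTHY" is only reachable after passing every gate, while "ABSTAIN" is the default exit at two separate points.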

| Decision | Meaning | Recall |
| --- | --- | --- |
| ABSTAIN | Evidence doesn't answer the question | 86.5% |
| DISPUTED | Sources contradict each other | 86.1% |
| TRUSTWORTHY | Consistent, sufficient evidence | 70.0% |

Overall accuracy: 78.7% | False-trustworthy: 5.7% on fitz-gov (2,920 cases, 5-fold cross-validated, 62.7% hard difficulty)


[!NOTE] Governance asks "given three relevant documents that partially contradict each other, should you flag a dispute, hedge the answer, or trust the consensus?" That's a judgment call even humans disagree on.

The system fails safe 🛡️

The safety-first threshold is tuned so that when the classifier is wrong, it over-hedges ("disputed" instead of "trustworthy") — annoying but harmless. Over-confidence ("trustworthy" instead of "disputed") is the rarest error mode.

These scores are a floor, not a ceiling 👣

All benchmarks were measured using qwen2.5:3b — a 3B parameter local model. The governance constraints run on the fast-tier LLM to keep latency low. Stronger models produce better constraint signals, which feed better features into the classifier. Upgrading your chat provider should improve governance accuracy for free.

Zero extra latency ⏱️

The constraints already run as part of the pipeline. The ML classifier just replaces hand-coded rules with a local sklearn model — inference takes microseconds, no additional API calls.


📦 Quick Start

CLI

pip install fitz-sage

fitz query "Your question here" --source ./docs

Fitz auto-detects your LLM provider:

  1. Ollama running? → Uses it automatically (fully local)
  2. COHERE_API_KEY or OPENAI_API_KEY set? → Uses it automatically
  3. First time? → Guides you through free Cohere signup (2 minutes)

After first run, it's completely zero-friction.


Python SDK

import fitz_sage

answer = fitz_sage.query("Your question here", source="./docs")

print(answer.text)
for source in answer.provenance:
    print(f"  - {source.source_id}: {source.excerpt[:50]}...")

The SDK provides:

  • Module-level query() matching CLI
  • Auto-config creation (no setup required)
  • Full provenance tracking
  • Same honest retrieval as the CLI

For advanced use (multiple collections), use the fitz class directly:

from fitz_sage import fitz

physics = fitz(collection="physics")
answer = physics.query("Explain entanglement", source="./physics_papers")

Fully Local (Ollama)

pip install fitz-sage[local]

ollama pull llama3.2
ollama pull nomic-embed-text

fitz query "Your question here" --source ./docs

Fitz auto-detects Ollama when running. No API keys needed—no data leaves your machine.


📦 Real-World Usage

Fitz is a foundation. It handles document indexing and grounded retrieval—you build whatever sits on top: chatbots, dashboards, alerts, or automation.


Chatbot Backend 🤖

Connect fitz to Slack, Discord, Teams, or your own UI. One function call returns an answer with sources—no hallucinations, full provenance. You handle the conversation flow; fitz handles the knowledge.

Example: A SaaS company plugs fitz into their support bot. Tier-1 questions like "How do I reset my password?" get instant answers. Their support team focuses on edge cases while fitz deflects 60% of incoming tickets.


Internal Knowledge Base 📖

Point fitz at your company's wiki, policies, and runbooks. Employees ask natural language questions instead of hunting through folders or pinging colleagues on Slack.

Example: A 200-person startup points fitz at their Notion workspace and compliance docs. New hires find answers to "How do I request PTO?" on day one—no more waiting for someone in HR to respond.


Continuous Intelligence & Alerting (Watchdog) 🐶

Pair fitz with cron, Airflow, or Lambda. Point at data on a schedule, run queries automatically, trigger alerts when conditions match. Fitz provides the retrieval primitive; you wire the automation.

Example: A security team points fitz at SIEM logs nightly. Every morning, a scheduled job asks "Were there failed logins from unusual locations?" If fitz finds evidence, an alert fires to the on-call channel before anyone checks email.
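The pattern above is a thin loop around a query call. The sketch below uses a stub in place of `fitz_sage.query` so it is self-contained; in real use the scheduled job would call the SDK and route alerts to Slack, PagerDuty, or email:

```python
# Watchdog pattern: run a fixed question on a schedule, alert only when
# the retrieval finds trustworthy evidence. `fake_query` is a stub
# standing in for fitz_sage.query(); its return shape is illustrative.
def fake_query(question: str) -> tuple[str, str]:
    return ("TRUSTWORTHY", "3 failed logins from an unusual ASN overnight.")

def watchdog(question: str, alert) -> bool:
    mode, text = fake_query(question)
    if mode == "TRUSTWORTHY":        # evidence found -> fire the alert
        alert(f"[watchdog] {question}\n{text}")
        return True
    return False                     # abstained/disputed -> stay quiet

alerts: list[str] = []
fired = watchdog("Were there failed logins from unusual locations?",
                 alerts.append)
```

Because the governance layer abstains when evidence is missing, the watchdog stays silent instead of paging on hallucinated findings.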


Web Knowledge Base 🌎

Scrape the web with Scrapy, BeautifulSoup, or Playwright. Save to disk, point fitz at it. The web becomes a queryable knowledge base.

Example: A football analytics hobbyist scrapes Premier League match reports. They point fitz at the folder and ask "How did Arsenal perform against top 6 teams?" or "What tactics did Liverpool use in away games?"—insights that would take hours to compile manually.


Codebase Search 🐍 · Code Symbol Extraction · KRAG

Two modes of code retrieval:

Full KRAG — tree-sitter parses your codebase into symbols (functions, classes, methods) with qualified names, references, and import graphs. No chunking—each symbol is a precise, addressable unit. Cross-file dependencies are tracked, so "what calls this function?" is a graph traversal, not a text search.

Standalone (pip install fitz-sage[code]) — Zero-dependency code retrieval via CodeRetriever. Builds an AST structural index, uses an LLM to select relevant files, expands via import graph and neighbor directories, and returns compressed results. No PostgreSQL, no pgvector, no docling—just point at a directory and ask.

Example: A team inherits a legacy Django monolith—200k lines, sparse docs. They point fitz at the codebase and ask "Where is user authentication handled?" or "What depends on the billing module?" FitzKRAG returns specific functions with their callers and dependencies. New developers onboard in days instead of weeks.


📦 Architecture · Full Architecture Guide
┌───────────────────────────────────────────────────────────────┐
│                           fitz-sage                           │
├───────────────────────────────────────────────────────────────┤
│  User Interfaces                                              │
│  CLI: query (--source) | init | collections | config | serve  │
│  SDK: fitz_sage.query(source=...)                             │
│  API: /query | /chat | /collections | /health                 │
├───────────────────────────────────────────────────────────────┤
│  Engines                                                      │
│  ┌────────────┐  ┌────────────┐                               │
│  │  FitzKRAG  │  │  Custom... │  (extensible registry)        │
│  └────────────┘  └────────────┘                               │
├───────────────────────────────────────────────────────────────┤
│  LLM Providers (Python-based)                                 │
│  ┌────────┐ ┌───────────┐ ┌────────┐                          │
│  │  Chat  │ │ Embedding │ │ Rerank │                          │
│  └────────┘ └───────────┘ └────────┘                          │
│  openai, cohere, anthropic, ollama, lmstudio, azure...        │
├───────────────────────────────────────────────────────────────┤
│  Storage (PostgreSQL + pgvector)                              │
│  vectors | metadata | tables | keywords | full-text search    │
├───────────────────────────────────────────────────────────────┤
│  Retrieval (address-based, baked-in intelligence)             │
│  symbols | sections | tables | import graphs | reranking      │
├───────────────────────────────────────────────────────────────┤
│  Enrichment (baked in)                                        │
│  summaries | keywords | entities | hierarchical summaries     │
├───────────────────────────────────────────────────────────────┤
│  Constraints (epistemic safety)                               │
│  ConflictAware | InsufficientEvidence | CausalAttribution     │
└───────────────────────────────────────────────────────────────┘

📦 CLI Reference · Full CLI Guide
fitz query "question" --source ./docs  # Point at docs and query (start here)
fitz query "question"                  # Query existing collection
fitz query --chat                      # Multi-turn conversation mode
fitz collections                       # List and delete knowledge collections
fitz serve                             # Start REST API server
fitz reset                             # Reset pgserver database (when stuck/corrupted)

Config: .fitz/config.yaml — auto-created on first run, edit to change models.


📦 Python SDK Reference · Full SDK Guide

Simple usage (module-level, matches CLI):

import fitz_sage

answer = fitz_sage.query("What is the refund policy?", source="./docs")
print(answer.text)

Advanced usage (multiple collections):

from fitz_sage import fitz

# Create separate instances for different collections
physics = fitz(collection="physics")
legal = fitz(collection="legal")

# Query each collection
physics_answer = physics.query("Explain entanglement", source="./physics_papers")
legal_answer = legal.query("What are the payment terms?", source="./contracts")

Working with answers:

answer = fitz_sage.query("What is the refund policy?")

print(answer.text)
print(answer.mode)  # TRUSTWORTHY, DISPUTED, or ABSTAIN

for source in answer.provenance:
    print(f"Source: {source.source_id}")
    print(f"Excerpt: {source.excerpt}")

📦 REST API Reference · Full API Guide

Start the server:

pip install fitz-sage[api]

fitz serve                    # localhost:8000
fitz serve -p 3000            # custom port
fitz serve --host 0.0.0.0     # all interfaces

Interactive docs: Visit http://localhost:8000/docs for Swagger UI.


Endpoints:

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /query | Query the knowledge base |
| POST | /chat | Multi-turn chat (stateless) |
| GET | /collections | List all collections |
| GET | /collections/{name} | Get collection stats |
| DELETE | /collections/{name} | Delete a collection |
| GET | /health | Health check |

Example request:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the refund policy?", "collection": "default"}'

📦 FAQ / Troubleshooting

fitz command not found after install

Your Python Scripts directory isn't on PATH. Use python -m fitz_sage.cli.cli instead, or add the Scripts directory to your system PATH.

PDF/DOCX files are being skipped

Document parsing requires docling, which is optional to keep the base install lightweight. Install it with: pip install fitz-sage[docs]

"Cannot connect to Ollama" error

Ollama needs to be running as a background service. Start it with: ollama serve

"Model not found" error

The configured model isn't pulled in Ollama. Pull it with: ollama pull <model-name>. Check your config at .fitz/config.yaml to see which models are configured.

First query is slow

First run initializes the database and loads LLM models into memory. Subsequent queries are much faster. For Ollama, larger models take longer to load — use a smaller model like qwen3.5:0.6b for faster startup.

How do I change my LLM models?

Edit .fitz/config.yaml. The config uses provider/model format:

chat_fast: ollama/qwen3.5:0.6b
chat_smart: ollama/llama3.2
embedding: ollama/nomic-embed-text

How do I use a cloud provider instead of Ollama?

Set your API key and update the config:

export COHERE_API_KEY=your-key-here
chat_fast: cohere
chat_smart: cohere
embedding: cohere

How do I reset everything?

Delete the .fitz/ directory in your project root. Next run will re-detect and re-configure.


License

MIT

