
Modular knowledge retrieval suite: code, docs, APIs — all on-premise


IntelligenceSuite

Retrieve enterprise knowledge in seconds, not hours.

Python 3.10+ · License: MIT

A modular RAG suite for enterprise on-premise environments.
Index your codebase and company documents; query them in natural language with precise source citations.

Zero mandatory cloud. Zero lock-in. Fully on-premise by default.

code + docs  →  parse  →  chunks  →  embed  →  ChromaDB  →  REST API  →  natural-language answers

⚡ Quick Start

Prerequisites

# 1. Ollama (local inference — no GPU required for embedding)
#    Download from https://ollama.com, then:
ollama serve
ollama pull nomic-embed-text      # embedding model
ollama pull qwen2.5-coder:7b      # generation model (or any other)

Option A — Launcher (recommended)

The easiest way: one command starts a dashboard that manages all three modules.

pip install intelligence-suite

is-launch          # opens http://localhost:8079 in your browser

From the launcher page, each card has two controls (some button labels are in Italian):

  • Offline: ▶ Start launches the server in the background, and Apri → ("Open") opens the chat UI. Click Start first, wait a few seconds for the status dot to turn green, then click Apri →.
  • Online: Apri → opens the chat UI directly, and the second control stops the server.

The ▶ Avvia tutto ("Start all") button in the header starts all three servers at once. No extra terminals needed.

First time? You still need to index your content before starting the servers:

ci-parse /path/to/repo && ci-embed   # CodeIntelligence (one-time)
di-ingest /path/to/docs && di-embed  # DocIntelligence  (one-time)
mi-ingest ./practices                # MentorIntelligence (one-time)

Option B — Individual servers

ci-parse /path/to/your/repo       # parse → chunks.jsonl        (seconds)
ci-embed                           # embed → ChromaDB            (one-time, ~20-40 min CPU)
ci-serve                           # REST API + Chat UI → http://localhost:8080

ci-embed is slow the first time (every chunk is sent to the embedding model). ChromaDB persists the result to ~/.intelligence_suite/chroma (absolute path — safe to run from any working directory). Subsequent server restarts are instant.

Chat UI — open your browser

Once any server is running, open its URL — the chat interface loads instantly.

  • Responses stream word by word in real-time (SSE)
  • Multi-conversation sidebar — New Chat button, full history per module
  • Conversations persist across page refreshes (localStorage per module)
  • Date-grouped list: Today / Yesterday / This week / Older
  • Source citations as chips below each answer (file · type · score)
  • Server status, chunk count, LLM backend displayed live
  • Zero extra dependencies — served directly from the RAG server

Or query via REST API

curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Where is authentication handled?"}'

Example response:

{
  "answer": "Authentication is handled in auth/jwt.py — the verify_token function ...",
  "sources": [{"source": "auth/jwt.py", "type": "function", "score": 0.91}],
  "confidence": 0.91,
  "escalated": false,
  "backend": "ollama",
  "latency_ms": 312.4
}

No cloud, no API key, no GPU required for the default setup.
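
The same query can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint and response fields shown in the curl example above, and the helper names (`build_query_request`, `summarize`, `ask`) are hypothetical, not part of the package.

```python
import json
import urllib.request

QUERY_URL = "http://localhost:8080/api/v1/query"  # endpoint from the curl example


def build_query_request(question: str, url: str = QUERY_URL) -> urllib.request.Request:
    """Build the POST request; nothing is sent until urlopen() is called."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> str:
    """Render the answer followed by one line per cited source."""
    lines = [response["answer"]]
    for src in response.get("sources", []):
        lines.append(f"  {src['source']} ({src['type']}) score={src['score']:.2f}")
    return "\n".join(lines)


def ask(question: str) -> dict:
    """Send the question to a running ci-serve instance and decode the JSON reply."""
    with urllib.request.urlopen(build_query_request(question), timeout=60) as resp:
        return json.load(resp)
```

With `ci-serve` running, `print(summarize(ask("Where is authentication handled?")))` would print the answer and its citations.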


Jupyter Notebook example

Three cells to go from raw repository to natural-language answers.

Cell 1 — Parse & embed (run once)

from pathlib import Path
from CodeIntelligence.parse_repo import parse_repo
from CodeIntelligence.embed_chunks import embed_chunks

REPO = Path("/path/to/your/repo")

chunks = parse_repo(REPO, output=Path("chunks.jsonl"))
print(f"✅ {len(chunks)} chunks extracted")

embed_chunks(Path("chunks.jsonl"))   # → ChromaDB local, no server needed
print("✅ Indexed into ChromaDB")

Example output:

Parsed: 631 chunks, 23 files without parser
✅ 631 chunks extracted
Embedding 631/631 chunks...
  batch 1: 32 chunks embedded
  ...
  batch 20: 32 chunks embedded
JSONL saved: chunks.jsonl
ChromaDB 'code_intelligence': 631 total chunks indexed
✅ Indexed into ChromaDB

Cell 2 — Semantic search

from intelligence_core.retriever import Retriever

retriever = Retriever.load_default(collection_name="code_intelligence")

QUESTION = "How does the retriever work?"
results = retriever.search(QUESTION, domain="code", top_k=5)

for r in results:
    print(f"[{r.rank}] score={r.score:.3f} | {r.chunk['source']} ({r.chunk['type']})")

Example output:

[1] score=0.823 | intelligence_core/retriever.py (function)
[2] score=0.791 | README.md (file)
[3] score=0.764 | intelligence_core/store.py (function)
[4] score=0.741 | examples/01_code_intelligence.py (file)
[5] score=0.718 | intelligence_core/embedder.py (function)

Cell 3 — LLM answer

from intelligence_core.llm import get_llm_provider

llm     = get_llm_provider()   # reads LLM_BACKEND from .env — default: ollama
context = "\n\n---\n\n".join(r.chunk["text"] for r in results[:3])
answer  = llm.generate(QUESTION, context)

print(f"💬 Answer ({llm.backend_name}):\n{answer}")

Example output:

💬 Answer (ollama):

The retriever works by loading a default collection named "code_intelligence"
and then searching for documents related to the query within the "code" domain.
It retrieves the top 5 most relevant results using semantic similarity combined
with a keyword boost (+0.1 per matching term, capped at +0.3), then re-ranks
by final score. Sources are cited with file path, chunk type, and similarity score.

Tested on a standard laptop (CPU only, no GPU). Cell 1 is a one-time operation — subsequent queries (Cell 2 + 3) return in 1–5 seconds.
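
The generated answer above mentions a keyword boost (+0.1 per matching term, capped at +0.3) followed by a re-rank on the final score. A rough sketch of that idea follows; how the real retriever tokenizes text and counts matching terms is not documented here, so the whitespace-and-word tokenization and the function names below are assumptions.

```python
import re

BOOST_PER_TERM = 0.1  # value quoted in the answer above
MAX_BOOST = 0.3       # cap quoted in the answer above


def keyword_boost(question: str, text: str) -> float:
    """Boost for one chunk: 0.1 per distinct question term found in the text, at most 0.3."""
    terms = set(re.findall(r"\w+", question.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return min(MAX_BOOST, BOOST_PER_TERM * len(terms & words))


def rerank(question: str, hits: list[tuple[float, str]]) -> list[tuple[float, str]]:
    """Add the boost to each (similarity, text) pair and sort by the final score."""
    boosted = [(sim + keyword_boost(question, text), text) for sim, text in hits]
    return sorted(boosted, key=lambda pair: pair[0], reverse=True)
```

A chunk with slightly lower similarity can overtake a higher-scoring one when it literally contains the query terms, which is the point of combining the two signals.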

DocIntelligence notebook example

Cell 1 — Ingest & embed documents (run once)

from pathlib import Path
from DocIntelligence.ingest_docs import ingest_docs
from DocIntelligence.embed_docs import embed_docs

DOCS = Path("/path/to/your/docs")   # PDF, DOCX, XLSX, TXT, MD

chunks = ingest_docs(DOCS, output=Path("doc_chunks.jsonl"))
print(f"✅ {len(chunks)} chunks ingested")

embed_docs(Path("doc_chunks.jsonl"))   # → ChromaDB "doc_intelligence"
print("✅ Indexed into ChromaDB")

Cell 2 — Search & answer

from intelligence_core.retriever import Retriever
from intelligence_core.llm import get_llm_provider

retriever = Retriever.load_default(collection_name="doc_intelligence")
llm       = get_llm_provider()

QUESTION  = "What are the production deploy prerequisites?"
results   = retriever.search(QUESTION, domain="doc", top_k=5)
context   = "\n\n---\n\n".join(r.chunk["text"] for r in results[:3])
answer    = llm.generate(QUESTION, context)

print(f"💬 Answer ({llm.backend_name}):\n{answer}")
print("\n📎 Sources:")
for r in results:
    print(f"  [{r.rank}] score={r.score:.3f} | {r.chunk['source']} ({r.chunk['type']})")

MentorIntelligence notebook example

Cell 1 — Ingest best practices (run once)

from pathlib import Path
from MentorIntelligence.content.ingest_practices import ingest_practices

PRACTICES = Path("./practices")   # folder with .md / .txt team conventions
PRACTICES.mkdir(exist_ok=True)

# Drop any .md files with team conventions, naming guides, runbooks
chunks = ingest_practices(PRACTICES)
print(f"✅ {len(chunks)} practice chunks indexed into ChromaDB 'mentor_intelligence'")

Cell 2 — Start an onboarding session

import httpx

# Requires: mi-serve running on http://localhost:8082
BASE = "http://localhost:8082"

# Create session
resp = httpx.post(f"{BASE}/api/v1/mentor/onboard", json={
    "user_name": "Alice",
    "intro": "I am a senior Python developer, joining the team today."
}, timeout=30)
session = resp.json()
print(f"Session: {session['session_id']}")
print(f"Profile: {session['profile']}")
print(f"\nOnboarding path:\n{session['path']}")

Cell 3 — Ask within the onboarding path

resp = httpx.post(f"{BASE}/api/v1/mentor/ask", json={
    "session_id": session["session_id"],
    "question":   "How does authentication work in this codebase?"
}, timeout=60)

answer = resp.json()
print(f"💬 Answer:\n{answer['answer']}")
print(f"\n📎 Sources: {[s['source'] for s in answer['sources'][:3]]}")

The problem it solves

How much time does your team lose every week hunting down where a function is implemented, re-reading a 40-page procedure to recall one detail, or asking colleagues what an undocumented service actually does?

Every AI assistant — local LLMs, Copilots, RAG agents — reasons on the context it receives. Raw file dumps waste tokens on boilerplate and miss structure.

IntelligenceSuite turns your source code and company documents into domain-aware semantic chunks — each self-contained, source-cited, and immediately embeddable — then serves them through a local REST API you can query from any client.


Modules

| Module | Domain | Status | Description |
| --- | --- | --- | --- |
| intelligence_core | Shared layer | ✅ Stable | Chunk schema, embedder, ChromaDB store, retriever, escalation policy |
| CodeIntelligence | Source code | ✅ Stable | Python AST + regex parsers for TS, Go, YAML, SQL, MD |
| DocIntelligence | Company docs | ✅ Stable | PDF (3-level), DOCX, XLSX, TXT ingest pipeline |
| MentorIntelligence | Onboarding | ✅ Stable | Adaptive onboarding: profile detection, sessions, cross-domain path |

Installation

# Minimal — Ollama for both embeddings and generation (fully local)
pip install intelligence-suite

# With document parsers
pip install "intelligence-suite[pdf,docx,xlsx]"

# With OpenAI / vLLM / Groq / Mistral generation
pip install "intelligence-suite[openai]"

# With Claude generation + Voyage embeddings
pip install "intelligence-suite[claude]"

# With OCR support (requires tesseract on the system)
pip install "intelligence-suite[pdf,ocr]"

# Everything
pip install "intelligence-suite[all]"

# Development
pip install -e ".[dev]"

CodeIntelligence — source code RAG

Parses your repository into semantic chunks, embeds them locally, and exposes a REST endpoint to query your codebase in natural language.

Supported languages

| Language | Parser | Extracts |
| --- | --- | --- |
| Python | AST-based (precise) | modules, classes, methods, functions, decorators, async |
| TypeScript / JS | Regex | modules, classes, named + arrow functions |
| Go | Regex | packages, functions, method receivers, structs, interfaces |
| SQL | Regex | CREATE TABLE / VIEW / FUNCTION / PROCEDURE / INDEX |
| YAML | Heuristic | Docker Compose services, GitHub Actions jobs, K8s manifests |
| Markdown | Heading-based | H1 / H2 / H3 sections |

CLI quickstart

# ── Step 1: index (run once, from any directory) ──────────────────────────
ci-parse /path/to/repo               # → chunks.jsonl           (seconds)
ci-embed                              # → ~/.intelligence_suite/chroma  (slow, one-time)

# Next time the code changes, only re-embed new chunks:
ci-embed --incremental

# ── Step 2: serve (instant — data already in ChromaDB) ───────────────────
ci-serve                              # http://localhost:8080

# ── Step 3: use the chat UI or REST API ──────────────────────────────────
# Open http://localhost:8080 in your browser  ← streaming chat UI
# Suggestion pills adapt to the module (Code / Doc / Mentor) automatically.

# Or query via curl:
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"question": "Where is authentication handled?"}'

ChromaDB data is stored in ~/.intelligence_suite/chroma (absolute path, resolved at startup). You can run commands from any directory — the data will always be found. Override with CHROMA_PERSIST_DIR=/custom/path in .env if needed.
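
To illustrate the path rule just described, here is a small sketch of how a default-plus-override lookup could be resolved to an absolute path. It assumes only the default location and the `CHROMA_PERSIST_DIR` variable named above; `resolve_chroma_dir` is a hypothetical helper, not the package's own function.

```python
import os
from pathlib import Path

DEFAULT_DIR = Path.home() / ".intelligence_suite" / "chroma"


def resolve_chroma_dir(env=None) -> Path:
    """Return the absolute ChromaDB directory, honouring CHROMA_PERSIST_DIR if set."""
    env = os.environ if env is None else env
    override = env.get("CHROMA_PERSIST_DIR", "").strip()
    base = Path(override).expanduser() if override else DEFAULT_DIR
    return base.resolve()  # absolute, so the result does not depend on the working directory
```

Calling `resolve_chroma_dir()` from two different working directories returns the same path, which is why the CLI commands can be run from anywhere.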

Python API

from pathlib import Path
from CodeIntelligence.parse_repo import parse_repo
from CodeIntelligence.embed_chunks import embed_chunks
from intelligence_core.retriever import Retriever

# 1. Parse and embed (one-time)
chunks = parse_repo(Path("/path/to/repo"), output=Path("chunks.jsonl"))
embed_chunks(Path("chunks.jsonl"))   # → ChromaDB "code_intelligence"

# 2. Query
retriever = Retriever.load_default(collection_name="code_intelligence")
results = retriever.search("Where is authentication handled?", domain="code", top_k=5)
for hit in results:
    print(f"[{hit.chunk['source']}] score={hit.score:.2f}  {hit.chunk['text'][:120]}")

DocIntelligence — document RAG

Ingests company documents across multiple formats with a 3-level PDF parsing strategy (structured → OCR → raw binary) and serves them through the same retrieval interface.

Supported formats

| Format | Parser | Notes |
| --- | --- | --- |
| PDF | pdfplumber → pytesseract → raw binary | 3-level fallback; heading detection via y-coordinates |
| DOCX | python-docx | Heading + body sections; empty sections preserved |
| XLSX | openpyxl | Sheet-by-sheet tabular chunks |
| TXT / MD | Built-in | Line-based or heading-based split |
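
The 3-level PDF strategy is a fallback chain: try the most structured extractor first and drop to the next level only when a level fails or returns nothing. The sketch below shows that control flow with stand-in extractors; the three functions are placeholders (real code would call pdfplumber and pytesseract), and all names are hypothetical.

```python
from typing import Callable, Iterable, Optional

Extractor = Callable[[bytes], Optional[str]]


def extract_with_fallback(
    page: bytes, extractors: Iterable[tuple[str, Extractor]]
) -> tuple[str, str]:
    """Try each extractor in order; return (level_name, text) from the first that yields text."""
    for name, extract in extractors:
        try:
            text = extract(page)
        except Exception:
            continue  # a failing level falls through to the next one
        if text and text.strip():
            return name, text
    raise ValueError("no extractor produced text")


# Stand-ins for the three levels (placeholders only).
def structured(page: bytes) -> Optional[str]:
    return None  # pretend the page has no text layer


def ocr(page: bytes) -> Optional[str]:
    raise RuntimeError("OCR engine not available")


def raw_binary(page: bytes) -> Optional[str]:
    return page.decode("latin-1", errors="ignore")


LEVELS = [("structured", structured), ("ocr", ocr), ("raw", raw_binary)]
```

Returning the level name alongside the text lets a caller record which level produced each chunk, which is useful when judging extraction quality later.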

CLI quickstart

# ── Step 1: index (run once, from any directory) ──────────────────────────
di-ingest /path/to/docs              # → doc_chunks.jsonl        (seconds)
di-embed                              # → ~/.intelligence_suite/chroma  (slow, one-time)
di-embed --incremental               # re-index only new files

# ── Step 2: serve (instant) ───────────────────────────────────────────────
di-serve                              # http://localhost:8081

# Open http://localhost:8081 for the chat UI, or query via curl:
curl -X POST http://localhost:8081/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the production deploy prerequisites?"}'

Python API

from pathlib import Path
from DocIntelligence.ingest_docs import ingest_docs
from DocIntelligence.embed_docs import embed_docs
from intelligence_core.retriever import Retriever

# 1. Ingest and embed (one-time)
chunks = ingest_docs(Path("/path/to/docs"), output=Path("doc_chunks.jsonl"))
embed_docs(Path("doc_chunks.jsonl"))   # → ChromaDB "doc_intelligence"

# 2. Query
retriever = Retriever.load_default(collection_name="doc_intelligence")
results = retriever.search("Production deploy prerequisites", domain="doc", top_k=5)
for hit in results:
    print(f"[{hit.chunk['source']}] score={hit.score:.2f}  {hit.chunk['text'][:200]}")

MentorIntelligence — adaptive onboarding

Builds a personalised onboarding path for each newcomer, combining knowledge from both the codebase and company documents via cross-domain retrieval.

Capabilities

| Feature | Description |
| --- | --- |
| Profile detection | Infers seniority and specialisation from the intro message |
| Session management | Persistent onboarding sessions with full question history |
| Path builder | Generates a structured learning path from profile + company practices |
| Cross-domain orchestrator | Retrieves from code, doc, and mentor domains in one query |
| Practice ingestion | Ingest team conventions, naming guides, and runbooks as mentor knowledge |

CLI quickstart

# Ingest your team's best practices (Markdown / TXT / YAML)
# The repo ships with a ready-to-use practices/ folder for IntelligenceSuite itself:
mi-ingest ./practices

# Start the mentor server (default: http://localhost:8082)
mi-serve

# Start an onboarding session
curl -X POST http://localhost:8082/api/v1/mentor/onboard \
  -H "Content-Type: application/json" \
  -d '{"user_name": "Alice", "intro": "I am a senior Python developer, first day here."}'

# Ask within your onboarding path
curl -X POST http://localhost:8082/api/v1/mentor/ask \
  -H "Content-Type: application/json" \
  -d '{"session_id": "...", "question": "How does authentication work in this codebase?"}'

Bundled practices/ folder

The repository ships with four ready-to-use Markdown guides covering IntelligenceSuite itself — ideal for teams adopting the suite:

| File | Content |
| --- | --- |
| 01_onboarding_nuovo_developer.md | Day-by-day setup, first indexing, team conventions |
| 02_come_usare_code_intelligence.md | CI pipeline, CLI commands, API, troubleshooting |
| 03_come_usare_doc_intelligence.md | DI supported formats, pipeline, confidence notes |
| 04_come_usare_mentor_intelligence.md | MI onboarding flow, profile detection, REST API |

Each file is split by ## headings at ingest time — one chunk per section — for precise retrieval. Running mi-ingest ./practices produces ~30 chunks.
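
As an illustration of the heading-based split, the sketch below cuts a Markdown string into one chunk per `## ` section. It is an approximation under stated assumptions: the actual ingest code is not shown in this README, `split_by_h2` is a hypothetical name, and this version drops sections with an empty body and keeps `###` subsections inside their parent.

```python
import re


def split_by_h2(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per '## ' section; leading text becomes a 'preamble' chunk."""
    chunks: list[dict] = []
    title, body = "preamble", []

    def flush() -> None:
        # Emit the section collected so far, unless its body is blank.
        if any(ln.strip() for ln in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*)", line)  # matches '## x' but not '### x' or '# x'
        if match:
            flush()
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Splitting on a single heading level keeps each chunk focused on one topic while still carrying enough surrounding text to be readable on its own.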


What a chunk looks like

Every source file and every document is converted into self-contained semantic chunks with a unified schema:

{
  "id":         "code::function::myrepo/auth/jwt.py::verify_token",
  "domain":     "code",
  "type":       "function",
  "text":       "### verify_token\n\n**Description:** Validates a JWT and returns the decoded payload.\n```python\ndef verify_token(token: str) -> dict:\n    ...\n```",
  "source":     "auth/jwt.py",
  "language":   "python",
  "metadata": {
    "symbol":     "verify_token",
    "start_line": 42,
    "end_line":   67,
    "decorators": ["@router.get"],
    "calls":      ["jwt.decode", "raise_for_status"]
  },
  "embedding":  [0.012, -0.034, "..."],
  "indexed_at": "2025-05-01T10:22:00Z",
  "checksum":   "9ff7ac4fe71b"
}

Valid domains and types

| Domain | Types |
| --- | --- |
| code | module, class, function, method, config, schema |
| doc | section, table, paragraph |
| mentor | practice, path, session |
| api | endpoint, schema |
| data | table, view |
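
The domain/type pairs above, together with the `domain::type::locator` ID format visible in the example chunk, can be checked with a few lines of Python. This is a sketch of the idea, not the package's own validator; `VALID_TYPES` and `make_chunk_id` are hypothetical names.

```python
VALID_TYPES = {
    "code":   {"module", "class", "function", "method", "config", "schema"},
    "doc":    {"section", "table", "paragraph"},
    "mentor": {"practice", "path", "session"},
    "api":    {"endpoint", "schema"},
    "data":   {"table", "view"},
}


def make_chunk_id(domain: str, chunk_type: str, locator: str) -> str:
    """Build a deterministic chunk ID, rejecting domain/type pairs not listed above."""
    if chunk_type not in VALID_TYPES.get(domain, set()):
        raise ValueError(f"invalid domain/type pair: {domain}/{chunk_type}")
    return f"{domain}::{chunk_type}::{locator}"
```

Because the ID depends only on its inputs, re-indexing the same symbol yields the same ID, which is what makes deduplication straightforward.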

Integration examples

Ollama — fully local, zero cost

# .env: LLM_BACKEND=ollama  OLLAMA_MODEL=qwen2.5-coder:7b
from intelligence_core.retriever import Retriever
from intelligence_core.llm import get_llm_provider

retriever = Retriever.load_default(collection_name="code_intelligence")
llm       = get_llm_provider()          # reads LLM_BACKEND from .env

hits    = retriever.search("How is the database connection pooled?", domain="code", top_k=5)
context = "\n\n".join(h.chunk["text"] for h in hits)
answer  = llm.generate("How is the database connection pooled?", context)

print(answer)
for h in hits:
    print(f"  [{h.chunk['source']}] score={h.score:.2f}")

OpenAI / Groq / Mistral / any OpenAI-compatible API

# .env: LLM_BACKEND=openai  OPENAI_API_KEY=sk-...  OPENAI_MODEL=gpt-4o
from intelligence_core.llm import get_llm_provider
from intelligence_core.retriever import Retriever

retriever = Retriever.load_default(collection_name="doc_intelligence")
llm       = get_llm_provider()          # OpenAICompatProvider
# For Groq:      set OPENAI_BASE_URL=https://api.groq.com/openai/v1
# For Mistral:   set OPENAI_BASE_URL=https://api.mistral.ai/v1

hits   = retriever.search("Explain the payment flow", domain="doc", top_k=8)
answer = llm.generate("Explain the payment flow", "\n\n".join(h.chunk["text"] for h in hits))

vLLM — local GPU server

# .env: LLM_BACKEND=vllm
#        OPENAI_BASE_URL=http://localhost:8000/v1
#        OPENAI_MODEL=mistralai/Mistral-7B-Instruct-v0.2
from intelligence_core.llm import get_llm_provider

llm = get_llm_provider("vllm")   # OpenAI-compat client → your vLLM server

Claude API

# .env: LLM_BACKEND=claude  ANTHROPIC_API_KEY=sk-ant-...  CLAUDE_MODEL=claude-opus-4-5
from intelligence_core.llm import get_llm_provider
from intelligence_core.retriever import Retriever

retriever = Retriever.load_default(collection_name="code_intelligence")
llm       = get_llm_provider("claude")

hits   = retriever.search("Explain the entire auth flow", domain="code", top_k=8)
answer = llm.generate("Explain the entire auth flow", "\n\n".join(h.chunk["text"] for h in hits))

LangChain / LlamaIndex

from intelligence_core.retriever import Retriever
from langchain_core.documents import Document  # current import path; langchain.schema is the legacy location

retriever = Retriever.load_default(collection_name="doc_intelligence")
hits = retriever.search("Deploy prerequisites", domain="doc", top_k=10)

docs = [
    Document(
        page_content=h.chunk["text"],
        metadata={
            "source": h.chunk["source"],
            "domain": h.chunk["domain"],
            "type":   h.chunk["type"],
        },
    )
    for h in hits
]
# → pass docs to any LangChain chain or LlamaIndex index

LLM backends (generation)

IntelligenceSuite uses a provider-agnostic LLMProvider protocol for answer generation. Switch backend with a single env var — no code changes required.

| Backend | LLM_BACKEND | Extra | Notes |
| --- | --- | --- | --- |
| Ollama | ollama | (none) | Default: fully local, no API key, no GPU required |
| OpenAI | openai | [openai] | GPT-4o, GPT-4o-mini, o1, … |
| vLLM | vllm | [openai] | Local GPU server, OpenAI-compatible; set OPENAI_BASE_URL |
| Claude | claude | [claude] | Anthropic claude-opus-4-5, claude-sonnet-4-5, … |
| Groq | openai | [openai] | Fast inference; set OPENAI_BASE_URL=https://api.groq.com/openai/v1 |
| Mistral AI | openai | [openai] | Set OPENAI_BASE_URL=https://api.mistral.ai/v1 |
| LM Studio | vllm | [openai] | Local; set OPENAI_BASE_URL=http://localhost:1234/v1 |

Any OpenAI-compatible server works with LLM_BACKEND=openai or vllm by pointing OPENAI_BASE_URL at the correct endpoint.

Per-module LLM routing

Each module can use a different LLM backend and model independently. Set any combination of CI_LLM_*, DI_LLM_*, MI_LLM_* in .env — leave empty to fall back to the global LLM_BACKEND:

# CodeIntelligence → vLLM GPU server with code-specialised model
CI_LLM_BACKEND=openai
CI_LLM_MODEL=codellama:34b
CI_LLM_BASE_URL=http://gpu-server:8000/v1

# DocIntelligence → local Mistral (better multilingual / Italian)
DI_LLM_BACKEND=ollama
DI_LLM_MODEL=mistral:7b

# MentorIntelligence → Claude API (best pedagogical quality)
MI_LLM_BACKEND=claude
MI_LLM_MODEL=claude-sonnet-4-5

Any OpenAI-compatible endpoint (vLLM, Groq, Mistral AI, LM Studio, Azure…) works by setting *_LLM_BACKEND=openai and *_LLM_BASE_URL to the endpoint.
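
The fallback rule (module-specific variable first, global `LLM_BACKEND` otherwise) can be pictured as follows. This sketch assumes only the variable names shown above; `resolve_llm_config` is a hypothetical helper, and treating an unset model or base URL as "use the provider default" is an assumption.

```python
import os


def resolve_llm_config(prefix: str, env=None) -> dict:
    """Resolve backend/model/base_url for a module prefix such as 'CI', 'DI' or 'MI'."""
    env = os.environ if env is None else env

    def module_value(suffix: str):
        # Empty strings count as "not set", matching "leave empty to fall back".
        return env.get(f"{prefix}_LLM_{suffix}", "").strip() or None

    return {
        "backend": module_value("BACKEND") or env.get("LLM_BACKEND", "ollama"),
        "model": module_value("MODEL"),        # None means: provider default
        "base_url": module_value("BASE_URL"),  # None means: provider default
    }
```

With the `.env` above, `resolve_llm_config("CI")` would select the OpenAI-compatible backend on the GPU server, while a module with no `*_LLM_*` variables inherits the global backend.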

Escalation

When retrieval confidence < ESCALATION_THRESHOLD and ANTHROPIC_API_KEY is set, the system automatically escalates to Claude — regardless of the primary LLM_BACKEND.
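
Expressed as code, the escalation rule is a two-part condition. The sketch below assumes the confidence value is supplied by the caller and that 0.70 (the value in `.env.example`) is also the fallback when the variable is unset; `should_escalate` is a hypothetical name.

```python
import os


def should_escalate(confidence: float, env=None) -> bool:
    """True when confidence is below ESCALATION_THRESHOLD and an Anthropic key is configured."""
    env = os.environ if env is None else env
    threshold = float(env.get("ESCALATION_THRESHOLD", "0.70"))  # assumed default
    has_key = bool(env.get("ANTHROPIC_API_KEY", "").strip())
    return has_key and confidence < threshold
```

Without an API key the function always returns `False`, so a fully local deployment never leaves the machine.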

Embedding backends

| Backend | EMBED_BACKEND | Extra | Notes |
| --- | --- | --- | --- |
| Ollama | ollama | (none) | Default: fully local |
| SentenceTransformer | st | [st] | CPU-only, fully offline, no Ollama needed |
| Claude / Voyage | claude | [claude] | Cloud embeddings via Voyage AI |

Multilingual support

By default the embedding model is English-optimised. To query and answer in Italian (or any of 50+ languages), switch to a multilingual SentenceTransformer model:

# .env
EMBED_BACKEND=st
ST_MODEL=paraphrase-multilingual-MiniLM-L12-v2   # 50+ languages, same speed as default
# ST_MODEL=paraphrase-multilingual-mpnet-base-v2  # higher quality, 768-dim

# Then install the SentenceTransformer extra (shell):
pip install "intelligence-suite[st]"

Then re-run ci-embed (or di-embed) to rebuild the index with multilingual embeddings. The LLM will automatically respond in the language of the question — no extra configuration needed.

| Model | Languages | Dimensions | Speed |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 | English only | 384 | ⚡ Fast (default) |
| paraphrase-multilingual-MiniLM-L12-v2 | 50+ (IT, FR, ES, DE, …) | 384 | ⚡ Fast |
| paraphrase-multilingual-mpnet-base-v2 | 50+ | 768 | 🐢 Slower, higher quality |

Note: switching embedding model requires re-indexing from scratch — the vector dimensions may change (384 → 768) and ChromaDB will reject mixed-dimension collections. Delete the data directory before re-running ci-embed with a new model:

# Linux / macOS
rm -rf ~/.intelligence_suite/chroma
# Windows (PowerShell)
Remove-Item -Recurse -Force "$HOME\.intelligence_suite\chroma"

Vector store

| Store | Status | Notes |
| --- | --- | --- |
| ChromaDB | ✅ Default | Embedded: runs inside the Python process, persists to ~/.intelligence_suite/chroma |
| pgvector | 🔶 v0.2 | Enterprise, multi-tenant, PostgreSQL-native |
| Neo4j (Graph) | 🔶 v0.3 | Code call graph, import graph, doc cross-references; hybrid retrieval |

ChromaDB runs embedded — no separate server or Docker container needed.
Data is persisted to ~/.intelligence_suite/chroma automatically and survives restarts.
Override the path with CHROMA_PERSIST_DIR=/your/path in .env.


Design principles

| Principle | Implementation |
| --- | --- |
| On-premise first | Ollama + ChromaDB by default; no cloud required |
| Domain-aware chunking | Every chunk carries domain, which prevents cross-contamination in retrieval |
| Deterministic IDs | domain::type::locator; safe to re-index, dedup-friendly |
| 3-level PDF parsing | pdfplumber → OCR → raw binary; never silently drops a page |
| Fail-safe ingestion | One broken file never crashes the pipeline |
| Fail-loud embedding | OllamaEmbedder raises immediately if unreachable; never stores zero vectors |
| Graceful escalation | Stays local until similarity drops below threshold, then escalates to Claude API |
| CORS-enabled API | All three FastAPI servers include CORSMiddleware; embeddable in any dashboard |
| Modular | Each module is independently installable and deployable |

Configuration

Copy .env.example to .env and edit:

# LLM generation (ollama | openai | vllm | claude)
LLM_BACKEND=ollama
OLLAMA_MODEL=qwen2.5-coder:7b

# OpenAI-compatible (OpenAI, vLLM, Groq, Mistral, LM Studio…)
# OPENAI_API_KEY=sk-...
# OPENAI_MODEL=gpt-4o
# OPENAI_BASE_URL=https://api.openai.com/v1   # vLLM: http://localhost:8000/v1

# Claude
# ANTHROPIC_API_KEY=sk-ant-...
# CLAUDE_MODEL=claude-opus-4-5

# Embeddings (ollama | st | claude)
EMBED_BACKEND=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBED_MODEL=nomic-embed-text

# Vector store
VECTOR_STORE=chromadb
# Default path: ~/.intelligence_suite/chroma  (absolute — CWD-independent)
# Uncomment to override:
# CHROMA_PERSIST_DIR=/custom/path/chroma

# Escalation: fallback to Claude when confidence < threshold
ESCALATION_THRESHOLD=0.70

# Server ports — all three can run simultaneously
CI_PORT=8080   # CodeIntelligence
DI_PORT=8081   # DocIntelligence
MI_PORT=8082   # MentorIntelligence

All variables are accepted as plain environment variables too — no .env file required in CI/CD.

CLI reference

| Command | Module | Action |
| --- | --- | --- |
| `ci-parse <repo>` | CodeIntelligence | Parse repository into chunks → chunks.jsonl |
| `ci-embed [file]` | CodeIntelligence | Embed chunks into ChromaDB (default: chunks.jsonl) |
| `ci-serve` | CodeIntelligence | Start the code RAG server on CI_PORT (default 8080) |
| `di-ingest <dir>` | DocIntelligence | Ingest documents into chunks → doc_chunks.jsonl |
| `di-embed [file]` | DocIntelligence | Embed doc chunks into ChromaDB (default: doc_chunks.jsonl) |
| `di-serve` | DocIntelligence | Start the doc RAG server on DI_PORT (default 8081) |
| `mi-ingest <dir>` | MentorIntelligence | Ingest best practice documents |
| `mi-serve` | MentorIntelligence | Start the mentor server on MI_PORT (default 8082) |
| `is-launch` | Launcher | Dashboard to start/stop/monitor all modules, on port 8079 |

KPI targets

| Metric | CodeIntelligence | DocIntelligence |
| --- | --- | --- |
| Hit@1 | > 60% | > 55% |
| Hit@5 | > 85% | > 80% |
| MRR | > 0.70 | > 0.65 |
| Latency P50 | < 300 ms | < 400 ms |
| Latency P99 | < 2 000 ms | < 2 000 ms |

KPI tests are included in the suite and skipped automatically until a live indexed store is present.
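
For readers unfamiliar with the retrieval metrics in the table, the sketch below computes Hit@k and MRR from ranked result lists and the expected source for each query. It uses the standard definitions; the function names and the input layout (one list of source names per query) are assumptions, not the suite's own test code.

```python
def hit_at_k(ranked: list[list[str]], expected: list[str], k: int) -> float:
    """Fraction of queries whose expected source appears in the top k results."""
    hits = sum(1 for results, gold in zip(ranked, expected) if gold in results[:k])
    return hits / len(expected)


def mean_reciprocal_rank(ranked: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the expected source; a query that misses contributes 0."""
    total = 0.0
    for results, gold in zip(ranked, expected):
        if gold in results:
            total += 1.0 / (results.index(gold) + 1)
    return total / len(expected)
```

Hit@1 rewards putting the right source first, Hit@5 only asks that it appear somewhere in the top five, and MRR sits between the two by weighting each hit by its position.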


Hardware requirements

| Scenario | Hardware |
| --- | --- |
| Dev / local testing | Mac or PC, 16 GB RAM |
| Team 1–10 | Linux server, 32 GB RAM |
| Team 10–50 (GPU) | RTX 3090/4090 + 64 GB RAM |
| Team 50+ | pgvector (roadmap) + dedicated GPU |

Troubleshooting

ci-serve / di-serve / mi-serve not found after install

On Windows, pip installs CLI scripts in a user-level Scripts folder that may not be on PATH. You will see this warning during pip install:

WARNING: The scripts ci-serve.exe, ci-parse.exe ... are installed in
'C:\Users\<you>\AppData\Roaming\Python\Python3xx\Scripts'
which is not on PATH.

Fix — add the Scripts folder to your PATH (run once in PowerShell):

$scripts = "$env:APPDATA\Python\$(python -c 'import sys; print(f\"Python{sys.version_info.major}{sys.version_info.minor}\")')\Scripts"
$current = [Environment]::GetEnvironmentVariable("PATH", "User")
if ($current -notlike "*$scripts*") {
    [Environment]::SetEnvironmentVariable("PATH", "$current;$scripts", "User")
    Write-Host "PATH updated — reopen your terminal."
}

Then reopen your terminal and ci-serve will work.

Quick fix without reopening (current session only):

$env:PATH += ";$env:APPDATA\Python\$(python -c 'import sys; print(f\"Python{sys.version_info.major}{sys.version_info.minor}\")')\Scripts"

Alternative — run without modifying PATH:

python -m CodeIntelligence.rag_server   # instead of ci-serve
python -m DocIntelligence.doc_server    # instead of di-serve
python -m MentorIntelligence.mentor_server  # instead of mi-serve

Ollama not reachable during embedding

If ci-embed (or di-embed) raises a RuntimeError like:

OllamaEmbedder: cannot reach http://localhost:11434 (model=nomic-embed-text).
  Fix 1: ollama serve && ollama pull nomic-embed-text
  Fix 2: set EMBED_BACKEND=st in .env (offline, no server required)

then Ollama is not running or the model has not been pulled. The embedder fails loudly instead of silently storing zero vectors, so the problem surfaces immediately.

Fix 1 — Start Ollama:

ollama serve                       # start Ollama
ollama pull nomic-embed-text       # pull the embedding model if not present

Fix 2 — Switch to the CPU-only offline embedder (no Ollama needed):

pip install "intelligence-suite[st]"
# set in .env:
EMBED_BACKEND=st

ChromaDB DuplicateIDError on embed

If ci-embed raises DuplicateIDError, your chunks.jsonl contains duplicate chunk IDs. This can happen if you ran python -m build inside the repo before indexing — the build/lib/ directory gets indexed alongside the real sources.

# Clean build artefacts and the ChromaDB data directory, then re-index
rm -rf build/ dist/
rm -rf ~/.intelligence_suite/chroma          # Linux / macOS
# Remove-Item -Recurse -Force "$HOME\.intelligence_suite\chroma"  # Windows PowerShell

ci-parse /path/to/repo
ci-embed

From version 0.1.2 onwards, parse_repo automatically excludes build/, dist/, venv/, and other non-source directories.
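
Before deleting anything, it can help to confirm which IDs are actually duplicated. The sketch below scans JSONL lines and reports every repeated `id`; it assumes one JSON object per line with an `id` field, as in the chunk schema shown earlier, and `find_duplicate_ids` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def find_duplicate_ids(lines: Iterable[str]) -> dict[str, int]:
    """Return {chunk_id: occurrences} for every ID that appears more than once."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["id"]] += 1
    return {chunk_id: n for chunk_id, n in counts.items() if n > 1}
```

For a file on disk, pass the open handle: `with open("chunks.jsonl", encoding="utf-8") as fh: print(find_duplicate_ids(fh))`. IDs whose locator starts with `build/lib/` would point to the build-artefact cause described above.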


Test suite

pip install -e ".[dev]"
pytest tests/ -v
# 54 passed, 5 skipped (KPI — require indexed store), 0 failed

Architecture

IntelligenceSuite/
├── intelligence_core/       # Shared: chunk schema, embedder, ChromaDB, retriever, escalation
├── CodeIntelligence/        # Code RAG: Python AST, TS, Go, YAML, SQL, MD parsers
├── DocIntelligence/         # Doc RAG: PDF (3-level), DOCX, XLSX, TXT
└── MentorIntelligence/      # Adaptive onboarding: profile, session, path, orchestrator

Roadmap

POC → Production evolution

| Version | Milestone | Enterprise target |
| --- | --- | --- |
| 0.2.x | Launcher dashboard · multi-conversation sidebar · per-module LLM routing · multilingual embeddings · streaming chat UI · absolute ChromaDB path | Current: POC ready |
| 0.3.0 | pgvector · multi-tenant namespacing · JWT auth · Docker Compose | Teams 1–50, shared infra |
| 0.4.0 | Graph layer (Neo4j) · hybrid vector+graph retrieval · async embedding queue | Code dependency traversal, multi-hop reasoning |
| 0.5.0 | vLLM GPU serving · OpenTelemetry tracing · Prometheus metrics | Teams 50+, GPU cluster |
| 0.6.0 | GitHub/GitLab webhook · incremental re-index · WebSocket push | Real-time knowledge base |
| 1.0.0 | Kubernetes · horizontal scaling · SLA-tested · full observability | Production enterprise |

Why graph in v0.4?

Vector search answers "what is similar to my query?"
Graph traversal answers "what calls this function? what depends on this module? what documents reference this procedure?"

parse_repo already extracts calls, imports, and decorators for every chunk — the foundation for a full code dependency graph is already in place. Combined with vector similarity (GraphRAG pattern), this unlocks multi-hop reasoning that pure vector search cannot achieve.
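
To make the "what calls this function?" question concrete, the sketch below builds a tiny in-memory call graph from the `metadata.symbol` and `metadata.calls` fields shown in the example chunk. It is illustrative only: the planned graph layer targets Neo4j, and `build_call_graph` and `callers_of` are hypothetical names.

```python
from collections import defaultdict


def build_call_graph(chunks: list[dict]) -> dict[str, set[str]]:
    """Map each symbol to the set of names it calls, using chunk metadata."""
    graph: dict[str, set[str]] = defaultdict(set)
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        symbol = meta.get("symbol")
        if symbol:
            graph[symbol].update(meta.get("calls", []))
    return dict(graph)


def callers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Answer 'what calls this function?' by scanning the edges in reverse."""
    return sorted(caller for caller, callees in graph.items() if target in callees)
```

A vector query can find `verify_token` by meaning; a lookup like `callers_of(graph, "verify_token")` then follows the edges outward, which is the multi-hop step pure similarity search cannot express.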


License

MIT — see LICENSE


See ARCHITECTURE.md for design decisions and docs/ for presentations.
