ZIM-based retrieval-augmented proxy for OpenAI-compatible AI

Tensor (Serve)

tensor-serve is a ZIM-based retrieval-augmented proxy for any OpenAI-compatible AI. It downloads ZIM documentation from the live Kiwix OPDS catalog, builds a local semantic vector database from it, and uses that database to give the AI model relevant context when answering questions.

The goal is to make it seamless to customize your AI for your specific needs.

Combining keyword search and semantic search, Tensor helps produce more accurate responses for the data you have included in a ZIM database.

1. How the AI pipeline works

Download — ZIM files fetched from Kiwix and stored in the configured ZIM source folder (zim_files/ by default)
Ingest — Articles extracted, HTML stripped, split into 500-word overlapping chunks, embedded with sentence-transformers, indexed in FAISS and BM25
Auto-load — On server startup, the last active collection's FAISS and BM25 indexes are loaded automatically
Analyze — Simple queries can skip retrieval; domain-specific queries use the query analyzer to choose the best search mode (hybrid, faiss, or bm25); time-sensitive queries optionally trigger web search
OpenAI-compatible proxy — For /v1/chat/completions, the user message is embedded (or served from cache) → hybrid search retrieves top-k chunks (optionally merged with web results) → optional cross-encoder reranking improves result order → retrieved context is injected into the request before it is forwarded to the upstream AI server.
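The chunking part of the Ingest step can be sketched as follows. This is a minimal illustration, not Tensor Serve's actual code: the function name chunk_words and the 50-word overlap are assumptions, since the description above only states that chunks are 500 words long and overlap.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words. The overlap value is an
    assumption; the README only says the chunks overlap.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Overlapping windows keep a sentence that straddles a chunk boundary fully visible in at least one chunk, at the cost of embedding some words twice.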

Hybrid search (FAISS + BM25 + optional Web Search w/ Reciprocal Rank Fusion)

Search requests and OpenAI-compatible chat requests can run up to three retrievals in parallel and merge them:

	FAISS (semantic)	BM25 (keyword)	Web Search
Finds	Conceptually related chunks	Exact term / token matches	Current / recent information
Good for	"How does backpressure work?"	"asyncio.gather", error codes, API names	"latest news", "today's events", time-sensitive queries
Requires setup	Automatic	Automatic	Optional; disabled by default

Results are merged with Reciprocal Rank Fusion (score = Σ 1 / (60 + rank)). Chunks that rank well in multiple result sets float to the top. The pipeline degrades gracefully — if one index is unavailable it is skipped.
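As a rough sketch of the fusion formula above, assuming ranks start at 1 and each retriever returns an ordered list of chunk IDs (the function name and the ID format are illustrative, not Tensor Serve's internals):

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of chunk IDs with score = sum of 1 / (k + rank).

    `rankings` is a list of ranked lists, one per retriever. An empty list
    (an unavailable index) simply contributes nothing.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):  # 1-based ranks (assumed)
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


faiss_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
web_hits = []  # web search disabled or unavailable
merged = reciprocal_rank_fusion([faiss_hits, bm25_hits, web_hits])
```

Here chunk "b" (ranks 2 and 1) ends up ahead of "a" (ranks 1 and 3), and both outrank the chunks that appear in only one list.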

The query analyzer automatically selects the search strategy:

Mode	When it is used
`hybrid`	Mixed or general queries where semantic and keyword signals both help
`faiss`	Conceptual queries such as explanations, architecture, patterns, and design questions
`bm25`	Keyword-heavy queries such as API names, code symbols, methods, classes, errors, and short exact searches
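To make the table concrete, here is a deliberately simple rule-based sketch of a mode selector. It is hypothetical: the regular expression, the cue words, and the function name choose_search_mode are assumptions made for illustration, and the real query analyzer may use entirely different signals.

```python
import re

# Hypothetical signals: dotted names, snake_case, call syntax, camelCase.
CODE_LIKE = re.compile(r"\w+\.\w+|\w+_\w+|\w+\(\)|[a-z][A-Z]")
# Hypothetical cue words for conceptual questions.
CONCEPT_CUES = ("how", "why", "explain", "architecture", "pattern", "design")


def choose_search_mode(query):
    """Return "bm25", "faiss", or "hybrid" for a query (illustrative rules only)."""
    words = re.findall(r"[a-z]+", query.lower())
    # Code symbols and very short queries favour exact keyword matching.
    if CODE_LIKE.search(query) or len(query.split()) <= 2:
        return "bm25"
    # Explanatory or design-oriented wording favours semantic search.
    if any(cue in words for cue in CONCEPT_CUES):
        return "faiss"
    # Everything else uses both signals.
    return "hybrid"
```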

Query embeddings and search results are cached with an in-memory LRU cache to reduce repeated embedding and retrieval work. If enabled, the optional cross-encoder reranker performs a second-stage pass over retrieved chunks before context is sent to the model.
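An in-memory LRU cache of this kind can be sketched with the standard library. The class below is an assumption about the general shape, not the project's implementation; the keys and values (query text mapped to an embedding or a result list) are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Keep the most recently used entries, evicting the oldest when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

A repeated query then costs one dictionary lookup instead of a new embedding and retrieval pass.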

2. Search Complexity Profiles

Tensor Serve supports configurable search complexity tiers allowing you to optimize for your specific deployment:

Profile	Search Algorithms	Use Case	Latency	Memory
Lightweight	BM25 Okapi + FAISS Flat	Local machines, embedded	<20ms	<500MB
Balanced (default)	BM25 Okapi + FAISS Flat + Reranking	General purpose servers	50-100ms	1-2GB
Production	BM25+ + FAISS-IVF + Query Expansion + Advanced Reranking	Enterprise servers, large scale	200-500ms	4-8GB
Manual	Custom backend selection	Fine-tuned deployments	Varies	Varies

→ Read the full Search Profiles Guide

Quick Start: Switching Profiles

Use a preset profile:

tensor-serve config set-search-profile lightweight
tensor-serve config set-search-profile production

Fine-tune with overrides:

tensor-serve config set-search-profile balanced \
  --query-expansion prf \
  --enable-reranker

Manual profile (full control):

tensor-serve config set-search-profile manual \
  --keyword-backend bm25_plus \
  --semantic-backend faiss_ivf \
  --query-expansion prf \
  --enable-reranker \
  --reranker-model balanced

REST endpoints remain available for automation and custom integrations, for example POST /config/search-profiles/production.

Available Backends

Keyword Search:

bm25_okapi - Standard BM25, fast baseline
bm25_plus - Enhanced BM25 with better precision
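For readers unfamiliar with the two keyword variants, the sketch below scores pre-tokenized documents with the textbook formulas. It is not the backend's code: the parameter defaults (k1=1.5, b=0.75, delta=1.0 for BM25+) and the smoothed IDF are common choices assumed here, and the same IDF is used for both variants to keep the comparison simple.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75, delta=0.0):
    """Score tokenized documents against query_terms.

    delta=0.0 gives Okapi BM25; delta>0 gives BM25+, which adds a lower
    bound for every query term that occurs in the document so that long
    documents are not over-penalized.
    """
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs or 1.0
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * (tf * (k1 + 1) / (tf + k1 * length_norm) + delta)
        scores.append(score)
    return scores


docs = [
    ["asyncio", "gather", "runs", "tasks"],
    ["backpressure", "slows", "producers"],
    ["gather", "gather", "results"],
]
okapi = bm25_scores(["gather"], docs)
plus = bm25_scores(["gather"], docs, delta=1.0)
```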

Semantic Search:

faiss_flat - Exact L2 distance search (good for <500K vectors)
faiss_ivf - Approximate search with clustering (optimal for 500K+ vectors)
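What "exact L2 distance search" means can be shown without FAISS at all. The pure-Python sketch below is only an illustration of the flat (brute-force) strategy, with a hypothetical function name; it compares the query against every stored vector, which is why an approximate, cluster-based index becomes attractive as the collection grows.

```python
import heapq


def flat_l2_search(vectors, query, k=5):
    """Return the k (index, squared L2 distance) pairs closest to query.

    Every vector is visited, so the cost grows linearly with the collection.
    """
    distances = (
        (sum((a - b) ** 2 for a, b in zip(vector, query)), index)
        for index, vector in enumerate(vectors)
    )
    return [(index, dist) for dist, index in heapq.nsmallest(k, distances)]


stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = flat_l2_search(stored, [0.9, 1.0], k=2)
```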

Query Expansion (Optional)

Dynamically expands queries to improve recall:

none - No expansion (default, fastest)
prf - Pseudo-relevance feedback (expand with top-1 result terms)
entity - Entity extraction and weighting
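The prf option can be pictured as follows. This is a simplified sketch under stated assumptions: the tokenizer, the stopword list, the choice of the most frequent terms, and the function name expand_query_prf are all hypothetical; only the idea of borrowing terms from the top-1 result comes from the description above.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"})


def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def expand_query_prf(query, top_document, num_terms=3):
    """Append the most frequent new terms from the top-ranked document."""
    query_terms = set(tokenize(query))
    counts = Counter(
        term
        for term in tokenize(top_document)
        if term not in query_terms and term not in STOPWORDS
    )
    # Most frequent first; ties are broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return " ".join([query] + ranked[:num_terms])


expanded = expand_query_prf(
    "python async",
    "asyncio gather runs coroutines concurrently; gather returns results, gather awaits",
    num_terms=2,
)
```

The expanded query is then searched again, which tends to improve recall at the cost of a second retrieval pass.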

Reranker Models

Fine-tune quality vs. latency trade-off:

lightweight - 22M params, ~50ms per batch (default)
balanced - 71M params, ~100ms per batch (recommended for production)

Detailed information about the RAG proxy implementation can be found here.

3. CLI Reference

CLI Reference can be found here. It covers ZIM downloads, configuration, health, cache, cleanup, ingestion, vector databases, and collections.

4. REST API (`api/main.py`)

The API Reference can be found here. It covers Health & Configuration, Cache, Collections, ZIM File Management, Vector Database, Download progress fields, Cleanup, OpenAI-Compatible API, Settings, Web Search for Time-Sensitive Information, Search Mode Customization, and Model auto-detection.

Using With OpenAI-compatible Tools (Code Editors, etc.)

Point any OpenAI-compatible tool at http://localhost:8000/v1 (or http://localhost:8000 for tools that auto-discover models):

Tool	Configuration
Zed	Settings: `assistant.openai_api_url` = `http://localhost:8000`
Cursor	Settings → Models → OpenAI Base URL = `http://localhost:8000`
Continue (VS Code)	`~/.continue/config.json` → `models` → `apiBase` = `http://localhost:8000/v1`
Aider	`--openai-api-base http://localhost:8000/v1`
Open WebUI	Admin → Connections → OpenAI API → Base URL = `http://localhost:8000/v1`
OpenAI SDKs	`client = OpenAI(base_url="http://localhost:8000/v1")`

Setup

Prerequisites

Python 3.10+ (check with python3 --version)
pip (Python package manager, usually bundled with Python)
An OpenAI-compatible AI endpoint (examples: Ollama, LM Studio, OpenAI API, Anthropic, LiteLLM gateway) — optional for basic setup, required for chat functionality

Setup Example

1. Install via pip:

pip install tensor-serve

2. Alternatively, create and activate a virtual environment first and install inside it (optional but recommended):

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install tensor-serve

3. Configure the upstream OpenAI-compatible AI endpoint:

tensor-serve config detect-local-ai
tensor-serve config set-ai-endpoint \
  --endpoint http://localhost:11434 \
  --model mistral

Optional: inspect models exposed by the configured endpoint:

tensor-serve config list-models

4. Choose where ZIM files are stored:

tensor-serve config set-zim-source ./zim_files

5. Browse and download ZIM content from Kiwix:

tensor-serve zim list
tensor-serve zim install wikivoyage_en_europe

Optional: use an interactive category downloader instead:

tensor-serve zim install-category coding

6. Review the saved configuration and installed ZIM files:

tensor-serve config show
tensor-serve zim status

7. Start the server:

tensor-serve start

Other start options:

tensor-serve start --port 3000              # Custom port
tensor-serve start --auto-port              # Auto-select available port if 8000 is in use
tensor-serve start --reload                 # Development mode with auto-reload

Note: If you prefer to install from source (development), clone the repository and install in editable mode:
git clone https://github.com/3M1RY33T/tensor-serve.git
cd tensor-serve
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

For cloud or gateway providers, include an API key and provider-specific endpoint:

tensor-serve config set-ai-endpoint \
  --endpoint https://api.openai.com/v1 \
  --model gpt-4o-mini \
  --api-key "$OPENAI_API_KEY"

API keys are encrypted before they are written to config.json. Tensor Serve uses a local .tensor_config.key file by default, or you can provide TENSOR_CONFIG_KEY / TENSOR_CONFIG_KEY_FILE for deployments that manage secrets externally.

Docker

Build and run locally:

docker build -t tensor-serve:local .
docker run --rm -p 8000:8000 -v tensor_serve_data:/data tensor-serve:local

Or use Compose:

docker compose up --build

The container stores runtime state in /data, including config.json, encrypted config key material, ZIM files, collections, and generated vector databases. When connecting to a host machine AI runtime from Docker Desktop, use the host gateway address:

docker compose exec tensor-serve tensor-serve config set-ai-endpoint \
  --endpoint http://host.docker.internal:11434 \
  --model mistral

Supported Environments

Local AI Runtimes (no API key needed):

Ollama — easy single-command setup
LM Studio — GUI-based model management
vLLM — high-performance serving

Cloud APIs (API key required):

OpenAI (https://api.openai.com/v1)
Anthropic Claude
Other OpenAI-compatible endpoints

Gateways:

LiteLLM — unified interface for multiple providers

Workflow

Complete Example

Prerequisites:

Tensor Serve is installed (see Setup above)
An OpenAI-compatible AI endpoint is running locally or accessible via API (e.g., Ollama on http://localhost:11434)

Steps:

# 1. Start the server
tensor-serve start

# 2. Leave the server running. In another terminal, check health
tensor-serve health

# 3. Ingest all files from the configured ZIM source folder into a vector database
tensor-serve ingest --source-folder --output-name travel

# 4. Load the database into memory
tensor-serve db load travel

# 5. Optional: enable web search for time-sensitive queries with the configuration CLI
tensor-serve config enable-web-search --provider duckduckgo
tensor-serve config set-search-modes --keyword-mode auto --semantic-mode on

# 6. Start chatting through the OpenAI-compatible proxy
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "tensor_show_resources": false,
    "messages": [
      {"role": "user", "content": "Who invented the telephone?"}
    ]
  }'

# 7. Time-sensitive query (if web search is enabled, Tensor Serve can search web + ZIM)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "What is the latest news about AI?"}
    ]
  }'
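The same chat request can be sent from Python using only the standard library. This is a sketch under assumptions: the helper names build_chat_request and ask are made up for illustration, the base URL is the default from the curl examples, and the response is assumed to follow the usual OpenAI chat completions shape (choices[0].message.content).

```python
import json
import urllib.request


def build_chat_request(prompt, model="mistral",
                       base_url="http://localhost:8000/v1", show_resources=False):
    """Build a POST request equivalent to the curl example above."""
    payload = {
        "model": model,
        "tensor_show_resources": show_resources,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body; requires a running server.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, ask("Who invented the telephone?") would return the model's answer as a string.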

Error Handling

400: Bad request (DB not loaded, AI not configured, invalid input)
404: Resource not found (database files missing)
500: Server error
502: AI endpoint unreachable or error
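A client can turn these status codes into actionable messages. The mapping below restates the list above; the function name explain_status and the exact wording of the hints are illustrative assumptions.

```python
# Hints taken from the status code list above.
STATUS_HINTS = {
    400: "Bad request: check that a database is loaded, the AI endpoint is configured, and the input is valid",
    404: "Resource not found: the database files are missing",
    500: "Server error",
    502: "AI endpoint unreachable or returned an error",
}


def explain_status(status_code):
    """Return a short, human-readable explanation for a proxy response code."""
    if 200 <= status_code < 300:
        return "OK"
    return STATUS_HINTS.get(status_code, f"Unexpected HTTP status {status_code}")
```

For example, a 502 points at the upstream AI server rather than at Tensor Serve itself, so the endpoint configuration is the first thing to check.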

Performance Notes

Large ZIM files (>1GB) may take 10-30 minutes to ingest
Both FAISS (.index + .pkl) and BM25 (.bm25) indexes are saved to disk and reloaded on startup — no re-ingestion needed
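FAISS flat search compares the query against every stored vector, so its cost grows linearly with the collection size (O(n·d) for n vectors of dimension d), while the IVF backend reduces this by probing only a subset of clusters; BM25 scoring is O(n) but extremely fast in practice
Hybrid RRF adds negligible overhead: the FAISS and BM25 searches each run in milliseconds, and the fusion step only merges their ranked lists (up to three lists when web search is enabled)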
Chat responses depend on AI endpoint response time
Existing databases ingested before hybrid search was added will use semantic-only search until re-ingested (no .bm25 file present → graceful fallback)
Web search (when enabled): adds 1-3 seconds per time-sensitive query; cached results are instant; disabled by default (zero overhead)

Contributing

Thanks for helping improve Tensor Serve.

Please refer to Contributing for more information.

Release history

0.2.0 (this version): May 11, 2026
0.1.0: May 8, 2026

Download files

Source Distribution

tensor_serve-0.2.0.tar.gz (94.8 kB)

Uploaded May 11, 2026 Source

Built Distribution

tensor_serve-0.2.0-py3-none-any.whl (87.0 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file tensor_serve-0.2.0.tar.gz.

File metadata

Download URL: tensor_serve-0.2.0.tar.gz
Upload date: May 11, 2026
Size: 94.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for tensor_serve-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`956c4c74ff2cb2428cb88f70273798ec7452cc6f5333ec6774f9cfb0567ac349`
MD5	`9d361834c1fb49cec928fa7b2867477b`
BLAKE2b-256	`5187a9c5d2cd21bf0a3bc05a57544fc2b1815d6da6718e4210aed8b68ec24b5f`

File details

Details for the file tensor_serve-0.2.0-py3-none-any.whl.

File metadata

Download URL: tensor_serve-0.2.0-py3-none-any.whl
Upload date: May 11, 2026
Size: 87.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for tensor_serve-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e810c13cecab1f49f43f756babed0178cc8aa0104237a226c8acffdcdb7b040`
MD5	`679d9c76ddf03c0e83e0444e539544a6`
BLAKE2b-256	`b62412bbcda2fed1f20674ce00c86a81edd90872dea3628f2e443f4f3f2da2a7`
