AI orchestration engine — semantic retrieval, context compression, and intelligent model routing
TokenSense
AI orchestration engine — Reduce LLM token usage by up to 72% with semantic retrieval, context compression, and intelligent model routing.
TokenSense sits between you and any LLM backend, transparently optimizing every request. Send only relevant context, pay less, get better answers.
Features
- Semantic Retrieval — Vector search powered by Actian VectorAI DB
- Context Compression — Deduplicates and trims context to fit your token budget
- Intelligent Routing — Auto-selects the best model based on task complexity
- Multi-Backend — Works with OpenRouter, Gemini, or any LLM API
- Full Telemetry — Tracks tokens, cost, and latency for every query
- Three Interfaces — CLI, REST API, and Web UI (planned)
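To illustrate the Context Compression feature, here is a minimal sketch of deduplicating chunks and fitting them to a token budget. It is not TokenSense's actual implementation: the function names, the whitespace-normalised duplicate check, and the rough 4-characters-per-token estimate are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def compress_context(chunks: list[str], token_budget: int) -> list[str]:
    """Drop duplicate chunks, then keep chunks in order while they fit the budget.

    `chunks` is assumed to be sorted by relevance, most relevant first.
    """
    seen: set[str] = set()
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        key = " ".join(chunk.split()).lower()  # normalise whitespace and case
        if not key or key in seen:
            continue  # empty or already-seen chunk
        seen.add(key)
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            continue  # would overflow the budget; a smaller chunk may still fit
        kept.append(chunk)
        used += cost
    return kept
```

A real tokenizer would replace `estimate_tokens`; the greedy loop simply favours the most relevant chunks first.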
Installation
Option A — Install CLI from PyPI (recommended)
pip install tokensense
Option B — Install from source
git clone https://github.com/yourusername/TokenSense.git
cd TokenSense
pip install -e .
Quick Start
1. Start the backend and vector database
TokenSense requires a FastAPI backend and Actian VectorAI DB. Clone the repo and run:
# Start Actian VectorAI DB (Docker)
docker run -d -p 50051:50051 actian/vectorai-db
# Set environment variables
cp .env.example .env
# Edit .env with your API keys
# Start the backend
cd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
2. Configure the CLI
tokensense init
# API URL: http://localhost:8000
# API key: <your-tokensense-api-key>
3. Index your codebase
tokensense index ./my-project
4. Ask questions
tokensense ask "how does the authentication flow work?"
Output:
┌─────────────────── Answer ───────────────────┐
│ The authentication flow uses verify_api_key │
│ middleware that checks the X-API-Key header… │
└───────────────────────────────────────────────┘
┌──────────────┬──────────────┐
│ Model │ claude-haiku │
│ Input tokens │ 2,100 │
│ Reduction │ 74% │
│ Cost │ $0.001200 │
└──────────────┴──────────────┘
5. View your savings
tokensense stats
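The Reduction figure in the step 4 sample output can be made concrete with a little arithmetic. The README does not define the metric, so the sketch below assumes it is the share of input tokens removed relative to the unoptimized context; both helper names are hypothetical.

```python
def reduction_pct(original_tokens: int, optimized_tokens: int) -> float:
    """Percentage of input tokens removed by optimization (assumed definition)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (original_tokens - optimized_tokens) / original_tokens


def implied_original(optimized_tokens: int, reduction: float) -> float:
    """Invert the formula: original context size implied by a reduction percentage."""
    return optimized_tokens / (1.0 - reduction / 100.0)


# Sample output: 2,100 input tokens sent after a 74% reduction.
print(round(implied_original(2100, 74.0)))  # 8077
```

Under that assumption, the sample query would have cost roughly 8,000 input tokens without optimization.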
CLI Commands
| Command | Description |
|---|---|
| tokensense init | Configure API URL and key |
| tokensense index <path> | Index a directory into the vector DB |
| tokensense ask "<query>" | Send a query through the optimization pipeline |
| tokensense stats | View usage analytics and cost savings |
API Endpoints
Once the backend is running on http://localhost:8000:
POST /index
curl -X POST http://localhost:8000/index \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"path": "./my-app", "file_extensions": [".py", ".ts"]}'
POST /ask
curl -X POST http://localhost:8000/ask \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"query": "explain the auth flow", "token_budget": 8000}'
POST /optimize
Context optimization only (no LLM call):
curl -X POST http://localhost:8000/optimize \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"query": "describe the routing agent", "token_budget": 8000}'
GET /stats
curl "http://localhost:8000/stats?limit=20" \
-H "X-API-Key: your-key"
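For scripted use, the same endpoints can be called from Python. The sketch below uses only the standard library and follows the curl examples; the helper names `build_request` and `send` are hypothetical, and responses are assumed to be JSON since the response schema is not documented here.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # backend address from the Quick Start


def build_request(path, api_key, body=None, params=None):
    """Build a request for a TokenSense endpoint; POST when a JSON body is given."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    headers = {"X-API-Key": api_key}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


ask = build_request("/ask", "your-key",
                    body={"query": "explain the auth flow", "token_budget": 8000})
stats = build_request("/stats", "your-key", params={"limit": 20})
# send(ask) and send(stats) need the backend to be running.
```

Building the request separately from sending it keeps the example inspectable without a live backend.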
Architecture
User Input (CLI / Web)
│
├─> Query Agent (generates embeddings, classifies task)
├─> Retrieval Agent (fetches relevant chunks from Actian VectorAI)
├─> Context Optimizer (deduplicates, compresses, fits token budget)
├─> Routing Agent (selects best model based on complexity)
├─> LLM Call (OpenRouter or Gemini)
└─> Telemetry Agent (logs tokens, cost, latency to SQLite)
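One way to picture the Routing Agent step is a rule that maps an estimated task complexity to a model tier. The sketch below is an illustration under stated assumptions, not the project's routing logic: the heuristic, the 4,000-token threshold, the keyword list, and the model identifiers are all placeholders.

```python
from dataclasses import dataclass

# Hypothetical model tiers; real identifiers depend on the configured backend.
MODEL_TIERS = {
    "simple": "small-model",
    "moderate": "medium-model",
    "complex": "large-model",
}

# Assumed keywords hinting that a query needs deeper reasoning.
COMPLEX_HINTS = ("refactor", "architecture", "design", "debug", "why")


@dataclass
class RoutingDecision:
    complexity: str
    model: str


def classify_complexity(query: str, context_tokens: int) -> str:
    """Crude heuristic: large contexts or reasoning-heavy words raise complexity."""
    words = query.lower().split()
    hinted = any(hint in words for hint in COMPLEX_HINTS)
    if hinted and context_tokens > 4000:
        return "complex"
    if hinted or context_tokens > 4000:
        return "moderate"
    return "simple"


def route(query: str, context_tokens: int) -> RoutingDecision:
    """Pick a model tier for the query given the size of its optimized context."""
    complexity = classify_complexity(query, context_tokens)
    return RoutingDecision(complexity, MODEL_TIERS[complexity])
```

For example, `route("debug the retry loop", 1000)` lands in the moderate tier, because the query contains a hint word but the context is small.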
Tech Stack
| Layer | Technology |
|---|---|
| CLI | Typer + httpx + rich |
| Backend | FastAPI + Python 3.11+ |
| Vector DB | Actian VectorAI DB (Docker) |
| Model Routing | OpenRouter API |
| Fallback LLM | Gemini API |
| Frontend | Next.js 14 + React 18 + Tailwind CSS (planned) |
Environment Variables
Create a .env file in the backend/ directory:
TOKENSENSE_API_KEY=your-secret-api-key
OPENROUTER_API_KEY=sk-or-...
GEMINI_API_KEY=AIza...
ACTIAN_HOST=localhost
ACTIAN_PORT=50051
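As an illustration of how a backend might consume these variables, the sketch below reads them from the environment into a typed settings object. The `Settings` class and `load_settings` function are hypothetical, and treating the two LLM provider keys as optional is an assumption; the host and port defaults follow the values above. It expects the variables to be exported already, for example by a dotenv loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    tokensense_api_key: str
    openrouter_api_key: Optional[str]
    gemini_api_key: Optional[str]
    actian_host: str
    actian_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read TokenSense settings, failing fast if the service key is missing."""
    api_key = env.get("TOKENSENSE_API_KEY")
    if not api_key:
        raise RuntimeError("TOKENSENSE_API_KEY is not set")
    return Settings(
        tokensense_api_key=api_key,
        openrouter_api_key=env.get("OPENROUTER_API_KEY"),
        gemini_api_key=env.get("GEMINI_API_KEY"),
        actian_host=env.get("ACTIAN_HOST", "localhost"),
        actian_port=int(env.get("ACTIAN_PORT", "50051")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.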
Development
Run tests
# Backend + Actian integration tests
cd tests
python test_actian_via_api.py
python test_actian_direct.py
Build the package locally
pip install build
python -m build
pip install dist/tokensense-0.1.1-py3-none-any.whl
License
MIT
Contributing
See CLAUDE.md for the full architecture and build plan.