A dynamic zero-token semantic router
SynaptoRoute
SynaptoRoute is a high-throughput, local semantic routing engine built for production Python microservices. Designed as a low-latency alternative to Large Language Model (LLM) routing chains and slower local routers, it classifies intent without consuming any LLM tokens, at a few milliseconds per query on a 2-core cloud CPU (see Performance Benchmarks).
Table of Contents
- Why SynaptoRoute?
- Architecture & Optimizations
- Performance Benchmarks
- Installation & Deployment
- Quick Start Guide
- System Limitations
- Community & Contributing
Why SynaptoRoute?
In modern agentic systems, relying on an external API (like OpenAI or Anthropic) for simple routing decisions, such as determining whether a user wants to reset their password or check their balance, introduces unacceptable latency (300ms+) and high token costs.
SynaptoRoute solves this by running intent classification entirely locally with an INT8-quantized embedding model.
SynaptoRoute was engineered specifically to avoid the $O(N)$ array copy that routers typically pay on every live hot-reload, and to maximize hardware utilization via asynchronous dynamic batching.
Architecture & Optimizations
1. Lazy Memory Compilation
Traditional routers can degrade badly during live updates. Each time a route is added, they call numpy.vstack immediately, which copies the entire vector array ($O(N)$ per update). SynaptoRoute defers this reallocation: new vectors are appended to a lightweight list (amortized $O(1)$), and the $O(N)$ compilation runs once, when the next query arrives, no matter how many vectors were added in between. A burst of updates therefore costs one copy instead of one copy per update, which avoids stalling the server during reloads.
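The deferred-compilation idea can be sketched in a few lines of NumPy. This is a minimal illustration, not SynaptoRoute's internal code: the class name `LazyVectorIndex`, its methods, and the dot-product scoring are all assumptions made for the example.

```python
import numpy as np


class LazyVectorIndex:
    """Hypothetical index: cheap appends, matrix rebuilt only when queried."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._matrix = np.empty((0, dim), dtype=np.float32)  # compiled rows
        self._pending = []                                   # rows awaiting compilation
        self.compile_count = 0                               # for inspection only

    def add(self, vector) -> None:
        # Amortized O(1): the existing matrix is not copied here.
        self._pending.append(np.asarray(vector, dtype=np.float32).reshape(self.dim))

    def _compile(self) -> None:
        # One O(N) copy covers every vector added since the last query.
        if self._pending:
            self._matrix = np.vstack([self._matrix, *self._pending])
            self._pending.clear()
            self.compile_count += 1

    def query(self, vector) -> int:
        """Return the index of the row with the highest dot-product score."""
        self._compile()
        if len(self._matrix) == 0:
            raise ValueError("index is empty")
        scores = self._matrix @ np.asarray(vector, dtype=np.float32)
        return int(np.argmax(scores))


index = LazyVectorIndex(dim=3)
for row in np.eye(3, dtype=np.float32):
    index.add(row)  # three appends, no array copies yet
best = index.query(np.array([0.1, 0.9, 0.2]))  # triggers a single compilation
```

Here three `add` calls result in exactly one `vstack`, triggered by the first `query`. The trade-off is that the first query after a burst of updates absorbs the compilation cost.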
2. Dynamic Asynchronous Batching
Hardware accelerators (GPUs, AVX-512 CPUs) are optimized for large matrix multiplications, so sending queries one at a time pays the per-call and transfer overhead on every request. SynaptoRoute runs a background worker on an asyncio.Queue that collects concurrent HTTP requests, waits 5 milliseconds, groups them into a batch, and processes the batch in a single forward pass.
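A minimal sketch of this pattern using only the standard library follows. The `MicroBatcher` class, its `submit` method, and the `max_batch` cap are hypothetical names chosen for illustration; the real worker may differ in how it drains the queue and handles errors.

```python
import asyncio


class MicroBatcher:
    """Hypothetical batcher: groups concurrent requests inside a short window."""

    def __init__(self, batch_fn, window_s=0.005, max_batch=64):
        self._batch_fn = batch_fn    # called once per batch with a list of items
        self._window_s = window_s    # collection window (5 ms, as described above)
        self._max_batch = max_batch  # assumed cap, not taken from the source
        self._queue = asyncio.Queue()
        self._worker = None
        self.batch_sizes = []        # recorded for inspection only

    async def start(self):
        self._worker = asyncio.create_task(self._run())

    async def stop(self):
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass

    async def submit(self, item):
        # Each caller parks on its own future until the batch is processed.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self):
        while True:
            batch = [await self._queue.get()]    # block until the first request
            await asyncio.sleep(self._window_s)  # let concurrent requests pile up
            while len(batch) < self._max_batch and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            self.batch_sizes.append(len(batch))
            try:
                results = self._batch_fn([item for item, _ in batch])
            except Exception as exc:
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)


async def demo():
    # The lambda stands in for a batched encoder and similarity search.
    batcher = MicroBatcher(lambda texts: [t.upper() for t in texts])
    await batcher.start()
    results = await asyncio.gather(*(batcher.submit(f"q{i}") for i in range(10)))
    await batcher.stop()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(demo())
```

The window trades a small fixed delay for fewer, larger calls: every request waits up to 5 ms, but ten concurrent requests cost far fewer invocations of `batch_fn` than ten sequential ones.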
3. INT8 Quantization
By default, SynaptoRoute uses the BAAI/bge-small-en-v1.5 model quantized to 8-bit integers and executed with ONNX Runtime. Compared with 32-bit floats, INT8 weights take a quarter of the memory, which cuts memory bandwidth requirements by roughly 4x and improves CPU cache utilization.
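The 4x figure follows directly from storage width: a float32 value occupies 4 bytes and an int8 value occupies 1. The sketch below illustrates this with a simple symmetric per-tensor scheme in NumPy. It is not the procedure ONNX Runtime applies to the model; the function names and the 384 x 384 random matrix are assumptions for the example.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> (int8 values, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate float32 values."""
    return quantized.astype(np.float32) * scale


rng = np.random.default_rng(0)
weights = rng.standard_normal((384, 384)).astype(np.float32)

quantized, scale = quantize_int8(weights)
ratio = weights.nbytes / quantized.nbytes  # 4 bytes vs 1 byte per value
max_error = float(np.max(np.abs(dequantize(quantized, scale) - weights)))
```

The rounding error per value is bounded by about half the scale, which is the accuracy cost paid for the smaller footprint.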
Performance Benchmarks
The following metrics were captured via automated GitHub Actions CI/CD running on a standard, unaccelerated ubuntu-latest 2-core cloud CPU.
| Metric | Cloud CPU Latency | Context |
|---|---|---|
| Inference P99 | 3.94 ms | Single sequential query latency. |
| Amortized P50 | 2.69 ms | Per-query latency when processing 1,000 concurrent requests via dynamic batching. |
| Hot-Reload | 5.04 ms | Time required to dynamically inject a new utterance into memory without dropping active API requests. |
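For readers who want to reproduce percentile figures like these on their own hardware, the following sketch shows one common way to compute P50 and P99 from timed runs. It is not the project's benchmark harness; the nearest-rank method, the function names, and the placeholder workload are assumptions.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(fn, runs=200):
    """Time `fn` repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


# Placeholder workload; substitute a real router query here.
latencies = measure_latencies_ms(lambda: sum(range(1000)))
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

An amortized figure, as in the table's second row, would instead divide the wall-clock time of a whole concurrent batch by the number of requests in it.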
📊 View Full Benchmarks: For detailed analysis including Memory Leak Endurance, GPU Scaling, Classification F1-Scores, and Input Poisoning Survival Metrics, see our official BENCHMARKS.md.
Installation & Deployment
Method 1: Docker REST API (Recommended)
SynaptoRoute ships with a fully asynchronous FastAPI wrapper, designed for immediate drop-in deployment as a scalable microservice.
# Build the Docker image
docker build -t synaptoroute .
# Run the container
docker run -p 8000:8000 synaptoroute
You can interface with the router immediately:
curl -X POST http://localhost:8000/route \
-H "Content-Type: application/json" \
-d '{"query": "I need help resetting my password"}'
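The same request can be issued from Python with the standard library. This sketch mirrors the curl call above; the helper names are hypothetical, and because the response schema is not documented here, the function simply returns whatever JSON the server sends back.

```python
import json
import urllib.request


def build_route_request(query, base_url="http://localhost:8000"):
    """Build the POST request for the /route endpoint shown in the curl example."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/route",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def route_query(query, base_url="http://localhost:8000", timeout=2.0):
    """Send the query and return the decoded JSON body (schema assumed to be JSON)."""
    request = build_route_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `route_query("I need help resetting my password")` would send the same payload as the curl command.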
Method 2: Standard Python Package
To embed SynaptoRoute directly in an existing Python pipeline, install it from PyPI with pip (or from git if you want to test the latest main branch):
pip install synaptoroute
Quick Start Guide
import asyncio

from synaptoroute.router import AdaptiveRouter
from synaptoroute.encoder import Encoder
from synaptoroute.storage import SQLiteStorage
from synaptoroute.models import Route


async def main():
    # 1. Initialize Components
    encoder = Encoder()
    storage = SQLiteStorage("data/memory.sqlite")
    router = AdaptiveRouter(encoder, storage)

    # 2. Define Routes
    billing_route = Route(
        name="billing",
        utterances=["I need a refund", "Where is my receipt?", "Cancel my subscription"]
    )
    router.add_route(billing_route)

    # 3. Start the Background Batching Worker
    await router.start()

    # 4. Execute Async Queries
    result = await router.aquery("How do I get my money back?")
    print(f"Matched Intent: {result.name}")  # Output: billing

    # 5. Graceful Shutdown
    await router.stop()


if __name__ == "__main__":
    asyncio.run(main())
System Limitations
Horizontal Scaling (Kubernetes Split-Brain)
SynaptoRoute relies on a local in-memory NumPy matrix to achieve its low-millisecond latency, so its route state is bound to a single node. If it is deployed across multiple load-balanced Kubernetes pods, a hot-reload request that hits Pod A updates only Pod A's local memory; Pod B remains unaware and keeps routing against stale data. Scaling horizontally requires an external event bus (e.g., Redis Pub/Sub) to broadcast memory invalidation events across the cluster.
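The broadcast pattern can be sketched without any external dependency. In the example below, `InMemoryBus` is a single-process stand-in for a real bus such as Redis Pub/Sub, and `RouterReplica` is a hypothetical pod-local router; neither is part of SynaptoRoute. The sketch broadcasts the update payload itself, although an invalidation-only message that prompts each pod to re-read shared storage would be an equally reasonable design.

```python
class InMemoryBus:
    """Stand-in for an external event bus (works within one process only)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in list(self._subscribers):
            callback(message)


class RouterReplica:
    """Hypothetical pod-local router that mirrors updates announced on the bus."""

    def __init__(self, name, bus):
        self.name = name
        self.routes = {}  # route name -> list of utterances
        self._bus = bus
        bus.subscribe(self._on_message)

    def hot_reload(self, route, utterance):
        # Apply locally first, then tell the other replicas.
        self._apply(route, utterance)
        self._bus.publish({"origin": self.name, "route": route, "utterance": utterance})

    def _on_message(self, message):
        if message["origin"] != self.name:  # ignore our own broadcast
            self._apply(message["route"], message["utterance"])

    def _apply(self, route, utterance):
        self.routes.setdefault(route, []).append(utterance)


bus = InMemoryBus()
pod_a = RouterReplica("pod-a", bus)
pod_b = RouterReplica("pod-b", bus)
pod_a.hot_reload("billing", "I was charged twice")
```

After the call, both replicas hold the new utterance. A production version would also need to handle message loss and pods that start after an update was published.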
Community & Contributing
We welcome contributions of all sizes from the open-source community!
- Contributing: Please read our Contributing Guidelines to learn how to set up your development environment, run the test suite, and submit Pull Requests.
- Code of Conduct: We are committed to fostering a welcoming environment. Please review our Code of Conduct.
- Issues: If you discover a bug or have a feature request, please open an issue.