A dynamic zero-token semantic router

SynaptoRoute

PyPI version CI/CD Pipeline License: MIT Python 3.10+ FastAPI PRs Welcome

SynaptoRoute is a high-throughput, local semantic routing engine built for production Python microservices. Designed as a low-latency alternative to Large Language Model (LLM) routing chains and slower local routers, it provides zero-token intent classification with a P99 latency under 4 milliseconds, and an amortized P50 under 3 milliseconds with batching, on a standard 2-core cloud CPU.

Why SynaptoRoute?

In modern agentic systems, relying on an external API (like OpenAI or Anthropic) to make simple routing decisions—such as determining if a user wants to reset their password or check their balance—introduces unacceptable latency (300ms+) and high token costs.

SynaptoRoute solves this by executing intent classification entirely locally using INT8 quantized vector embeddings.

SynaptoRoute was engineered specifically to solve the $O(N)$ memory degradation problem during live hot-reloading and to maximize hardware utilization via asynchronous dynamic batching.

Architecture & Optimizations

1. Lazy Memory Compilation

Traditional routers suffer from severe performance degradation during live updates. When a new route is added, they execute an immediate numpy.vstack, copying the entire vector array in memory ($O(N)$ complexity). SynaptoRoute defers this reallocation, appending new vectors to a lightweight list ($O(1)$) and only executing the heavy compilation precisely when the next query arrives, preventing server freezes.
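
The deferred compilation described above can be sketched as follows. This is a minimal illustration, not SynaptoRoute's actual implementation: the LazyVectorIndex class, its method names, and the compile_count counter are hypothetical and exist only to show the pattern of cheap appends followed by a single rebuild at query time.

```python
import numpy as np


class LazyVectorIndex:
    """Hypothetical index that defers matrix rebuilds until query time."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._matrix = np.empty((0, dim), dtype=np.float32)  # compiled vectors
        self._pending: list = []   # vectors added since the last compile
        self.compile_count = 0     # illustration only: counts rebuilds

    def add(self, vector) -> None:
        # Amortized O(1): append to a Python list, no array copy.
        self._pending.append(np.asarray(vector, dtype=np.float32))

    def _compile(self) -> None:
        # One O(N) copy that folds in every pending vector at once.
        if self._pending:
            self._matrix = np.vstack([self._matrix, *self._pending])
            self._pending.clear()
            self.compile_count += 1

    def query(self, vector) -> int:
        # Compile lazily, then return the row with the highest dot product.
        self._compile()
        if self._matrix.shape[0] == 0:
            raise ValueError("index is empty")
        scores = self._matrix @ np.asarray(vector, dtype=np.float32)
        return int(np.argmax(scores))
```

Adding many routes in a burst therefore costs one array copy in total rather than one copy per route; the trade-off is that the first query after a burst pays for the rebuild.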

2. Dynamic Asynchronous Batching

Hardware accelerators (GPUs, AVX-512 CPUs) are optimized for large matrix multiplications. Sending single queries sequentially incurs per-call overhead and leaves most of that capacity idle. SynaptoRoute uses a background asyncio.Queue worker that collects concurrent HTTP requests, waits 5 milliseconds, groups them into a batch, and processes them in a single batched inference pass.
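
The batching loop can be sketched roughly as below. The MicroBatcher class, its parameters (window_s, max_batch), and the batch_fn callback are assumptions made for illustration; only the asyncio.Queue worker and the 5 millisecond window come from the description above.

```python
import asyncio
from typing import Callable, Optional, Sequence


class MicroBatcher:
    """Hypothetical micro-batcher: groups concurrent requests into one call."""

    def __init__(self, batch_fn: Callable[[Sequence[str]], Sequence[str]],
                 window_s: float = 0.005, max_batch: int = 64) -> None:
        self._batch_fn = batch_fn      # processes a whole batch in one call
        self._window_s = window_s      # how long to wait for more requests
        self._max_batch = max_batch
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self._worker = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass

    async def submit(self, query: str) -> str:
        # Each caller gets a future that the worker resolves later.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((query, future))
        return await future

    async def _run(self) -> None:
        while True:
            # Block until the first request arrives, then open the window.
            batch = [await self._queue.get()]
            await asyncio.sleep(self._window_s)
            # Drain whatever else arrived during the window, up to max_batch.
            while len(batch) < self._max_batch and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            try:
                results = self._batch_fn([query for query, _ in batch])
            except Exception as exc:
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)
```

The window adds up to 5 milliseconds of latency to an isolated request in exchange for higher throughput under load. A production version would also need to resolve or cancel in-flight futures on shutdown, which this sketch omits.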

3. INT8 Quantization

By default, SynaptoRoute uses the BAAI/bge-small-en-v1.5 model quantized to 8-bit integers via ONNX Runtime, cutting memory bandwidth requirements by roughly 4x relative to 32-bit floats and improving CPU cache utilization.
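
To make the 4x figure concrete, here is a small sketch of symmetric per-tensor INT8 quantization applied to a float32 array. It is a generic illustration of the technique, not the ONNX Runtime quantization pipeline that SynaptoRoute uses; the function names quantize_int8 and dequantize are hypothetical.

```python
import numpy as np


def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> (int8 values, scale)."""
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = float(np.max(np.abs(x))) / 127.0 or 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 values."""
    return q.astype(np.float32) * scale


# Example: 100 vectors of dimension 384 (the bge-small embedding size).
rng = np.random.default_rng(0)
vectors = rng.standard_normal((100, 384)).astype(np.float32)
q_vectors, scale = quantize_int8(vectors)
ratio = vectors.nbytes // q_vectors.nbytes  # 4 bytes per value vs. 1 byte
```

Each value shrinks from four bytes to one, which is where the 4x reduction comes from. The cost is a rounding error of at most half the scale per value.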


Performance Benchmarks

The following metrics were captured via automated GitHub Actions CI/CD running on a standard, unaccelerated ubuntu-latest 2-core cloud CPU.

Metric          Cloud CPU Latency   Context
-------------   -----------------   -------
Inference P99   3.94 ms             Single sequential query latency.
Amortized P50   2.69 ms             Per-query latency when processing 1,000 concurrent requests via dynamic batching.
Hot-Reload      5.04 ms             Time required to dynamically inject a new utterance into memory without dropping active API requests.

📊 View Full Benchmarks: For detailed analysis including Memory Leak Endurance, GPU Scaling, Classification F1-Scores, and Input Poisoning Survival Metrics, see our official BENCHMARKS.md.


Installation & Deployment

Method 1: Docker REST API (Recommended)

SynaptoRoute ships with a fully asynchronous FastAPI wrapper, designed for immediate drop-in deployment as a scalable microservice.

# Build the Docker image
docker build -t synaptoroute .

# Run the container
docker run -p 8000:8000 synaptoroute

You can interface with the router immediately:

curl -X POST http://localhost:8000/route \
     -H "Content-Type: application/json" \
     -d '{"query": "I need help resetting my password"}'

Method 2: Standard Python Package

To embed SynaptoRoute natively into your existing Python pipelines, install it from PyPI with pip (or from git if you are testing the latest main branch):

pip install synaptoroute

Quick Start Guide

import asyncio
from synaptoroute.router import AdaptiveRouter
from synaptoroute.encoder import Encoder
from synaptoroute.storage import SQLiteStorage
from synaptoroute.models import Route

async def main():
    # 1. Initialize Components
    encoder = Encoder()
    storage = SQLiteStorage("data/memory.sqlite")
    router = AdaptiveRouter(encoder, storage)
    
    # 2. Define Routes
    billing_route = Route(
        name="billing", 
        utterances=["I need a refund", "Where is my receipt?", "Cancel my subscription"]
    )
    router.add_route(billing_route)
    
    # 3. Start the Background Batching Worker
    await router.start()
    
    # 4. Execute Async Queries
    result = await router.aquery("How do I get my money back?")
    print(f"Matched Intent: {result.name}") # Output: billing
    
    # 5. Graceful Shutdown
    await router.stop()

if __name__ == "__main__":
    asyncio.run(main())

System Limitations

Horizontal Scaling (Kubernetes Split-Brain)
SynaptoRoute relies on a highly optimized, local in-memory NumPy matrix to achieve its low-millisecond latency. As such, its route state is bound to a single process. If deployed across multiple load-balanced Kubernetes pods, a hot-reload request hitting Pod A will update Pod A's local memory, but Pod B will remain unaware of the change. Scaling horizontally requires implementing an external event bus (e.g., Redis Pub/Sub) to broadcast memory invalidation events across the cluster.
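
The broadcast pattern can be sketched as below. Everything here is hypothetical: InMemoryBus is an in-process stand-in for an external channel such as Redis Pub/Sub, and PodReplica stands in for one router instance. SynaptoRoute does not ship this functionality; the sketch only shows the shape of a possible solution.

```python
from typing import Callable, Dict, List


class InMemoryBus:
    """Stand-in for an external pub/sub channel (e.g., Redis Pub/Sub)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: dict) -> None:
        # Deliver the event to every subscriber, including the publisher.
        for handler in list(self._subscribers):
            handler(event)


class PodReplica:
    """Hypothetical router replica that applies updates received via the bus."""

    def __init__(self, name: str, bus: InMemoryBus) -> None:
        self.name = name
        self.routes: Dict[str, List[str]] = {}
        self._bus = bus
        bus.subscribe(self._on_event)

    def add_utterance(self, route: str, utterance: str) -> None:
        # Publish instead of mutating local state directly, so every
        # replica (this one included) applies the same update.
        self._bus.publish({"route": route, "utterance": utterance})

    def _on_event(self, event: dict) -> None:
        self.routes.setdefault(event["route"], []).append(event["utterance"])


bus = InMemoryBus()
pod_a = PodReplica("pod-a", bus)
pod_b = PodReplica("pod-b", bus)
pod_a.add_utterance("billing", "I need a refund")
```

With a real message broker, delivery is asynchronous and can fail, so replicas would also need a way to resynchronize after missed events (for example, by reloading from shared storage).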


Community & Contributing

We welcome contributions of all sizes from the open-source community!

  • Contributing: Please read our Contributing Guidelines to learn how to set up your development environment, run the test suite, and submit Pull Requests.
  • Code of Conduct: We are committed to fostering a welcoming environment. Please review our Code of Conduct.
  • Issues: If you discover a bug or have a feature request, please open an issue.
