A dynamic zero-token semantic router

SynaptoRoute

PyPI version CI/CD Pipeline License: MIT Python 3.12+ FastAPI

SynaptoRoute is a high-throughput, local semantic routing engine built for production Python microservices. Designed as an efficient alternative to LLM routing chains, it executes intent classification entirely locally using ONNX-accelerated vector embeddings.

The v0.2.0 architecture transitions the engine from a sequential low-latency prototype into a highly concurrent, ACID-compliant async batching router capable of safely absorbing asynchronous FastAPI server loads.


Why SynaptoRoute?

In modern agentic systems, relying on an external API (like OpenAI or Anthropic) to make simple routing decisions introduces unacceptable latency and token costs.

SynaptoRoute solves this by executing intent classification locally. It is engineered to solve two specific problems that plague naive semantic routers: $O(N)$ memory degradation during live updates and thread-blocking under asynchronous web server concurrency.

v0.2.0 Architecture & Optimizations

1. Amortized $O(1)$ Lazy Memory Slicing

Traditional routers suffer from severe performance degradation during live updates. When a new route is added, they execute an immediate numpy.vstack, copying the entire vector array in memory ($O(N)$ complexity).

SynaptoRoute v0.2.0 pre-allocates a static tensor buffer at initialization. When routes are added dynamically, the router writes each embedding directly into a reserved float32 slice of that buffer by index assignment instead of reallocating the array. This keeps the cost of each insert at amortized $O(1)$ rather than $O(N)$ and avoids the transient full-array copies that can exhaust server RAM, even when handling 50,000 dense vectors.
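To make the idea concrete, here is a minimal sketch of a pre-allocated float32 buffer with slot assignment. It is not SynaptoRoute's actual implementation: the class name `PreallocatedIndex`, its `capacity` parameter, and the doubling strategy used when the buffer fills up are all assumptions chosen for illustration.

```python
import numpy as np


class PreallocatedIndex:
    """Hypothetical illustration of a pre-allocated float32 embedding buffer."""

    def __init__(self, dim: int, capacity: int = 1024) -> None:
        # Reserve all rows up front so that adding a route is a plain row write.
        self._buffer = np.zeros((capacity, dim), dtype=np.float32)
        self._size = 0  # number of rows currently in use

    def add(self, embedding: np.ndarray) -> None:
        if self._size == self._buffer.shape[0]:
            # Assumed growth policy: double the capacity when full. This copy is
            # O(N), but it is rare enough that the average insert stays O(1).
            rows, dim = self._buffer.shape
            grown = np.zeros((rows * 2, dim), dtype=np.float32)
            grown[: self._size] = self._buffer
            self._buffer = grown
        # Write into the next reserved row; no full-array copy is needed.
        self._buffer[self._size] = embedding
        self._size += 1

    @property
    def vectors(self) -> np.ndarray:
        # A view (not a copy) over the rows that are in use.
        return self._buffer[: self._size]


index = PreallocatedIndex(dim=4, capacity=2)
for i in range(5):
    index.add(np.full(4, i, dtype=np.float32))
```

By contrast, calling `numpy.vstack` on every insert allocates a new array and copies every existing row each time, which is the $O(N)$ behavior described above.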

2. Dynamic Asynchronous Batching

Hardware accelerators (GPUs, AVX CPUs) are optimized for large matrix multiplications. Processing single web requests sequentially wastes hardware potential and blocks Python's asyncio event loop.

SynaptoRoute introduces a background _batch_worker task fed by a queue. It collects concurrent HTTP requests, waits for a configurable window (e.g., 5 milliseconds), groups them into a dense batch, and processes the batch in a single matrix operation. Under heavy concurrent load, this roughly doubles throughput.
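The batching pattern described above can be sketched with plain asyncio. This is an illustrative approximation, not the library's _batch_worker: the `MicroBatcher` class, its `window_s` and `max_batch` parameters, and the `fake_encode` stand-in for the encoder are all hypothetical.

```python
import asyncio


class MicroBatcher:
    """Hypothetical sketch: collect concurrent requests and process them together."""

    def __init__(self, process_batch, window_s: float = 0.005, max_batch: int = 64):
        self._process_batch = process_batch  # sync callable: list[str] -> list
        self._window_s = window_s
        self._max_batch = max_batch
        self._queue: asyncio.Queue = asyncio.Queue()
        self._task = None

    async def start(self) -> None:
        self._task = asyncio.create_task(self._worker())

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass

    async def submit(self, text: str):
        # Each caller gets a future that the worker resolves with its result.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def _worker(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until the first request arrives, then keep collecting
            # until the window closes or the batch is full.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._window_s
            while len(batch) < self._max_batch:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            texts = [text for text, _ in batch]
            # Run the heavy call in a thread so the event loop stays responsive.
            results = await loop.run_in_executor(None, self._process_batch, texts)
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)


batch_sizes = []


def fake_encode(texts):
    # Stand-in for a batched encoder call; records how many items it received.
    batch_sizes.append(len(texts))
    return [t.upper() for t in texts]


async def demo():
    batcher = MicroBatcher(fake_encode, window_s=0.05)
    await batcher.start()
    results = await asyncio.gather(*(batcher.submit(f"q{i}") for i in range(10)))
    await batcher.stop()
    return results


results = asyncio.run(demo())
```

The window length is a latency/throughput tradeoff: a longer window yields larger batches but adds up to that much delay to each request.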

3. SQLite Thread-Local Pooling & BLOB Caching

Routing logic is only useful if it's durable. SynaptoRoute serializes vectors into a local SQLite database.

  • Concurrency: Thread-local connection pooling ensures 100% data integrity even when 2,000 overlapping web requests attempt to modify routes simultaneously.
  • Cold Booting: The v0.2.1 hotfix introduced binary float32 BLOB caching to the schema. Bypassing CPU re-encoding allows a 50,000-vector routing table to boot in 0.45 seconds, completely eliminating cold-start bottlenecks.
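Both bullets can be illustrated with the standard library's sqlite3 module and NumPy. The sketch below is an assumption-laden approximation rather than SynaptoRoute's storage layer: the `VectorStore` class, the `vectors` table, and its column names are hypothetical. It keeps one connection per thread via `threading.local` and stores each embedding as raw float32 bytes, so loading only reinterprets bytes and never re-runs the encoder.

```python
import os
import sqlite3
import tempfile
import threading

import numpy as np


class VectorStore:
    """Hypothetical sketch: one SQLite connection per thread, vectors as BLOBs."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._local = threading.local()
        with self._conn() as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS vectors ("
                "route TEXT NOT NULL, utterance TEXT NOT NULL, embedding BLOB NOT NULL)"
            )

    def _conn(self) -> sqlite3.Connection:
        # Each thread lazily opens its own connection and then reuses it.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._path)
            self._local.conn = conn
        return conn

    def save(self, route: str, utterance: str, embedding: np.ndarray) -> None:
        blob = np.asarray(embedding, dtype=np.float32).tobytes()
        with self._conn() as conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO vectors VALUES (?, ?, ?)", (route, utterance, blob))

    def load_all(self, dim: int):
        rows = self._conn().execute(
            "SELECT route, embedding FROM vectors ORDER BY rowid"
        ).fetchall()
        names = [route for route, _ in rows]
        if not rows:
            return names, np.empty((0, dim), dtype=np.float32)
        # Reinterpret the stored bytes as float32; no re-encoding of utterances.
        flat = np.frombuffer(b"".join(blob for _, blob in rows), dtype=np.float32)
        return names, flat.reshape(len(rows), dim)


path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
store = VectorStore(path)


def writer(i: int) -> None:
    store.save("billing", f"utterance {i}", np.full(3, i, dtype=np.float32))


threads = [threading.Thread(target=writer, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

names, matrix = store.load_all(dim=3)
```

Per-thread connections avoid sharing a single sqlite3 connection object across threads, while SQLite's own file locking serializes the concurrent writes.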

Head-to-Head Benchmark

SynaptoRoute is architecturally optimized for async concurrent deployments. We evaluated it against semantic-router (using their default FastEmbedEncoder) to measure architectural scaling.

| Metric | semantic-router | SynaptoRoute (v0.2.0) |
| --- | --- | --- |
| Hot-Reload Degradation (500 Routes) | +6.46 ms | +0.74 ms |
| Concurrent Async Load | "Index is not ready" (Thread Blocked) | Successfully Batched (38+ QPS) |

The Architectural Difference: The hot-reload degradation of +0.74 ms versus +6.46 ms is a direct consequence of $O(1)$ lazy memory slicing versus $O(N)$ index recompilation. Under concurrent asyncio load, semantic-router's sync-first design produced "Index is not ready" failures, while SynaptoRoute's _batch_worker queue absorbed all requests without dropping a query.

📊 View Full Benchmarks: For detailed statistical analysis, including GPU physics scaling, P50 vs P99 tradeoffs, and our roadmap for fixing distributed system limitations, see our official BENCHMARKS.md.


Installation & Deployment

Method 1: Docker REST API (Recommended)

SynaptoRoute ships with a fully asynchronous FastAPI wrapper, designed for immediate drop-in deployment as a scalable microservice.

# Build the Docker image
docker build -t synaptoroute .

# Run the container
docker run -p 8000:8000 synaptoroute

Method 2: Standard Python Package

To embed SynaptoRoute natively into your existing Python pipelines:

pip install synaptoroute

Quick Start Guide

import asyncio
from synaptoroute.router import AdaptiveRouter
from synaptoroute.encoder import Encoder
from synaptoroute.storage import SQLiteStorage
from synaptoroute.models import Route

async def main():
    # 1. Initialize Components
    encoder = Encoder(providers=["CPUExecutionProvider"])
    storage = SQLiteStorage("data/memory.sqlite")
    router = AdaptiveRouter(encoder, storage)
    
    # 2. Define Routes
    billing_route = Route(
        name="billing", 
        utterances=["I need a refund", "Where is my receipt?", "Cancel my subscription"]
    )
    router.add_route(billing_route)
    
    # 3. Start the Background Batching Worker
    await router.start()
    
    # 4. Execute Async Queries
    result = await router.aquery("How do I get my money back?")
    print(f"Matched Intent: {result.name}") # Output: billing
    
    # 5. Graceful Shutdown
    await router.stop()

if __name__ == "__main__":
    asyncio.run(main())

System Limitations

Horizontal Scaling (Kubernetes Split-Brain)
SynaptoRoute relies on a highly optimized, local in-memory NumPy tensor to achieve its routing speed. As such, it is structurally bound to a single node. If deployed across multiple load-balanced Kubernetes pods, a hot-reload request hitting Pod A will update Pod A's local memory, but Pod B will remain unaware.

Evaluating distributed embedding synchronization (e.g., Redis Pub/Sub or shared-memory) to unblock horizontal scaling is a core research focus for v0.3.0.


Community & Contributing

We welcome contributions of all sizes from the open-source community!

  • Contributing: Please read our Contributing Guidelines to learn how to set up your development environment.
  • Issues: If you discover a bug or have a feature request, please open an issue.
