Drop-in FastAPI middleware and reverse proxy with semantic caching for APIs and LLMs

fastapi-semcache

Semantic caching middleware and reverse proxy for APIs and LLMs, with embeddings, pgvector similarity search, and Redis-backed response caching.

Both the PyPI distribution and the GitHub repository are named fastapi-semcache; the import package is semanticcache.

Why fastapi-semcache?

This package is designed to integrate directly into existing Python API stacks with minimal refactoring. It keeps the caching path simple and gives you explicit control over embeddings, vector search, and cache behavior.

It ships FastAPI middleware as the primary integration path and can also run as a reverse proxy in front of an upstream API or LLM service. Django and Flask middleware are planned for a future release, so you will be able to hook semantic caching into those stacks the same way as with FastAPI.

What is implemented

  • Hugging Face embeddings via Sentence Transformers (embedder_type="huggingface").

  • OpenAI embeddings via the official async client (embedder_type="openai"; install the embed-openai extra and set OPENAI_API_KEY). Use OpenAIEmbedder(..., send_dimensions_to_api=False) when the model has a fixed output size and the API must not receive a dimensions field.

  • PostgreSQL + pgvector for semantic similarity lookup. The library creates a dedicated cache table per embedder configuration (derived from model id and vector dimension) on first use, so you are not tied to a single hard-coded vector width.

  • Redis for response caching (keys include an embedder-specific prefix so separate models do not collide).

  • FastAPI middleware for in-app semantic caching.

  • Reverse proxy mode via create_semantic_cache_proxy_app().

Future support

  • Django and Flask middleware for in-app semantic caching (not yet shipped; same role as the FastAPI middleware).

Embeddings from the following providers are planned:

  • Ollama (HTTP embedding API against a configurable base URL, so the server can run locally or on another host).
  • Cohere
  • Voyage

Quick start

from semanticcache import SemanticCache, create_semantic_cache_proxy_app

cache = SemanticCache()
app = create_semantic_cache_proxy_app(
    upstream="http://127.0.0.1:11434",
    cache=cache,
)

Run with:

uvicorn mymodule:app --host 0.0.0.0 --port 8080

Reverse proxy

Point clients at the proxy and configure Postgres, Redis, and the upstream base URL.

This repository includes a small ASGI app in app/main.py that exposes app for uvicorn. Set SEMANTIC_CACHE_PROXY_UPSTREAM to the backend base URL; it defaults to http://127.0.0.1:11434 (Ollama's default port).

uv run uvicorn app.main:app --host 0.0.0.0 --port 8080

See create_semantic_cache_proxy_app in semanticcache.proxy for timeout, TLS verification, httpx_client_kwargs, and middleware options such as path_prefix and extract_query.

Install

pip install fastapi-semcache

Custom embedders: subclass BaseEmbedder from semanticcache.embedders and pass an instance to SemanticCache(embedder=...); this way you do not need any of the optional embedding extras. See docs/embedders.md.
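A custom embedder can be as small as a deterministic hashing model. The sketch below does not import the library: EmbedderSketch is a hypothetical stand-in for BaseEmbedder, and its model_id and dimension attributes and async embed method are assumptions (check docs/embedders.md for the real abstract methods). HashingEmbedder applies the hashing trick to lowercase word tokens and L2-normalises the result.

```python
import abc
import asyncio
import hashlib
import math
import re


class EmbedderSketch(abc.ABC):
    """Hypothetical stand-in for semanticcache.embedders.BaseEmbedder;
    the real abstract interface is documented in docs/embedders.md."""

    model_id: str
    dimension: int

    @abc.abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Return one vector of length self.dimension for the given text."""


class HashingEmbedder(EmbedderSketch):
    """Dependency-free bag-of-words embedder using the hashing trick."""

    def __init__(self, dimension: int = 256) -> None:
        self.model_id = f"hashing-bow-{dimension}"
        self.dimension = dimension

    async def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dimension
        for token in re.findall(r"\w+", text.lower()):
            # blake2b is stable across processes, unlike the salted built-in hash().
            digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
            vec[int.from_bytes(digest, "big") % self.dimension] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalise so cosine similarity reduces to a dot product.
        return [v / norm for v in vec] if norm else vec


if __name__ == "__main__":
    embedder = HashingEmbedder(dimension=64)
    print(len(asyncio.run(embedder.embed("What is semantic caching?"))))
```

Hashing captures word overlap rather than meaning, so a model like this suits tests and CI better than production; the point is that the model id and dimension are what a per-embedder table and key prefix would be derived from.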

Optional extras:

  • embed-huggingface / embed-huggingface-cpu: Sentence Transformers with CPU PyTorch.
  • embed-huggingface-gpu: Sentence Transformers with a CUDA-enabled PyTorch install.
  • embed-openai: OpenAI embeddings (openai, tiktoken).

CPU

pip install "fastapi-semcache[embed-huggingface-cpu]"
# or: pip install "fastapi-semcache[embed-huggingface]"

GPU

Pick a CUDA version that matches your system on the PyTorch "Get Started" page, then install with the matching index URL so pip selects CUDA wheels.

pip install "fastapi-semcache[embed-huggingface-gpu]" \
  --extra-index-url https://download.pytorch.org/whl/cu124

OpenAI embeddings

Install the OpenAI extra so embedder_type="openai" works (it pulls in openai and tiktoken), and set OPENAI_API_KEY in your environment.

pip install "fastapi-semcache[embed-openai]"
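The send_dimensions_to_api flag on OpenAIEmbedder exists because OpenAI's embeddings endpoint accepts an optional dimensions parameter only for models that support shortened outputs (the text-embedding-3 family); older fixed-size models such as text-embedding-ada-002 reject it. The helper below is a hypothetical sketch of how request arguments might be assembled under that flag, not the library's code; the keys it produces match the keyword arguments of the official client's embeddings.create(model=..., input=..., dimensions=...).

```python
def build_embedding_kwargs(
    model: str,
    texts: list[str],
    dimension: int,
    send_dimensions_to_api: bool = True,
) -> dict:
    """Hypothetical helper: keyword arguments for client.embeddings.create(...)."""
    if not texts:
        raise ValueError("at least one input text is required")
    kwargs: dict = {"model": model, "input": texts}
    if send_dimensions_to_api:
        # Ask the API to shorten vectors to the configured width
        # (only supported by text-embedding-3 and later models).
        kwargs["dimensions"] = dimension
    return kwargs
```

The result can be unpacked into AsyncOpenAI().embeddings.create(**kwargs). When the field is omitted, the model decides the vector width, so the configured dimension must still match it (1536 for text-embedding-ada-002) or the per-embedder cache table will have the wrong width.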

Requirements

Python 3.12+.

License

Apache-2.0. See LICENSE.

Download files

Source Distribution

fastapi_semcache-0.2.7.tar.gz (27.5 kB)

Built Distribution

fastapi_semcache-0.2.7-py3-none-any.whl (34.4 kB)

File details

Details for the file fastapi_semcache-0.2.7.tar.gz.

File metadata

  • Download URL: fastapi_semcache-0.2.7.tar.gz
  • Size: 27.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.11.6 (uv publish) on Fedora Linux 42

File hashes

Hashes for fastapi_semcache-0.2.7.tar.gz
  • SHA256: d9dd65509467512baee5339956457dbde1fc1da247253e00313f2c054e8ffd65
  • MD5: c404c327111cc469ca55beb6e997f990
  • BLAKE2b-256: 8e4344b9e0854897a54e7da717ce29e0da523a857d06ba9c7c586e656798dbbf

File details

Details for the file fastapi_semcache-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: fastapi_semcache-0.2.7-py3-none-any.whl
  • Size: 34.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.11.6 (uv publish) on Fedora Linux 42

File hashes

Hashes for fastapi_semcache-0.2.7-py3-none-any.whl
  • SHA256: 3d83994fd9e302395d190bd5fec836e7527cacf1d319da706ec92eae7f535680
  • MD5: 4de6b3944989089a75e301e109ada36e
  • BLAKE2b-256: cd1462fe69a116db4d9f0748234e2d757736348e06cfe773ad35771e45eb06ea
