# longctx-svc

Local retrieval companion for inference servers. Scoped, session-aware, file-watching. The tool is optional: if you don't run it, your engine behaves exactly as before.

WIP. Apache-2.0.
## Engine-agnostic by design

longctx-svc speaks plain HTTP/JSON. It works with any engine that accepts a prompt; no engine forks are required for the basic path.
| Engine | Mode | Wiring |
|---|---|---|
| vllm-swift | embedded | optional `--retrieval-endpoint URL` flag (engine-side) |
| TheTom/llama-cpp-turboquant (llama-server) | proxy | point client at longctx-svc; longctx-svc forwards to llama-server |
| TheTom/vllm (feature/turboquant-amd-noautotune) | proxy or embedded | OpenAI-compat passthrough; or call `LongctxClient` from a custom hook |
| vLLM (CUDA) | proxy | OpenAI-compat passthrough |
| anything OpenAI-compat | proxy | OpenAI-compat passthrough |
## Mode A — proxy (zero engine changes)

```shell
# 1. Run your engine as usual
llama-server -m model.gguf --port 8080 &
# (or vLLM AMD, vLLM CUDA, vllm-swift, ...)

# 2. Run longctx-svc in front of it
longctx-svc serve --upstream http://localhost:8080

# 3. Point your OpenAI client at longctx-svc instead of the engine
export OPENAI_BASE_URL=http://localhost:8765/v1
```
longctx-svc detects the project from the messages, retrieves the top-K chunks, splices them into the system message, and forwards the request to the upstream. The response (including SSE streams) is passed straight back. If no path is mentioned in the messages, the request is forwarded unmodified.
## Mode B — embedded (engine calls /retrieve)

For tighter integration (e.g. so the engine can reuse retrieved chunks across KV-cache boundaries), engines import `LongctxClient`:
```python
from longctx_svc.client import LongctxClient

cli = LongctxClient.from_env()  # honors LONGCTX_ENDPOINT
if cli is not None:  # tool is optional
    res = cli.retrieve(
        prefill_text=full_prompt,
        query=user_message,
        session_id=session_id,
        top_k=8,
    )
    full_prompt = cli.splice(full_prompt, res)
```
Network failure → empty result → engine falls back to the no-retrieval path. Optional tool stays optional.
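Engines not written in Python can make the same call over plain HTTP. A hedged sketch, assuming the `POST /retrieve` JSON body mirrors the client keyword arguments above (the actual request/response schema may differ; the response is treated as opaque here), with the same fail-open behavior:

```python
import json
from urllib import request

def retrieve(endpoint, prefill_text, query, session_id=None, top_k=8, timeout=2.0):
    """POST /retrieve; any failure degrades to None so callers skip splicing."""
    body = {"prefill_text": prefill_text, "query": query, "top_k": top_k}
    if session_id:
        body["session_id"] = session_id
    req = request.Request(
        endpoint + "/retrieve",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        return None  # network/parse failure -> no-retrieval path

# res = retrieve("http://127.0.0.1:8765", full_prompt, user_message)
# if res is None, proceed without retrieval; otherwise splice as usual.
```

The broad fail-open `except` is the point: a missing or unhealthy longctx-svc must never take the engine down with it.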
## HTTP surface

| Endpoint | Purpose |
|---|---|
| `POST /retrieve` | engine-side retrieval (Mode B) |
| `POST /v1/chat/completions` | OpenAI-compat passthrough (Mode A) |
| `POST /v1/completions` | legacy OpenAI-compat passthrough (Mode A) |
| `GET /longctx/status` | JSON status; `Accept: text/plain` for the Sarah-visible block |
| `GET /healthz` | liveness probe |
## Headers

Every retrieve / proxy response sets:

- `x-longctx-session: <session-id|ephemeral>`
- `x-longctx-scope: <project-root|"">`
- `x-longctx-chunks-used: <n>`
- `x-longctx-scope-status: ready|empty|error|no-scope`

Session affinity is sent on the request side via:

- `x-session-affinity: <id>` (preferred)
- `x-session-id: <id>`
- `metadata.session_id` in the JSON body

No header → ephemeral request, no caching.
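These response headers let a caller confirm that retrieval actually ran. A small sketch that condenses them into a one-line status (the header values in the example are illustrative):

```python
def summarize_longctx_headers(headers: dict) -> str:
    """Render the x-longctx-* response headers into a one-line status string."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    session = h.get("x-longctx-session", "ephemeral")
    status = h.get("x-longctx-scope-status", "no-scope")
    chunks = h.get("x-longctx-chunks-used", "0")
    return f"session={session} status={status} chunks={chunks}"

# Illustrative values, e.g. from a proxied chat completion response:
print(summarize_longctx_headers({
    "x-longctx-session": "dev-1",
    "x-longctx-scope-status": "ready",
    "x-longctx-chunks-used": "8",
}))
```

The defaults (`ephemeral`, `no-scope`, `0`) match the behavior described above for requests that carried no session header or mentioned no path.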
## Install (alpha)

```shell
pip install -e services/longctx-svc
longctx-svc serve    # http://127.0.0.1:8765
```
## Tests

```shell
cd services/longctx-svc
pytest tests/ --no-cov
```

85 tests cover: scope detection, walk + `.gitignore`, chunker, indexer, session manager, the Sarah-journey end-to-end, and the engine-agnostic client + OpenAI-compat proxy.