longctx

Open long-context inference stack. Retrieval + open weights, no closed parts.

A small library that bundles the components needed to reach Anthropic-class long-context retrieval performance on a single accessible GPU using only open weights.

What it is

longctx is a thin wrapper over standard tools:

  • Retrieval: sentence-transformers (bi-encoder) + faiss
  • Generation: any OpenAI-compatible LLM endpoint (vLLM, SGLang, llama.cpp server)
  • Defaults tuned for: Qwen2.5-14B-Instruct-1M, but works with any instruction-following open model
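
Under the hood, the retrieval step is dense inner-product search over normalized bi-encoder embeddings. The sketch below shows the ranking math that faiss's `IndexFlatIP` performs, written in plain numpy; the function name and toy vectors are illustrative, not part of longctx's API:

```python
import numpy as np

def top_k_by_inner_product(query_vec, candidate_vecs, k=8):
    """Rank candidates by cosine similarity (inner product of
    L2-normalized vectors) and return the top-k indices."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q                       # one similarity score per candidate
    return np.argsort(-scores)[:k].tolist()

# Toy vectors: candidate 1 points the same way as the query.
query = np.array([1.0, 0.0])
cands = np.array([[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]])
print(top_k_by_inner_product(query, cands, k=2))  # → [1, 0]
```

In the real pipeline, `sentence-transformers` produces the vectors (with `normalize_embeddings=True`) and faiss performs this search at scale.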

Why

A stack of longctx defaults running Qwen2.5-14B-Instruct-1M on a single MI300X scored 0.822 on the MRCR v2 8K bin (n=82, mass-validated 2026-05-06), beating the headline number that a $29M-funded closed-weight startup published for its custom subquadratic architecture. The architectural-moat narrative wasn't load-bearing for this workload. Retrieval + open weights solve it.

This library exists so the rest of the open ecosystem can reproduce that result with one pip install.

Install

pip install longctx

For local vLLM serving:

pip install longctx[serve]

Quickstart

from longctx import LongCtxClient

# Defaults: sentence-transformers/all-MiniLM-L6-v2 + local vLLM at port 5050
client = LongCtxClient()

# Pass your candidate chunks and a query
result = client.ask(
    query="What was the third response about regulatory compliance?",
    candidates=[
        "Response 1: brief on regulatory compliance...",
        "Response 2: legal analysis of...",
        "Response 3: detailed compliance walkthrough...",
        # ... up to thousands of candidates
    ],
    top_k=8,
)

print(result.content)
print(f"Retrieved indices: {result.retrieved_indices}")
print(f"Prompt tokens: {result.prompt_tokens}")

Custom embedder

from longctx import LongCtxClient, RetrievalPipeline

# Default uses MiniLM-L6 (23M params, CPU-friendly).
# For higher quality at the cost of compute:
pipeline = RetrievalPipeline(embedder_model="BAAI/bge-large-en-v1.5")
client = LongCtxClient(pipeline=pipeline)

Notes on rerankers

longctx does not enable cross-encoder reranking by default. Off-the-shelf rerankers (ms-marco-MiniLM, bge-reranker-base) degraded retrieval quality on MRCR-style tasks in our 2026-05-06 testing. They are trained for web-search relevance, which doesn't transfer to "find the Nth message of type X" task semantics.

A retrieval-style reranker fine-tuned on appropriate data is on the roadmap. Until then, pure bi-encoder retrieval is the default.

Status

Pre-alpha v0.2.0. APIs may change.

Headline numbers (mass-validated)

End-to-end validation 2026-05-06 on AMD MI300X with vLLM-served Qwen2.5-14B-Instruct-1M, default LongCtxClient config (sentence-transformers MiniLM-L6 + faiss top-K=8):

MRCR v2 8-needle bin   pipeline      n    avg_score   prefix_pass
8K (16K-32K char)      RAG           82   0.822       100%
32K (64K-128K char)    RAG           98   0.697       97%
64K (128K-256K char)   RAG           95   0.641       98%
64K (128K-256K char)   chunked-RAG   95   0.670       98%

Reference baseline: SubQ Inc.'s published MRCR headline = 0.659 (closed-weight, custom subquadratic architecture, $29M funding).

Three of three bins clear the closed-weight headline with the right pipeline. Plain RAG over standard attention is competitive with claimed-state-of-the-art subquadratic architectures on MRCR-style retrieval workloads at every bin we measured.
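
The chunked-RAG pipeline in the table splits long candidates into overlapping windows before embedding, so retrieval granularity stays small even when individual candidates are long. A minimal sketch of that chunking step; the window size and overlap here are illustrative assumptions, not longctx defaults:

```python
def chunk_text(text, size=2000, overlap=200):
    """Split text into fixed-size, overlapping character windows.
    Consecutive windows share `overlap` characters so needles that
    straddle a boundary still appear whole in at least one window."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "x" * 5000
print(len(chunk_text(doc)))  # → 3 windows covering all 5000 characters
```

Each window is then embedded and retrieved independently; hits map back to their parent candidate before generation.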

Other tested generators (single-run, n=30, not mass-validated)

  • Qwen2.5-7B-Instruct + RAG: 0.567 (2.4× faster, fits 16GB GPU)
  • Qwen2.5-32B-Instruct + RAG: 0.237 (vanilla 32K context window; training-data fit appears to limit the result)
  • Qwen3-Next-80B-A3B + RAG: 0.281 (linear-attention hybrid, MoE)

Single-run scores at n=30 have substantial variance (we observed ±0.05 swings between adjacent runs of the same config). Trust the mass-validated numbers above for headline claims.
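
As a rough sanity check on that variance claim, the standard error of a mean scales as σ/√n. The per-item standard deviation of ~0.3 below is an assumption for illustration (MRCR item scores span roughly [0, 1]), not a measured number from this project:

```python
import math

sigma = 0.30  # assumed per-item score std-dev, not measured

for n in (30, 82):
    se = sigma / math.sqrt(n)
    print(f"n={n}: standard error ≈ {se:.3f}")
# n=30 gives ≈0.055, consistent with the observed ±0.05 swings;
# n=82 tightens it to ≈0.033.
```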

Mistral-7B-Instruct-v0.3 and Qwen3-8B failed with the default Qwen2.5-style template (prefix-first instruction). Templates are provided for both: longctx.templates.MISTRAL_VERBATIM_TEMPLATE and longctx.templates.QWEN3_NO_THINK_TEMPLATE. Validation against MRCR for these templates is on the roadmap.

Reproduce

longctx-bench --data-dir /path/to/mrcr/v2 --model qwen2.5-14b-instruct-1m \
    --bins 8k 32k 64k --n 80 --include-chunked

License

Apache 2.0.

Project details


Download files

Download the file for your platform.

Source Distribution

longctx-0.2.0.tar.gz (30.5 kB)

Uploaded Source

Built Distribution

longctx-0.2.0-py3-none-any.whl (24.9 kB)

Uploaded Python 3

File details

Details for the file longctx-0.2.0.tar.gz.

File metadata

  • Download URL: longctx-0.2.0.tar.gz
  • Upload date:
  • Size: 30.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for longctx-0.2.0.tar.gz:

  • SHA256: 8222af746975cb06d0df0ac24f94ef0b17d36c8cde5497aabb5fcf71058b34ae
  • MD5: 493e255dbf625507e11a81ca2d145e60
  • BLAKE2b-256: a0171cd18ea7a38710eecf374f057139a96a65af8048da9fcd5109c8ec4f8341

File details

Details for the file longctx-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: longctx-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for longctx-0.2.0-py3-none-any.whl:

  • SHA256: 0b75203acbe220677ecdbfa68bd6d3bfeee62573e3f85302c2bbd47f0d8177a6
  • MD5: 7a8d068f427397382f20e49ede27a235
  • BLAKE2b-256: 87fbc1647942a95a60e982421235dc2962dfc386c7616b030e61468872be3363
