longctx

Open long-context inference stack: retrieval + open weights, no closed parts.
A small library that bundles the components needed to reach Anthropic-class long-context retrieval performance on a single accessible GPU using only open weights.
What it is
longctx is a thin wrapper over standard tools:
- Retrieval: sentence-transformers (bi-encoder) + faiss
- Generation: any OpenAI-compatible LLM endpoint (vLLM, SGLang, llama.cpp server)
- Defaults tuned for Qwen2.5-14B-Instruct-1M; works with any instruction-following open model
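Under the hood the whole stack is plain retrieve-then-generate. Here is a minimal sketch of the equivalent flow built on the raw tools directly, so it's clear what the wrapper buys you. The prompt format, endpoint URL, and k are illustrative assumptions, not the longctx internals:

```python
import faiss
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# Embed candidates and build a flat inner-product index over normalized vectors.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
candidates = ["Response 1: ...", "Response 2: ...", "Response 3: ..."]
vecs = embedder.encode(candidates, normalize_embeddings=True)
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

# Retrieve the top-k candidates for the query.
query = "What was the third response about regulatory compliance?"
qvec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(qvec, k=3)
context = "\n\n".join(candidates[i] for i in ids[0])

# Generate against any OpenAI-compatible server (vLLM, SGLang, llama.cpp).
llm = OpenAI(base_url="http://localhost:5050/v1", api_key="unused")
reply = llm.chat.completions.create(
    model="Qwen/Qwen2.5-14B-Instruct-1M",
    messages=[{"role": "user", "content": f"{context}\n\n{query}"}],
)
print(reply.choices[0].message.content)
```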
Why
A stack of longctx defaults running Qwen2.5-14B-Instruct-1M on a single MI300X scored 0.822 on the MRCR v2 8K bin (n=82, mass-validated 2026-05-06), beating the headline number a $29M-funded closed-weight startup published for their custom subquadratic architecture. The architectural-moat narrative wasn't load-bearing for this workload: retrieval plus open weights solve it.
This library exists so the rest of the open ecosystem can reproduce that result with one pip install.
Install
```
pip install longctx
```
For local vLLM serving:
```
pip install longctx[serve]
```
Quickstart
```python
from longctx import LongCtxClient

# Defaults: sentence-transformers/all-MiniLM-L6-v2 + local vLLM at port 5050
client = LongCtxClient()

# Pass your candidate chunks and a query
result = client.ask(
    query="What was the third response about regulatory compliance?",
    candidates=[
        "Response 1: brief on regulatory compliance...",
        "Response 2: legal analysis of...",
        "Response 3: detailed compliance walkthrough...",
        # ... up to thousands of candidates
    ],
    top_k=8,
)

print(result.content)
print(f"Retrieved indices: {result.retrieved_indices}")
print(f"Prompt tokens: {result.prompt_tokens}")
```
Custom embedder
```python
from longctx import LongCtxClient, RetrievalPipeline

# Default uses MiniLM-L6 (23M params, CPU-friendly).
# For higher quality at the cost of compute:
pipeline = RetrievalPipeline(embedder_model="BAAI/bge-large-en-v1.5")
client = LongCtxClient(pipeline=pipeline)
```
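The tradeoff is concrete: bge-large-en-v1.5 is a ~335M-param encoder emitting 1024-dim vectors versus MiniLM's 23M params and 384 dims, so the faiss index grows ~2.7× and encoding slows accordingly. A quick way to confirm the widths with sentence-transformers directly:

```python
from sentence_transformers import SentenceTransformer

# Embedding width drives faiss index size: 384 dims for MiniLM, 1024 for bge-large.
for name in ("sentence-transformers/all-MiniLM-L6-v2", "BAAI/bge-large-en-v1.5"):
    print(name, SentenceTransformer(name).get_sentence_embedding_dimension())
```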
Notes on rerankers
longctx does not enable cross-encoder reranking by default. Off-the-shelf rerankers (ms-marco-MiniLM, bge-reranker-base) degraded retrieval quality on MRCR-style tasks in our 2026-05-06 testing. They are trained for web-search relevance, which doesn't transfer to "find the Nth message of type X" task semantics.
A retrieval-style reranker fine-tuned on appropriate data is on the roadmap. Until then, pure bi-encoder retrieval is the default.
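If you want to reproduce the degradation yourself, here is a minimal sketch of wiring a stock cross-encoder over the bi-encoder's hits using sentence-transformers directly (not a longctx API; the model choice and inputs are illustrative):

```python
from sentence_transformers import CrossEncoder

# Rescore retrieved candidates with a web-search-trained cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What was the third response about regulatory compliance?"
hits = [
    "Response 3: detailed compliance walkthrough...",
    "Response 1: brief on regulatory compliance...",
]
scores = reranker.predict([(query, h) for h in hits])
reranked = [h for _, h in sorted(zip(scores, hits), reverse=True)]
```

On MRCR-style queries the positional cue ("the third response") carries most of the signal, and relevance-trained rerankers flatten exactly that distinction.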
Status
Pre-alpha (v0.2.0 on PyPI). APIs may change.
Headline numbers (mass-validated)
End-to-end validation 2026-05-06 on an AMD MI300X with vLLM-served Qwen2.5-14B-Instruct-1M, using the default LongCtxClient config (sentence-transformers MiniLM-L6 + faiss, top_k=8):
| MRCR v2 8-needle bin | pipeline | n | avg_score | prefix_pass |
|---|---|---|---|---|
| 8K (16K-32K char) | RAG | 82 | 0.822 | 100% |
| 32K (64K-128K char) | RAG | 98 | 0.697 | 97% |
| 64K (128K-256K char) | RAG | 95 | 0.641 | 98% |
| 64K (128K-256K char) | chunked-RAG | 95 | 0.670 | 98% |
Reference baseline: SubQ Inc.'s published MRCR headline = 0.659 (closed-weight, custom subquadratic architecture, $29M funding).
Three of three bins clear the closed-weight headline with the right pipeline (the 64K bin needs chunked-RAG to get there). Plain RAG over standard attention is competitive with claimed state-of-the-art subquadratic architectures on MRCR-style retrieval workloads at every bin we measured.
Other tested generators (single-run, n=30, not mass-validated)
- Qwen2.5-7B-Instruct + RAG: 0.567 (2.4× faster, fits 16GB GPU)
- Qwen2.5-32B-Instruct + RAG: 0.237 (vanilla 32K context window, training-data fit limits the result)
- Qwen3-Next-80B-A3B + RAG: 0.281 (linear-attention hybrid, MoE)
Single-run scores at n=30 have substantial variance (we observed ±0.05 swings between adjacent runs of the same config). Trust the mass-validated numbers above for headline claims.
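The swing size matches a back-of-envelope standard error. A sketch, assuming a per-example score spread of about 0.3 (an assumed figure, not something we measured):

```python
import math

sd = 0.3            # assumed per-example score std dev (not measured)
for n in (30, 82):  # single-run vs. smallest mass-validated sample
    print(f"n={n}: SE ~ {sd / math.sqrt(n):.3f}")  # ~0.055 at n=30, ~0.033 at n=82
```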
Mistral-7B-Instruct-v0.3 and Qwen3-8B failed with the default Qwen2.5-style template (prefix-first instruction). Templates are provided for both: longctx.templates.MISTRAL_VERBATIM_TEMPLATE and longctx.templates.QWEN3_NO_THINK_TEMPLATE. Validation against MRCR for these templates is on the roadmap.
Reproduce
```
longctx-bench --data-dir /path/to/mrcr/v2 --model qwen2.5-14b-instruct-1m \
  --bins 8k 32k 64k --n 80 --include-chunked
```
License
Apache 2.0.