langchain-highsnr
LangChain integration for HighSNR — compress documents to a token budget, keeping the highest-signal content.
```
pip install langchain-highsnr
```
Get an API key at console.high-snr.com. Requires Python 3.9+ and langchain-core>=0.3.0.
What it does
HighSNR selects the most informative chunks from a document and discards the rest, staying within a token budget. Compression is deterministic, privacy-first, and sub-second for most documents.
| Class | Position in pipeline | Use case |
|---|---|---|
| HighSNRDocumentTransformer | Before embedding | Compress raw docs before indexing |
| HighSNRDocumentCompressor | After retrieval | Compress retrieved chunks before LLM |
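As a rough illustration of what "keep the highest-signal chunks within a token budget" can mean, here is a minimal sketch. It is not HighSNR's algorithm, which is not described in this README: the `Chunk` class, the precomputed `score` values, the whitespace token count, and the `select_chunks` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # hypothetical "signal" score; the real scoring is not documented here


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_chunks(chunks, max_output_tokens, include_boundaries=False):
    """Greedily keep the highest-scoring chunks that fit in the budget."""
    keep = set()
    used = 0

    def try_keep(i):
        nonlocal used
        cost = count_tokens(chunks[i].text)
        if i not in keep and used + cost <= max_output_tokens:
            keep.add(i)
            used += cost

    # Optionally reserve room for the first and last chunk before anything else.
    if include_boundaries and chunks:
        try_keep(0)
        try_keep(len(chunks) - 1)

    # Visit chunks from highest to lowest score; sorted() is stable, so ties
    # fall back to document order and the result is deterministic.
    for i in sorted(range(len(chunks)), key=lambda i: chunks[i].score, reverse=True):
        try_keep(i)

    # Return the survivors in their original document order.
    return [chunks[i] for i in sorted(keep)]


chunks = [
    Chunk("Intro paragraph with little detail", 0.2),
    Chunk("Primary endpoint improved by twelve percent", 0.9),
    Chunk("Acknowledgements and funding", 0.1),
    Chunk("Dosage was randomized across two arms", 0.7),
]
selected = select_chunks(chunks, max_output_tokens=12)
print([c.text for c in selected])
```

With a budget of 12 tokens, only the two highest-scoring chunks fit. Passing `include_boundaries=True` in this sketch reserves room for the first and last chunk first, which mirrors the intent of the parameter of the same name but says nothing about how the service implements it.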
Usage
HighSNRDocumentTransformer — compress before indexing
```python
from langchain_highsnr import HighSNRDocumentTransformer
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

transformer = HighSNRDocumentTransformer(
    api_key="snr-...",  # or set HIGHSNR_API_KEY env var
    max_output_tokens=800,  # token budget per document
    context_hint="clinical trial methodology",  # optional topic hint
)

# raw_docs: a list of LangChain Document objects loaded elsewhere
compressed = transformer.transform_documents(raw_docs)
vectorstore = FAISS.from_documents(compressed, OpenAIEmbeddings())
```
| Parameter | Default | Description |
|---|---|---|
| api_key | None | Falls back to HIGHSNR_API_KEY env var |
| max_output_tokens | 1000 | Token budget per document |
| include_boundaries | True | Keep first and last chunk |
| context_hint | None | Topic/query to bias chunk selection |
HighSNRDocumentCompressor — compress after retrieval
The user's query is automatically used as the selection hint.
```python
from langchain_highsnr import HighSNRDocumentCompressor
from langchain.retrievers import ContextualCompressionRetriever

compressor = HighSNRDocumentCompressor(
    api_key="snr-...",
    max_output_tokens=2000,
)

# vectorstore: an existing vector store, e.g. the FAISS index built above
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)

docs = retriever.invoke("what is the main finding?")
```
| Parameter | Default | Description |
|---|---|---|
| api_key | None | Falls back to HIGHSNR_API_KEY env var |
| max_output_tokens | 2000 | Token budget across all chunks |
| include_boundaries | False | Keep first/last chunk |
| group_by_source | True | Group chunks by metadata["source"]; one API call per source document (benchmark-validated) |
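The `group_by_source` option is described as grouping retrieved chunks by `metadata["source"]` so that each source document is handled in one API call. A rough sketch of that grouping step follows, using plain dictionaries in place of LangChain `Document` objects. The `group_by_source` helper and the `"<unknown>"` bucket for chunks without a source are hypothetical and are not taken from the library's code.

```python
from collections import defaultdict


def group_by_source(docs):
    """Group chunk dicts by metadata["source"], preserving first-seen order."""
    groups = defaultdict(list)
    for doc in docs:
        # Chunks without a source share one bucket (an assumption of this sketch).
        source = doc.get("metadata", {}).get("source", "<unknown>")
        groups[source].append(doc)
    return dict(groups)


retrieved = [
    {"page_content": "Methods, part 1", "metadata": {"source": "trial_a.pdf"}},
    {"page_content": "Results summary", "metadata": {"source": "trial_b.pdf"}},
    {"page_content": "Methods, part 2", "metadata": {"source": "trial_a.pdf"}},
]
grouped = group_by_source(retrieved)
for source, source_chunks in grouped.items():
    # One compression request per source would be issued at this point.
    print(source, len(source_chunks))
```

Here three retrieved chunks collapse into two groups, so a compressor working this way would make two requests rather than three.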
Benchmarks
Evaluated on LongBench v1 with GPT-4o (n=200 per dataset). At 80% budget with hint:
- HotpotQA: F1 70.96 — exceeds full-context GPT-4o (69.71)
- Qasper: F1 45.21 — 96% of full-context GPT-4o (47.22)
Full results, scripts, and reproduction instructions: github.com/HighSNRInc/highsnr-benchmarks
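The relative figures quoted above can be checked directly from the F1 scores. The snippet below only redoes that arithmetic with the numbers listed in this section; it does not rerun any benchmark.

```python
# F1 scores quoted above: (compressed at 80% budget with hint, full-context GPT-4o).
scores = {
    "HotpotQA": (70.96, 69.71),
    "Qasper": (45.21, 47.22),
}

# Ratio of compressed F1 to full-context F1 for each dataset.
relative = {name: compressed / full for name, (compressed, full) in scores.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.1%} of full-context F1")
```

This gives about 101.8% for HotpotQA and about 95.7% for Qasper, which rounds to the 96% stated above.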
Environment variables
| Variable | Description |
|---|---|
| HIGHSNR_API_KEY | API key; alternative to passing api_key in the constructor |
| HIGHSNR_API_URL | Override the API base URL (default: https://api.high-snr.com) |
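The precedence these variables imply (explicit argument first, then environment, then the documented default URL) can be sketched as follows. `resolve_settings` is a hypothetical helper written for this example, and raising `ValueError` when no key is found is an assumption; the library's own behaviour in that case is not stated here.

```python
import os

DEFAULT_API_URL = "https://api.high-snr.com"  # default listed in the table above


def resolve_settings(api_key=None, env=os.environ):
    """Return (api_key, api_url), preferring explicit arguments over the environment."""
    key = api_key or env.get("HIGHSNR_API_KEY")
    if not key:
        # Assumed behaviour for this sketch only.
        raise ValueError("No API key: pass api_key or set HIGHSNR_API_KEY")
    url = env.get("HIGHSNR_API_URL", DEFAULT_API_URL)
    return key, url


# A dict stands in for os.environ so the example does not depend on the real environment.
print(resolve_settings(env={"HIGHSNR_API_KEY": "snr-example"}))
```

Passing a plain dict as `env` keeps the example independent of the machine it runs on; in real use the default `os.environ` would be read instead.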
Links
- API console & free tier: console.high-snr.com
- Homepage: high-snr.com
- Benchmarks: github.com/HighSNRInc/highsnr-benchmarks
- Support: hello@high-snr.com