Model-aware text chunking and answer re-ranking for LLM pipelines. Automatically adapts chunk size to tokenizer and context window, then consolidates and ranks answers across chunks.
ChunkRank
Model-aware text chunking and answer re-ranking for LLM pipelines
Used internally for long-document QA and evaluation pipelines handling 1,000+ PDFs.
ChunkRank is a lightweight Python library that automatically chunks text based on an LLM's tokenizer and context window, then consolidates and ranks answers across chunks.
🔗 PyPI: https://pypi.org/project/chunkrank/
Why ChunkRank?
When working with LLMs, long documents must be split into chunks, but:
- Each model has its own tokenizer and context limit
- Chunk sizes are usually hard-coded and error-prone
- Answer quality drops when responses come from multiple chunks
- Existing RAG frameworks are heavy when you only need chunking + ranking
ChunkRank fills this gap.
Installation
pip install chunkrank
With semantic chunking + cross-encoder reranking:
pip install chunkrank[semantic]
With all optional backends:
pip install chunkrank[all]
For development:
poetry install --with dev
Quick Example
import chunkrank
# Read the document and close the file handle when done
with open("document.txt", encoding="utf-8") as f:
    text = f.read()
question = "What is the main topic of this document?"
chunks = chunkrank.split(text, model="gpt-4o-mini")
answers = chunkrank.answer(question, chunks)
best = chunkrank.rank(answers)
print(best)
Core API
import chunkrank
# 1. Split text into model-aware chunks
chunks = chunkrank.split(text, model="gpt-4o-mini")
# 2. Answer the question across all chunks
# Default: local extractive (no API key required)
answers = chunkrank.answer(question, chunks)
# With OpenAI:
answers = chunkrank.answer(question, chunks, provider="openai", api_key="sk-...")
# With Anthropic:
answers = chunkrank.answer(question, chunks, provider="anthropic", api_key="sk-ant-...")
# 3. Rank and return the best answer
best_answer = chunkrank.rank(answers)
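Step 1 sizes chunks from the model's context window. The arithmetic behind a token budget can be sketched as follows. This is an illustration only, not ChunkRank's actual sizing rule: the function `chunk_budget` and the parameters `prompt_tokens` and `reserved_output_tokens` are hypothetical names introduced here.

```python
def chunk_budget(max_context, prompt_tokens, reserved_output_tokens):
    """Tokens left for document text after the prompt and the reply are accounted for.

    Hypothetical helper: the names and the subtraction rule are assumptions,
    not part of the chunkrank API.
    """
    budget = max_context - prompt_tokens - reserved_output_tokens
    if budget <= 0:
        raise ValueError("prompt and output reservation exceed the context window")
    return budget


# A 128,000-token window with a 500-token prompt and 1,000 tokens kept for the reply.
budget = chunk_budget(max_context=128_000, prompt_tokens=500, reserved_output_tokens=1_000)
print(budget)
```

A hard-coded chunk size ignores these terms, which is why the same number can work for one model and overflow another.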
Pipeline API
from chunkrank import ChunkRankPipeline
# Local (no LLM required)
pipe = ChunkRankPipeline(model="gpt-4o-mini")
# With OpenAI
pipe = ChunkRankPipeline(model="gpt-4o-mini", provider="openai", api_key="sk-...")
# With Anthropic
pipe = ChunkRankPipeline(model="gpt-4o-mini", provider="anthropic", api_key="sk-ant-...")
# Process — returns best answer
answer = pipe.process(question="What is the main topic?", text=text)
# Stream — yields answers progressively as each chunk is processed
for partial in pipe.stream(question="What is the main topic?", text=text):
    print(partial)
Async API
from chunkrank import AsyncChunkRankPipeline
pipe = AsyncChunkRankPipeline(model="gpt-4o-mini", provider="openai", api_key="sk-...")
# Parallel chunk answering via asyncio.gather
answer = await pipe.process(question, text)
# Async streaming
async for partial in pipe.stream(question, text):
    print(partial)
Module-level async functions:
import chunkrank
chunks = await chunkrank.async_split(text, model="gpt-4o-mini")
answers = await chunkrank.async_answer(question, chunks) # parallel LLM calls
best = await chunkrank.async_rank(answers)
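The `await` calls above must run inside a coroutine. The pattern of answering chunks in parallel with `asyncio.gather` can be sketched with the standard library alone. Here `answer_chunk` is a hypothetical stand-in for one LLM call and `answer_all` is an illustrative name; neither is part of ChunkRank.

```python
import asyncio


async def answer_chunk(question, chunk):
    """Hypothetical stand-in for a single LLM request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"{len(chunk)} chars scanned for: {question}"


async def answer_all(question, chunks):
    # All requests are started together; gather returns results in input order.
    return await asyncio.gather(*(answer_chunk(question, c) for c in chunks))


answers = asyncio.run(answer_all("What is the main topic?", ["alpha", "beta gamma"]))
print(answers)
```

Because the requests overlap, total latency is close to that of the slowest call rather than the sum of all calls.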
Ranking Methods
| Method | Description | Extra dep |
|---|---|---|
| bm25 (default) | BM25 lexical ranking | none |
| tfidf | TF-IDF cosine similarity | none |
| embedding | Dense vector similarity | [semantic] or openai-embed |
| cross-encoder | Semantic cross-encoder (most accurate) | [semantic] |
from chunkrank import Ranker
ranker = Ranker(method="cross-encoder")
ranked = ranker.rank(question, answers)
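To show what lexical ranking does to a set of candidate answers, here is a minimal pure-Python sketch of Okapi BM25. It is not ChunkRank's implementation: the function `bm25_scores`, the regex tokenizer, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (illustrative)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


answers = [
    "The report covers quarterly revenue growth.",
    "Weather was mild this spring.",
    "Revenue growth slowed in the third quarter.",
]
scores = bm25_scores("revenue growth", answers)
ranked = [a for _, a in sorted(zip(scores, answers), key=lambda p: p[0], reverse=True)]
print(ranked[0])
```

The second answer shares no terms with the query and scores zero, which is the main weakness of lexical methods and the reason the semantic options exist.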
Chunking Strategies
# Token-budget sliding window (default)
chunks = chunkrank.split(text, model="gpt-4o-mini", strategy="tokens", overlap_tokens=64)
# Semantic — splits on embedding similarity drops between sentences
chunks = chunkrank.split(text, model="gpt-4o-mini", strategy="semantic", similarity_threshold=0.5)
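The token-budget sliding window can be sketched without any dependencies. This is an assumption-laden illustration rather than ChunkRank's code: `sliding_window` is a hypothetical function, and whitespace splitting stands in for a real tokenizer.

```python
def sliding_window(tokens, max_tokens, overlap_tokens=0):
    """Split tokens into windows of at most max_tokens.

    Each window after the first repeats the last overlap_tokens of the previous one.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be in [0, max_tokens)")
    step = max_tokens - overlap_tokens
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the text
        start += step
    return windows


# Whitespace splitting is a stand-in for a model tokenizer.
tokens = "a b c d e f g h i j".split()
windows = sliding_window(tokens, max_tokens=4, overlap_tokens=1)
for w in windows:
    print(w)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of processing some tokens twice.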
Retrieve-then-Answer (top-K)
Rank chunks first, answer only the top-K — reduces LLM calls on large documents:
pipe = ChunkRankPipeline(model="gpt-4o-mini", retrieval_top_k=3)
answer = pipe.process(question, text)
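The idea behind retrieve-then-answer can be sketched as a pre-filter that keeps only the K most relevant chunks. The example below is a simplification under stated assumptions: `top_k_chunks` is a hypothetical function, and plain word overlap stands in for whichever ranking method the pipeline actually uses.

```python
import re


def top_k_chunks(question, chunks, k=3):
    """Keep the k chunks sharing the most distinct words with the question.

    The selected chunks are returned in their original document order.
    """
    q_terms = set(re.findall(r"\w+", question.lower()))
    scored = []
    for idx, chunk in enumerate(chunks):
        overlap = len(q_terms & set(re.findall(r"\w+", chunk.lower())))
        scored.append((overlap, idx))
    # Highest overlap first; ties go to the earlier chunk.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return [chunks[i] for _, i in sorted(best, key=lambda s: s[1])]


chunks = [
    "Intro and acknowledgements.",
    "The main topic is solar power storage.",
    "Appendix tables.",
    "Solar storage costs are the main concern.",
]
selected = top_k_chunks("What is the main topic?", chunks, k=2)
print(selected)
```

With four chunks and `k=2`, only two would be sent to the LLM, halving the number of calls in this toy case.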
Disk Cache
Avoid re-chunking the same document on repeated runs:
from chunkrank import ChunkCache, Chunker, ChunkerConfig
cache = ChunkCache(".chunkrank_cache")
chunks = cache.get(text, model="gpt-4o-mini")
if chunks is None:
    chunks = Chunker(ChunkerConfig(model="gpt-4o-mini")).split(text)
    cache.set(text, model="gpt-4o-mini", chunks=chunks)
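One way such a cache could work is to key each entry on a hash of the model name plus the text, so that the same document chunked for a different model is stored separately. The class below is a hypothetical sketch, not `ChunkCache` itself: the name `TinyChunkCache`, the SHA-256 key, and the JSON file layout are all assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class TinyChunkCache:
    """Illustrative disk cache: one JSON file per (model, text) pair."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, text, model):
        # The model is part of the key because chunk boundaries depend on it.
        digest = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, text, model):
        path = self._path(text, model)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def set(self, text, model, chunks):
        self._path(text, model).write_text(json.dumps(chunks), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = TinyChunkCache(tmp)
    miss = cache.get("some text", model="gpt-4o-mini")
    cache.set("some text", model="gpt-4o-mini", chunks=["some", "text"])
    hit = cache.get("some text", model="gpt-4o-mini")
    other_model = cache.get("some text", model="gpt-4")
    print(miss, hit, other_model)
```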
Runtime Model Registration
Register new models without editing the registry JSON:
import chunkrank
chunkrank.register_model("my-custom-model", max_context=200_000)
Supported Models
54 models in the built-in registry, including:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o3, o3-mini, o4-mini |
| Anthropic | claude-3-opus/sonnet/haiku, claude-3-5-sonnet/haiku, claude-sonnet-4-6, claude-opus-4-6 |
| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash, gemini-2.5-pro |
| Meta | Llama-3.1/3.2/3.3, Llama-4-Scout (10M ctx), Llama-4-Maverick |
| Mistral | mistral-7b, mixtral-8x7b, mistral-large, codestral |
| Cohere | command-r, command-r-plus, command-r7b |
| DeepSeek | deepseek-v3, deepseek-r1 |
| Qwen | qwen2.5-72b-instruct, qwen2.5-coder-32b-instruct |
Unknown models fall back to 128k context with tiktoken (o200k_base).
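The lookup-with-fallback behaviour can be sketched as a dictionary with a default entry. This is a simplified model under stated assumptions, not the library's registry: `ModelSpec`, `lookup`, and the local `register_model` are hypothetical, and 128k is taken here to mean 128,000 tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    max_context: int
    encoding: str


# Assumed fallback: "128k" read as 128,000 tokens with the o200k_base encoding.
DEFAULT_SPEC = ModelSpec(max_context=128_000, encoding="o200k_base")
_REGISTRY = {}


def register_model(name, max_context, encoding="o200k_base"):
    """Add or overwrite a registry entry at runtime (hypothetical helper)."""
    _REGISTRY[name] = ModelSpec(max_context=max_context, encoding=encoding)


def lookup(name):
    """Return the registered spec, or the default for unknown models."""
    return _REGISTRY.get(name, DEFAULT_SPEC)


register_model("my-custom-model", max_context=200_000)
print(lookup("my-custom-model"))
print(lookup("not-in-registry"))
```

A silent fallback keeps pipelines running for new model names, but it can overestimate the window of a small model, so registering such models explicitly is the safer choice.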
How It Fits
| Tool | What it does |
|---|---|
| LangChain / LlamaIndex | Full RAG pipelines |
| Haystack | End-to-end retrieval frameworks |
| ChunkRank | Focused, model-aware chunking + answer ranking |
ChunkRank complements RAG frameworks — it doesn't replace them.
Requirements
- Python 3.10+
- numpy, scikit-learn, rank-bm25
License
Apache 2.0 — see LICENCE.