Model-aware text chunking and answer re-ranking for LLM pipelines. Automatically adapts chunk size to tokenizer and context window, then consolidates and ranks answers across chunks.

ChunkRank

Model-aware text chunking and answer re-ranking for LLM pipelines


Used internally for long-document QA and evaluation pipelines handling 1,000+ PDFs.

ChunkRank is a lightweight Python library that automatically chunks text based on an LLM's tokenizer and context window, then consolidates and ranks answers across chunks.

🔗 PyPI: https://pypi.org/project/chunkrank/


Why ChunkRank?

When working with LLMs, long documents must be split into chunks, but:

  • Each model has its own tokenizer and context limit
  • Chunk sizes are usually hard-coded, which is error-prone
  • Answer quality drops when responses come from multiple chunks
  • Existing RAG frameworks are heavy when you only need chunking + ranking

ChunkRank fills this gap.


Installation

pip install chunkrank

With semantic chunking + cross-encoder reranking:

pip install chunkrank[semantic]

With all optional backends:

pip install chunkrank[all]

For development:

poetry install --with dev

Quick Example

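from pathlib import Path

import chunkrank

# Read the whole document; an explicit encoding avoids platform-dependent defaults
text = Path("document.txt").read_text(encoding="utf-8")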
question = "What is the main topic of this document?"

chunks = chunkrank.split(text, model="gpt-4o-mini")
answers = chunkrank.answer(question, chunks)
best = chunkrank.rank(answers)

print(best)

Core API

import chunkrank

# 1. Split text into model-aware chunks
chunks = chunkrank.split(text, model="gpt-4o-mini")

# 2. Answer the question across all chunks
#    Default: local extractive (no API key required)
answers = chunkrank.answer(question, chunks)

#    With OpenAI:
answers = chunkrank.answer(question, chunks, provider="openai", api_key="sk-...")

#    With Anthropic:
answers = chunkrank.answer(question, chunks, provider="anthropic", api_key="sk-ant-...")

# 3. Rank and return the best answer
best_answer = chunkrank.rank(answers)
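
The default local mode is described above only as "extractive". As a rough illustration of what an extractive answerer can look like, the sketch below picks, from each chunk, the sentence that shares the most words with the question. This is an assumption made for illustration, not ChunkRank's actual implementation; the names `extract_answer` and `_words` are hypothetical.

```python
import re

def _words(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def extract_answer(question: str, chunk: str) -> str:
    """Return the sentence in `chunk` with the largest word overlap with `question`."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    if not sentences:
        return ""
    q = _words(question)
    # max() keeps the first sentence on ties, so the result is deterministic.
    return max(sentences, key=lambda s: len(q & _words(s)))

chunks = [
    "The report covers rainfall. Annual rainfall in the region was 900 mm.",
    "Funding came from two grants. The team had five members.",
]
# One candidate answer per chunk; a ranker would then choose among them.
answers = [extract_answer("What was the annual rainfall?", c) for c in chunks]
```

Every chunk returns something, even when it is irrelevant, which is why a ranking step over the per-chunk answers is needed afterwards.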

Pipeline API

from chunkrank import ChunkRankPipeline

# Local (no LLM required)
pipe = ChunkRankPipeline(model="gpt-4o-mini")

# With OpenAI
pipe = ChunkRankPipeline(model="gpt-4o-mini", provider="openai", api_key="sk-...")

# With Anthropic
pipe = ChunkRankPipeline(model="gpt-4o-mini", provider="anthropic", api_key="sk-ant-...")

# Process — returns best answer
answer = pipe.process(question="What is the main topic?", text=text)

# Stream — yields answers progressively as each chunk is processed
for partial in pipe.stream(question="What is the main topic?", text=text):
    print(partial)
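
Streaming of this kind maps naturally onto a Python generator. The sketch below is a hypothetical illustration of the idea, not the library's code: `stream_answers` and the toy `first_sentence` answerer are invented names, and each answer is yielded as soon as its chunk has been processed.

```python
from typing import Callable, Iterable, Iterator

def stream_answers(
    question: str,
    chunks: Iterable[str],
    answer_fn: Callable[[str, str], str],
) -> Iterator[str]:
    """Yield one answer per chunk as soon as it is computed."""
    for chunk in chunks:
        # Nothing runs for later chunks until the caller asks for the next item.
        yield answer_fn(question, chunk)

def first_sentence(question: str, chunk: str) -> str:
    # Toy answerer: ignore the question and return the chunk's first sentence.
    return chunk.split(".")[0].strip() + "."

partials = list(
    stream_answers("topic?", ["Alpha is first. More.", "Beta is second."], first_sentence)
)
```

Because the generator is lazy, a caller can stop iterating early and skip the remaining chunks entirely.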

Async API

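import asyncio

from chunkrank import AsyncChunkRankPipeline

async def main():
    # question and text are defined as in the Quick Example
    pipe = AsyncChunkRankPipeline(model="gpt-4o-mini", provider="openai", api_key="sk-...")

    # Parallel chunk answering via asyncio.gather
    answer = await pipe.process(question, text)

    # Async streaming
    async for partial in pipe.stream(question, text):
        print(partial)

# await is only valid inside a coroutine, so a script runs it through asyncio.run
asyncio.run(main())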
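
To make the "parallel chunk answering via asyncio.gather" idea concrete, here is a minimal standard-library sketch. It assumes nothing about ChunkRank's internals: `fake_llm_answer` is a stand-in for a real provider call, and `answer_all` and `max_concurrency` are hypothetical names. The semaphore is an optional extra that caps the number of in-flight requests.

```python
import asyncio

async def fake_llm_answer(question: str, chunk: str) -> str:
    """Stand-in for a network call to an LLM provider."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"answer from: {chunk}"

async def answer_all(question: str, chunks: list[str], max_concurrency: int = 4) -> list[str]:
    """Answer every chunk concurrently, capping the number of in-flight calls."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(chunk: str) -> str:
        async with sem:
            return await fake_llm_answer(question, chunk)

    # gather() returns results in the same order as its inputs.
    return await asyncio.gather(*(guarded(c) for c in chunks))

results = asyncio.run(answer_all("q", ["c1", "c2", "c3"]))
```

Since `gather` preserves input order, each result can be matched back to its chunk by index.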

Module-level async functions:

import asyncio

import chunkrank

async def main():
    chunks = await chunkrank.async_split(text, model="gpt-4o-mini")
    answers = await chunkrank.async_answer(question, chunks)   # parallel LLM calls
    return await chunkrank.async_rank(answers)

best = asyncio.run(main())

Ranking Methods

Method          Description                              Extra dep
bm25 (default)  BM25 lexical ranking                     none
tfidf           TF-IDF cosine similarity                 none
embedding       Dense vector similarity                  [semantic] or openai-embed
cross-encoder   Semantic cross-encoder (most accurate)   [semantic]

from chunkrank import Ranker

ranker = Ranker(method="cross-encoder")
ranked = ranker.rank(question, answers)
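
For readers unfamiliar with the default method, the following is a small from-scratch sketch of Okapi BM25 scoring. It is illustrative only: the library lists rank-bm25 as a dependency, and this code is not taken from either project. `tokenize` and `bm25_scores` are hypothetical names, `k1=1.5` and `b=0.75` are common defaults rather than ChunkRank's settings, and the IDF uses the non-negative "+ 1" variant.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # "+ 1" inside the log keeps IDF non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

answers = [
    "The capital of France is Paris.",
    "Berlin hosts many museums.",
    "France borders Spain and Italy.",
]
scores = bm25_scores("capital of France", answers)
best = answers[max(range(len(answers)), key=scores.__getitem__)]
```

BM25 is purely lexical, so an answer that paraphrases the question without sharing words scores zero; that is the case the embedding and cross-encoder methods are meant to cover.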

Chunking Strategies

# Token-budget sliding window (default)
chunks = chunkrank.split(text, model="gpt-4o-mini", strategy="tokens", overlap_tokens=64)

# Semantic — splits on embedding similarity drops between sentences
chunks = chunkrank.split(text, model="gpt-4o-mini", strategy="semantic", similarity_threshold=0.5)
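
The two strategies can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's implementation: tokens are pre-split strings rather than real tokenizer output, bag-of-words cosine similarity stands in for sentence embeddings, and `window_chunks`, `semantic_chunks` and `_cosine` are hypothetical names.

```python
import math
import re
from collections import Counter

def window_chunks(tokens: list[str], max_tokens: int, overlap_tokens: int) -> list[list[str]]:
    """Sliding window: each chunk holds at most `max_tokens` tokens and
    repeats the last `overlap_tokens` tokens of the previous chunk."""
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("need 0 <= overlap_tokens < max_tokens")
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the window already reached the end of the text
    return chunks

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences: list[str], similarity_threshold: float) -> list[list[str]]:
    """Start a new chunk wherever adjacent sentences fall below the threshold."""
    vectors = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    chunks: list[list[str]] = []
    for i, sentence in enumerate(sentences):
        if i == 0 or _cosine(vectors[i - 1], vectors[i]) < similarity_threshold:
            chunks.append([])
        chunks[-1].append(sentence)
    return chunks

windows = window_chunks("a b c d e f g h i j".split(), max_tokens=4, overlap_tokens=1)
topics = semantic_chunks(
    ["Cats chase mice.", "Cats chase birds.", "Stocks fell sharply."],
    similarity_threshold=0.5,
)
```

The overlap trades a few duplicated tokens for context continuity at chunk boundaries, while the semantic variant lets chunk boundaries follow topic shifts instead of a fixed token count.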

Retrieve-then-Answer (top-K)

Rank chunks first, answer only the top-K — reduces LLM calls on large documents:

pipe = ChunkRankPipeline(model="gpt-4o-mini", retrieval_top_k=3)
answer = pipe.process(question, text)
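
The saving comes from ranking chunks cheaply before any LLM call is made. The sketch below shows the pattern with a simple word-overlap score; the scoring function, `retrieve_then_answer`, and `counting_answer` are hypothetical stand-ins, and the library's own retrieval step may score chunks differently.

```python
import re
from typing import Callable

def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def _overlap(question: str, chunk: str) -> int:
    """Count distinct question words that also appear in the chunk."""
    return len(_words(question) & _words(chunk))

def retrieve_then_answer(
    question: str,
    chunks: list[str],
    answer_fn: Callable[[str, str], str],
    top_k: int = 3,
) -> list[str]:
    """Rank chunks by relevance, then call `answer_fn` on only the best `top_k`."""
    # sorted() is stable, so equally scored chunks keep document order.
    ranked = sorted(chunks, key=lambda c: _overlap(question, c), reverse=True)
    return [answer_fn(question, c) for c in ranked[:top_k]]

calls: list[str] = []

def counting_answer(question: str, chunk: str) -> str:
    calls.append(chunk)  # record each simulated LLM call
    return chunk.upper()

chunks = ["pricing and billing", "refund policy details", "office locations", "refund request form"]
answers = retrieve_then_answer("how do I request a refund", chunks, counting_answer, top_k=2)
```

With four chunks and `top_k=2`, only two answer calls are made; on a document with hundreds of chunks the reduction is proportionally larger.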

Disk Cache

Avoid re-chunking the same document on repeated runs:

from chunkrank import ChunkCache, Chunker, ChunkerConfig

cache = ChunkCache(".chunkrank_cache")
chunks = cache.get(text, model="gpt-4o-mini")
if chunks is None:
    chunks = Chunker(ChunkerConfig(model="gpt-4o-mini")).split(text)
    cache.set(text, model="gpt-4o-mini", chunks=chunks)

Runtime Model Registration

Register new models without editing the registry JSON:

import chunkrank

chunkrank.register_model("my-custom-model", max_context=200_000)

Supported Models

54 models in the built-in registry, including:

Provider    Models
OpenAI      gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o3, o3-mini, o4-mini
Anthropic   claude-3-opus/sonnet/haiku, claude-3-5-sonnet/haiku, claude-sonnet-4-6, claude-opus-4-6
Google      gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash, gemini-2.5-pro
Meta        Llama-3.1/3.2/3.3, Llama-4-Scout (10M ctx), Llama-4-Maverick
Mistral     mistral-7b, mixtral-8x7b, mistral-large, codestral
Cohere      command-r, command-r-plus, command-r7b
DeepSeek    deepseek-v3, deepseek-r1
Qwen        qwen2.5-72b-instruct, qwen2.5-coder-32b-instruct

Unknown models fall back to 128k context with tiktoken (o200k_base).
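
The registration and fallback behaviour can be pictured as a dictionary lookup with a default. The sketch below is a hypothetical model of that behaviour, not the library's registry code: `ModelInfo`, `lookup`, `chunk_budget` and `reserved_tokens` are invented for illustration, and "128k" is assumed here to mean 128,000 tokens.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelInfo:
    max_context: int
    tokenizer: str

# Assumed fallback mirroring the text: 128k context, o200k_base tokenizer.
DEFAULT = ModelInfo(max_context=128_000, tokenizer="o200k_base")
_REGISTRY: dict[str, ModelInfo] = {}

def register_model(name: str, max_context: int, tokenizer: str = "o200k_base") -> None:
    """Add or overwrite a model entry at runtime."""
    if max_context <= 0:
        raise ValueError("max_context must be positive")
    _REGISTRY[name] = ModelInfo(max_context, tokenizer)

def lookup(name: str) -> ModelInfo:
    """Return the registered entry, or the default for unknown models."""
    return _REGISTRY.get(name, DEFAULT)

def chunk_budget(name: str, reserved_tokens: int = 1_000) -> int:
    """Tokens left for chunk text after reserving room for prompt and reply."""
    return max(lookup(name).max_context - reserved_tokens, 0)

register_model("my-custom-model", max_context=200_000)
known = lookup("my-custom-model")
unknown = lookup("not-in-registry")
budget = chunk_budget("my-custom-model")
```

A silent fallback keeps pipelines running for new model names, at the cost of a possibly wrong context limit, which is the case runtime registration is meant to handle.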


How It Fits

Tool                     What it does
LangChain / LlamaIndex   Full RAG pipelines
Haystack                 End-to-end retrieval frameworks
ChunkRank                Focused, model-aware chunking + answer ranking

ChunkRank complements RAG frameworks — it doesn't replace them.


Requirements

  • Python 3.10+
  • numpy, scikit-learn, rank-bm25

License

Apache 2.0 — see LICENCE.

