Model-aware text chunking and answer re-ranking for LLM pipelines. Automatically adapts chunk size to tokenizer and context window, then consolidates and ranks answers across chunks.

ChunkRank: Model-Aware Chunking + Answer Ranking

ChunkRank is a lightweight Python library that automatically chunks
text based on an LLM’s tokenizer and context window, then consolidates
and ranks answers across chunks. It is used internally for long-document
QA and evaluation pipelines handling 1,000+ PDFs.

🔗 PyPI: https://pypi.org/project/chunkrank/


Why ChunkRank?

When working with LLMs, long documents must be split into chunks, but:

  • Every model has different tokenizers and context limits
  • Chunk sizes are usually hard-coded and error-prone
  • Answer quality drops when responses come from multiple chunks
  • Existing RAG frameworks are heavy when you only need chunking + ranking

ChunkRank fills this gap.


What It Does

Model-aware chunking

  • Pass a model name (gpt-4o-mini, claude-3.5-sonnet, Llama-3.1-8B, etc.)
  • ChunkRank automatically:
    • Selects the correct tokenizer
    • Applies the correct context window
    • Reserves token space for prompts and responses

No manual token math. No trial-and-error.
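
The token math described above can be sketched roughly as follows. This is a minimal illustration, not ChunkRank's implementation: the `REGISTRY` contents, the `demo-small` and `demo-large` model names, the `chunk_budget` and `split_by_tokens` helpers, and the whitespace "tokenizer" are all assumptions made for the example. A real setup would look up the model's actual tokenizer and context window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    context_window: int  # total tokens the model accepts per request


# Hypothetical registry with placeholder values (not real model limits).
REGISTRY = {
    "demo-small": ModelSpec(context_window=64),
    "demo-large": ModelSpec(context_window=256),
}


def chunk_budget(model: str, prompt_tokens: int, response_tokens: int) -> int:
    """Tokens left for document text after reserving prompt and response space."""
    spec = REGISTRY[model]
    budget = spec.context_window - prompt_tokens - response_tokens
    if budget <= 0:
        raise ValueError("reserved tokens exceed the context window")
    return budget


def split_by_tokens(text: str, budget: int) -> list[str]:
    """Split text into chunks of at most `budget` tokens.

    Whitespace-separated words stand in for real tokenizer output here.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + budget]) for i in range(0, len(tokens), budget)]


budget = chunk_budget("demo-small", prompt_tokens=20, response_tokens=24)
chunks = split_by_tokens("word " * 50, budget)
print(budget, [len(c.split()) for c in chunks])  # 20 [20, 20, 10]
```

The point of the sketch is that the chunk size is derived from the model's limits rather than hard-coded: changing the model name or the reserved token counts changes the chunk size automatically.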

Answer consolidation & ranking

  • Query runs across multiple chunks
  • Multiple candidate answers are produced
  • ChunkRank re-ranks them to return the best answer

Works standalone: no full RAG stack required.
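
The consolidate-then-rank flow above might look like the following sketch. It is illustrative only: `overlap_score` and `rank_answers` are hypothetical names, and the term-overlap scorer is a simple stand-in so the example runs without model downloads. It is not the cross-encoder scoring that ChunkRank lists among its capabilities; a cross-encoder would be passed in through the `score` argument instead.

```python
import re
from typing import Callable, Iterable


def overlap_score(question: str, answer: str) -> float:
    """Fraction of question terms that also appear in the answer (stand-in scorer)."""
    q_terms = set(re.findall(r"\w+", question.lower()))
    a_terms = set(re.findall(r"\w+", answer.lower()))
    return len(q_terms & a_terms) / len(q_terms) if q_terms else 0.0


def rank_answers(
    question: str,
    candidates: Iterable[str],
    score: Callable[[str, str], float] = overlap_score,
) -> list[str]:
    """Drop empty and duplicate candidates, then sort from best to worst score."""
    unique = list(dict.fromkeys(c.strip() for c in candidates if c.strip()))
    return sorted(unique, key=lambda a: score(question, a), reverse=True)


question = "What is the capital of France?"
candidates = [
    "The document does not say.",       # from a chunk with no relevant text
    "Paris is the capital of France.",  # from the relevant chunk
    "Paris is the capital of France.",  # duplicate from an overlapping chunk
    "France is in Europe.",
]
ranked = rank_answers(question, candidates)
print(ranked[0])  # Paris is the capital of France.
```

Keeping the scorer as a plain callable means the ranking logic stays the same whether the scores come from term overlap, an embedding similarity, or a cross-encoder.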

Installation

pip install chunkrank

or for development:

poetry install

Quick Example

from chunkrank import ChunkRankPipeline

# Read the document; the context manager closes the file when done
with open("document.txt", encoding="utf-8") as f:
    text = f.read()

pipe = ChunkRankPipeline(model="gpt-4o-mini")

answer = pipe.process(
    question="What is the main topic of this document?",
    text=text
)

print(answer)

Core API

import chunkrank

chunks = chunkrank.split(text, model="gpt-4o-mini")

answers = chunkrank.answer(question, chunks)

best_answer = chunkrank.rank(answers)

Supported Capabilities

  • Automatic model → tokenizer → context resolution
  • Token, sentence, and paragraph chunking strategies
  • Cross-encoder based answer re-ranking
  • Works with OpenAI, Anthropic, Hugging Face, and Llama-based models
  • Drop-in utility for QA, summarization, extraction
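
As an illustration of the sentence chunking strategy listed above, the sketch below greedily packs whole sentences into chunks under a token limit. The function names, the regex sentence splitter, and the whitespace token count are assumptions for this example and do not reflect how ChunkRank implements the strategy.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack_sentences(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Tokens are whitespace-separated words here, a stand-in for a real tokenizer.
    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    chunks, current, current_len = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five. Six seven eight nine. Ten."
chunks = pack_sentences(text, max_tokens=5)
print(chunks)
# ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

Compared with a fixed token window, this keeps sentence boundaries intact, at the cost of chunks that are sometimes shorter than the limit.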

How It Fits

Tool                     What it does
LangChain / LlamaIndex   Full RAG pipelines
Haystack                 End-to-end retrieval frameworks
ChunkRank                Focused, model-aware chunking + answer ranking

ChunkRank complements RAG frameworks — it doesn’t replace them.


Roadmap

  1. Build the model registry (model → context window + tokenizer).
  2. Implement chunking strategies (tokens, sentences, paragraphs).
  3. Integrate a re-ranking engine (start with Hugging Face cross-encoder).
  4. Package and release to PyPI with a simple API.
