Model-Aware Chunking + Answer Ranking

ChunkRank: Model-Aware Chunking + Answer Ranking

Problem

When using LLMs, text often exceeds the model’s context window.
To handle this, text must be chunked into pieces that fit the model’s maximum token length.

Two challenges arise:

Model-aware chunking
- Each model family (OpenAI, Anthropic, Llama, Gemini, T5, BERT, BigBert, LangBert, etc.) has its own context length and tokenizer.
- Current libraries require users to configure chunk sizes manually; no unified library adapts automatically to the chosen model.
Answer consolidation & ranking
- Once text is chunked, a query may return multiple answers from different chunks.
- A ranking step is needed to select the most relevant answer.
- Existing solutions (e.g., RAG frameworks) combine retrieval and generation, but no standalone library couples chunking with answer re-ranking.

Existing Libraries & Gaps

Chunking

LangChain Text Splitters → Token-based, works with tiktoken, but requires manual chunk size config.
LlamaIndex TokenTextSplitter → Similar functionality, manual sizing.
Haystack PreProcessor → Can split by tokens, overlap supported, but not model-aware by default.
semantic-text-splitter / semchunk → Standalone, supports tiktoken/HF tokenizers, still needs user-specified chunk length.

Gap: None of these libraries automatically map a model → tokenizer → context window → chunk size.
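
The model → tokenizer → context window → chunk size mapping could be sketched roughly as follows. This is a minimal illustration, not ChunkRank's actual registry: `ModelSpec`, `REGISTRY`, `chunk_budget`, the tokenizer labels, and the context-window figures are all assumptions made for the example and should be checked against each provider's documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical registry entry (not ChunkRank's real data model)."""
    tokenizer: str        # label of the tokenizer family to load
    context_window: int   # maximum number of tokens the model accepts


# Illustrative values only; verify against provider documentation.
REGISTRY = {
    "gpt-4o-mini": ModelSpec(tokenizer="o200k_base", context_window=128_000),
    "claude-3.5-sonnet": ModelSpec(tokenizer="anthropic", context_window=200_000),
}


def chunk_budget(model: str, reserve: int = 1_000,
                 max_chunk: Optional[int] = None) -> int:
    """Return the tokens available per chunk for `model`.

    `reserve` holds back room for the prompt and the generated answer;
    `max_chunk` optionally caps the result at a smaller size.
    """
    spec = REGISTRY[model]  # raises KeyError for unknown models
    budget = spec.context_window - reserve
    if budget <= 0:
        raise ValueError("reserve leaves no room for text")
    return budget if max_chunk is None else min(budget, max_chunk)
```

With these assumed values, `chunk_budget("gpt-4o-mini")` subtracts the default reserve from the context window, and passing `max_chunk` lets a caller keep chunks far smaller than the model limit.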

Ranking

pygaggle (Castorini, University of Waterloo) → neural re-ranker.
Tevatron → dense retrieval + re-ranking toolkit.
Pyserini (with pygaggle) → BM25 + neural re-rankers.
Haystack, LlamaIndex → include ranking in RAG pipelines.

Gap: Ranking exists, but not combined with chunking in a single, simple package.

What We Want to Build

A standalone Python library that provides:

Model-Aware Chunking
- User specifies a model name (e.g., gpt-4o-mini, claude-3.5-sonnet, Llama-3.1-8B).
- Library looks up the model’s max context window and tokenizer.
- Automatically chunks text into model-compatible pieces with optional overlap and reserve space.
Answer Consolidation & Ranking
- Given multiple answers from chunks, apply a re-ranking step to select the best one.
- Should integrate with existing ranking models (cross-encoder, bi-encoder, BM25 + re-ranker).
- Should work standalone, without needing a full RAG pipeline.
Unified Workflow
- chunks = chunkrank.split(text, model="gpt-4o-mini")
- answers = chunkrank.answer(question, chunks)
- best = chunkrank.rank(answers)
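
To make the workflow concrete, here is a small self-contained sketch of the split and rank steps. It is not the library's implementation: whitespace splitting stands in for a real model tokenizer, the scorer is a toy word-overlap count rather than a cross-encoder, and this `rank` takes the question as an extra argument, which is an assumption of the sketch and differs from the call shown above.

```python
import re


def split(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most `chunk_size` tokens.

    Consecutive windows share `overlap` tokens.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()  # stand-in for a real model tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def rank(question: str, answers: list[str]) -> list[str]:
    """Order answers by how many distinct question words they contain."""
    question_words = set(re.findall(r"\w+", question.lower()))

    def score(answer: str) -> int:
        return len(question_words & set(re.findall(r"\w+", answer.lower())))

    # sorted() is stable, so ties keep their original order.
    return sorted(answers, key=score, reverse=True)


chunks = split("a b c d e", chunk_size=3, overlap=1)
best = rank("What is the capital of France?",
            ["I like cheese.", "Paris is the capital of France."])[0]
```

A production re-ranker would replace `score` with a learned relevance model; the surrounding control flow would stay much the same.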

Vision

Lightweight, model-agnostic utility library.
Bridges the gap between text preparation (chunking) and answer quality (ranking).
Complements existing RAG frameworks but can also work independently.
Easy to drop into pipelines: preprocessing for QA, summarization, or information extraction.

Next Steps

Build the model registry (model → context window + tokenizer).
Implement chunking strategies (tokens, sentences, paragraphs).
Integrate a re-ranking engine (start with Hugging Face cross-encoder).
Package and release to PyPI with a simple API.
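
As one illustration of the sentence-based strategy mentioned above, the following sketch greedily packs whole sentences into chunks under a token budget. The function name `pack_sentences` is hypothetical, the sentence splitter is a simple regular expression, and whitespace word counts stand in for real token counts.

```python
import re


def pack_sentences(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of about `max_tokens` tokens.

    A single sentence longer than `max_tokens` is not split; it becomes
    its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())  # word count as a token-count stand-in
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))  # flush the full chunk
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentences intact avoids cutting an answer in half at a chunk boundary, at the cost of chunks that are less uniform in size than pure token windows.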

Release history

- 1.2.0 (Apr 23, 2026)
- 1.1.3 (Apr 20, 2026)
- 1.1.2 (Apr 20, 2026)
- 1.1.1 (Apr 20, 2026)
- 1.1.0 (Apr 20, 2026)
- 1.0.0 (Mar 10, 2026)
- 0.2.4 (Jan 15, 2026)
- 0.2.3 (Jan 15, 2026)
- 0.2.2 (Jan 15, 2026)
- 0.2.1 (Jan 15, 2026)
- 0.2.0 (Jan 13, 2026), this version

Download files

Source Distribution

chunkrank-0.2.0.tar.gz (6.6 kB), uploaded Jan 13, 2026, Source

Built Distribution

chunkrank-0.2.0-py3-none-any.whl (9.1 kB), uploaded Jan 13, 2026, Python 3

File details

Details for the file chunkrank-0.2.0.tar.gz.

File metadata

Download URL: chunkrank-0.2.0.tar.gz
Upload date: Jan 13, 2026
Size: 6.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.14.2 Darwin/24.5.0

File hashes

Hashes for chunkrank-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`46983b83d07f2771154357bc70ae4606992cf53f5738a85fe870868613d184ae`
MD5	`626c1bce10a9af7a303d86599337ba69`
BLAKE2b-256	`403b63c4ab22741be4341b28cdfcb6f927ebdf7eed82804edc774941d87d892c`

File details

Details for the file chunkrank-0.2.0-py3-none-any.whl.

File metadata

Download URL: chunkrank-0.2.0-py3-none-any.whl
Upload date: Jan 13, 2026
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.14.2 Darwin/24.5.0

File hashes

Hashes for chunkrank-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e2055291b1c006dac997cb45b472ff009941e1e5c3c18c9c704345273e43daca`
MD5	`92c320982c419905699a8f06801cb2ef`
BLAKE2b-256	`e35ce0682faf6f8cad969d7570d0b5759967045429fa706a1316b5c7eb8b5429`
