Semantic text chunking using a fine-tuned ModernBERT model with recursive splitting and long-text support.


fine-chunker 🚀

Semantic text chunking using a fine-tuned ModernBERT model with recursive splitting and support for extremely long documents.

This library divides your text into meaningful segments based on semantic boundaries rather than just character counts or newline characters. It uses a token-classification approach where the model predicts the ideal points to "cut" the text.
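To make the token-classification idea concrete, here is a minimal sketch that turns per-token predictions into chunks. It assumes a two-label head (index 1 = "start of chunk"), softmax probabilities, and per-token character offsets like those a Hugging Face fast tokenizer returns with return_offsets_mapping=True. The helpers start_probability and cut_text are hypothetical and are not part of fine-chunker's API.

```python
import math
from typing import List, Sequence, Tuple


def start_probability(logits: Sequence[float]) -> float:
    """Softmax over two labels [not_start, start] and return P(start).

    Assumption: label index 1 means "this token starts a new chunk".
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def cut_text(
    text: str,
    offsets: Sequence[Tuple[int, int]],
    token_logits: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[str]:
    """Slice `text` at the start offset of every token predicted as a chunk start."""
    cuts = [0]
    for (start, _end), logits in zip(offsets, token_logits):
        # A cut must move forward; this also ignores the first token at offset 0.
        if start > cuts[-1] and start_probability(logits) >= threshold:
            cuts.append(start)
    cuts.append(len(text))
    pieces = (text[a:b].strip() for a, b in zip(cuts, cuts[1:]))
    return [p for p in pieces if p]


# Toy input: one "token" per word, with hand-written logits.
text = "Cats purr. Dogs bark. Birds sing."
offsets = [(0, 4), (5, 10), (11, 15), (16, 21), (22, 27), (28, 33)]
logits = [[0, 3], [2, -2], [-1, 2], [2, -2], [0, 0], [2, -2]]
print(cut_text(text, offsets, logits))       # ['Cats purr.', 'Dogs bark.', 'Birds sing.']
print(cut_text(text, offsets, logits, 0.6))  # ['Cats purr.', 'Dogs bark. Birds sing.']
```

Slicing the original string by offsets keeps the source text intact; a real tokenizer would produce sub-word tokens rather than whole words.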

The library and model are still in early development, so expect some rough edges.

Key Features

Fine-tuned ModernBERT: Uses a fine-tuned ModernBERT encoder optimized to detect semantic boundaries. More details about the model are available at: jboksa/modbert-chunker-base
Recursive Splitting: Automatically drills down into large chunks with decreasing thresholds to ensure everything fits your target size while remaining semantically coherent.
Long Text Support: Uses a sliding-window scheme to process documents of any length (books, reports, etc.) without losing context.
Hugging Face Integration: Zero configuration required; models and tokenizers are fetched automatically from the Hub.
Hardware Agnostic: Runs smoothly on CUDA (GPU) or CPU.
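The long-text support can be pictured as a sliding window over the token sequence. The sketch below only computes (start, end) token spans. The 8,000-token window echoes the figure given under How it Works (and fits inside ModernBERT's 8,192-token context), while the 512-token overlap and the function name window_spans are assumptions for illustration. The library describes its windows as semantic, which this fixed-stride sketch does not attempt.

```python
from typing import List, Tuple


def window_spans(n_tokens: int, window_size: int = 8000, overlap: int = 512) -> List[Tuple[int, int]]:
    """Return (start, end) token spans of overlapping fixed-size windows.

    Assumptions: a fixed stride of window_size - overlap, and the last
    window is clipped to the end of the sequence.
    """
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap < window_size")
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + window_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += window_size - overlap  # step forward, keeping `overlap` tokens of shared context
    return spans


print(window_spans(20_000))  # [(0, 8000), (7488, 15488), (14976, 20000)]
```

The overlap gives each window some context from its predecessor, so a boundary near a window edge is still scored with text on both sides.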

Installation

Basic Installation

To install the fine-chunker package, you can use pip:

```
pip install fine-chunker
```

Or using uv:

```
uv add fine-chunker
```

Optional Dependencies

Depending on your use case, you may want to install additional dependencies:

With PyTorch (GPU support): If you plan to use PyTorch with GPU support, install the package with the torch extra (quote the argument so shells such as zsh do not treat the brackets as a glob pattern):
```
pip install "fine-chunker[torch]"
```
With PyTorch (CPU-only): If you only need CPU support, install the package with the torch-cpu extra:
```
pip install "fine-chunker[torch-cpu]"
```
With ONNX Runtime: If you plan to use ONNX for inference, install the package with the onnx extra:
```
pip install "fine-chunker[onnx]"
```
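If the same code has to run on machines with different extras installed, a small helper can choose the device and use_onnx arguments shown in the Quick Start. This is a hypothetical sketch: it assumes the torch extras provide the torch module, the onnx extra provides onnxruntime, and use_onnx=False selects the PyTorch backend. Check the package before relying on those mappings.

```python
import importlib.util


def pick_backend() -> dict:
    """Choose keyword arguments for Chunker.from_pretrained from what is installed.

    Hypothetical helper; the module-to-extra mapping is an assumption.
    """
    has_torch = importlib.util.find_spec("torch") is not None
    has_onnx = importlib.util.find_spec("onnxruntime") is not None

    if has_torch:
        import torch

        if torch.cuda.is_available():
            # A CUDA-capable PyTorch install: run the model on the GPU.
            return {"device": "cuda", "use_onnx": False}
    if has_onnx:
        # No GPU (or no PyTorch): prefer ONNX Runtime on the CPU.
        return {"device": "cpu", "use_onnx": True}
    # Fall back to PyTorch on the CPU (assumes at least the torch-cpu extra is installed).
    return {"device": "cpu", "use_onnx": False}


print(pick_backend())
```

The result can be unpacked into the constructor, for example Chunker.from_pretrained(**pick_backend(), max_chunk_size=850).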

Development Installation

If you want to contribute to the development of fine-chunker, you can install the package with development dependencies:

```
pip install "fine-chunker[dev]"
```

This will include tools for building, testing, and debugging the package.

Quick Start

```python
from fine_chunker import Chunker

text = """
1 Introduction Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15]. Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden states ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks [21] and conditional computation [32], while also improving model performance in case of the latter. The fundamental constraint of sequential computation, however, remains. Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 19]. In all but a few cases [27], however, such attention mechanisms are used in conjunction with a recurrent network. In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.
    """

# use_onnx=True requires the onnx extra (see Optional Dependencies)
chunker = Chunker.from_pretrained(device="cpu", use_onnx=True, max_chunk_size=850)
chunks = chunker.chunk(text)

# Each chunk carries its position (index) and its text (content)
for chunk in chunks:
    print(f"\nChunk {chunk.index} | size={len(chunk.content)}")
    print(chunk.content)
```

Result:

```
Chunk 0 | size=431
1 Introduction Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15].

Chunk 1 | size=759
Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time , they generate a sequence of hidden states ht , as a function of the previous hidden state ht −1 and the input for position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths , as memory constraints limit batching across examples. Recent work has achieved significant improvements in computational efficiency through factorization tricks [21] and conditional computation [32], while also improving model performance in case of the latter. The fundamental constraint of sequential computation, however, remains.

Chunk 2 | size=731
Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks , allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 19]. In all but a few cases [27], however, such attention mechanisms are used in conjunction with a recurrent network. In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.
```

Advanced Usage

You can tune the chunking behavior with several parameters:

```python
from fine_chunker import Chunker

chunker = Chunker.from_pretrained(
    device="cuda",
    threshold_start=0.5,  # Initial cut threshold (higher = fewer chunks)
    threshold_step=0.1,   # How much to lower the threshold when a chunk is too big
    max_chunk_size=1000,  # Target maximum characters per chunk
    min_chunk_size=350,   # Minimum characters (smaller fragments are merged)
    max_depth=3,          # Maximum number of re-split attempts for one oversized chunk
)
```
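To see what threshold_start, threshold_step and max_depth imply together, the sketch below lists the threshold used at each recursion level. It assumes the first pass uses threshold_start, each re-split lowers it by threshold_step, at most max_depth re-splits happen, and the schedule stops before reaching zero; the library's exact counting may differ. threshold_schedule is a hypothetical name.

```python
from typing import List


def threshold_schedule(threshold_start: float, threshold_step: float, max_depth: int) -> List[float]:
    """Thresholds for the initial pass plus up to max_depth re-splits (assumed semantics)."""
    schedule = []
    for depth in range(max_depth + 1):
        threshold = round(threshold_start - depth * threshold_step, 6)  # round away float noise
        if threshold <= 0:
            break  # with a ">=" comparison, a non-positive threshold would cut at every token
        schedule.append(threshold)
    return schedule


print(threshold_schedule(0.5, 0.1, 3))  # [0.5, 0.4, 0.3, 0.2]
```

With the values above, a chunk that is still too large after the 0.2 pass stays as it is, which is why a larger max_depth or threshold_step trades coherence for tighter size control.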

How it Works

Windowing: If the text is extremely long, it's divided into semantic windows of ~8000 tokens.
Prediction: The ModernBERT model identifies "start of chunk" tokens.
Recursive Refinement: If a resulting chunk is larger than max_chunk_size, the library re-scans just that fragment with a lower threshold (reduced by threshold_step on each pass, up to max_depth times), which yields more cut points.
Stability Merge: Finally, fragments shorter than min_chunk_size are merged with their neighbors to keep chunk sizes consistent for your RAG or LLM application.
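The recursive refinement and stability merge described above can be sketched in plain Python. This is a simplified model under stated assumptions, not fine-chunker's implementation: the text is pre-split into sentences, a hypothetical scorer returns one "start of chunk" probability per sentence (standing in for ModernBERT's per-token predictions), sizes are counted in characters, and small fragments are merged only into the preceding chunk and only when the result still fits max_size. All function names are hypothetical.

```python
from typing import Callable, List

# Hypothetical scorer: one "start of chunk" probability per sentence.
Scorer = Callable[[List[str]], List[float]]


def split(units: List[str], scores: List[float], threshold: float) -> List[List[str]]:
    """Open a new group at every unit after the first whose score reaches the threshold."""
    groups: List[List[str]] = [[units[0]]]
    for unit, score in zip(units[1:], scores[1:]):
        if score >= threshold:
            groups.append([unit])
        else:
            groups[-1].append(unit)
    return groups


def refine(units: List[str], scorer: Scorer, threshold: float, step: float,
           max_size: int, max_depth: int, depth: int = 0) -> List[str]:
    """Split, then re-score and re-split only the groups longer than max_size."""
    if not units:
        return []
    chunks: List[str] = []
    for group in split(units, scorer(units), threshold):
        text = " ".join(group)
        if len(text) > max_size and len(group) > 1 and depth < max_depth:
            # Re-scan just this fragment with a lower threshold (more cuts).
            chunks.extend(refine(group, scorer, threshold - step, step,
                                 max_size, max_depth, depth + 1))
        else:
            chunks.append(text)
    return chunks


def merge_small(chunks: List[str], min_size: int, max_size: int) -> List[str]:
    """Append fragments shorter than min_size to the previous chunk if it still fits."""
    merged: List[str] = []
    for chunk in chunks:
        if merged and len(chunk) < min_size and len(merged[-1]) + 1 + len(chunk) <= max_size:
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    return merged


# Toy data: four "sentences" with fixed, hand-picked boundary scores.
scores = {"a" * 400: 0.9, "b" * 400: 0.35, "c" * 400: 0.6, "d" * 100: 0.7}
sentences = list(scores)
chunks = refine(sentences, lambda us: [scores[u] for u in us],
                threshold=0.5, step=0.1, max_size=600, max_depth=3)
chunks = merge_small(chunks, min_size=350, max_size=600)
print([len(c) for c in chunks])  # [400, 400, 501]
```

In this toy run the first pass keeps the "a" and "b" sentences together (801 characters), the 0.4 pass still cannot separate them, and the third pass at about 0.3 finally cuts between them; the 100-character "d" fragment is then merged back into the preceding chunk.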

Author

Developed by Jerzy Boksa.

Contact: devjerzy@gmail.com

Model hosted at: jboksa/modbert-chunker-base

Project details

Release history

0.1.2 (Mar 22, 2026)

0.1.1 (Mar 21, 2026), this version

0.1.0 (Mar 21, 2026)

Download files


Source Distribution

fine_chunker-0.1.1.tar.gz (87.0 kB)

Uploaded Mar 21, 2026 Source

Built Distribution


fine_chunker-0.1.1-py3-none-any.whl (16.7 kB)

Uploaded Mar 21, 2026 Python 3

File details

Details for the file fine_chunker-0.1.1.tar.gz.

File metadata

Download URL: fine_chunker-0.1.1.tar.gz
Upload date: Mar 21, 2026
Size: 87.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for fine_chunker-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`7c0e0c6b6da962641453f4dae0295e9f6dcea7817d6442ffd39aec731f19252a`
MD5	`de533ee159fe34f4f35227bbc0c2ac20`
BLAKE2b-256	`17a8612de6bb7be51759dacf76923e6d61f96ff999960701a67e9b084273836d`


File details

Details for the file fine_chunker-0.1.1-py3-none-any.whl.

File metadata

Download URL: fine_chunker-0.1.1-py3-none-any.whl
Upload date: Mar 21, 2026
Size: 16.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for fine_chunker-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`626358b9c681988b917020875e9e3c79eea8fda832fd688067bf1bec7de094cd`
MD5	`9b432ef688734dcf4d6fc343bd333051`
BLAKE2b-256	`cc5db62d0447623bf6420166bfb46170d529b2955001769d21b81ac060ee4b8a`

