llama-index Chonkie integration
LlamaIndex Ingestion Chonkie Integration
This package provides an integration between LlamaIndex and Chonkie, a powerful and flexible chunking library.
Installation
pip install llama-index-ingestion-chonkie
Quick Start
from llama_index.core import Document
from llama_index.ingestion.chonkie import Chunker
# Create a chunker (defaults to 'recursive')
chunker = Chunker(chunk_size=512, chunk_overlap=50)
# Create a document
doc = Document(text="Your long text here...")
# Get nodes
nodes = chunker.get_nodes_from_documents([doc])
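To make the roles of chunk_size and chunk_overlap concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is an illustration only, not Chonkie's implementation: the function name `sliding_window_chunks` is hypothetical, whitespace-separated words stand in for real tokens, and the default recursive strategy splits on separators rather than on a fixed window.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the
    previous one, so consecutive windows share chunk_overlap tokens.
    (Hypothetical helper for illustration only.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace "tokens" stand in for a real tokenizer here.
tokens = "a b c d e f g h i j".split()
chunks = sliding_window_chunks(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With chunk_size=4 and chunk_overlap=1, each chunk begins with the last token of the previous one. A larger overlap preserves more context across chunk boundaries at the cost of producing more chunks.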
Supported Chunkers
The Chunker acts as a wrapper for various Chonkie chunking strategies. You can specify the strategy using the chunker_type parameter:
| chunker_type | Description |
|---|---|
| recursive | (Default) Recursively splits text based on a hierarchy of separators. |
| sentence | Splits text into sentences. |
| token | Splits text into chunks based on token counts. |
| word | Splits text based on word counts. |
| semantic | Splits text based on semantic similarity. |
| late | Late chunking strategy. |
| neural | Neural-based chunking. |
| code | Optimized for source code. |
| fast | High-performance basic chunking. |
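The idea behind the default recursive strategy, splitting on a hierarchy of separators, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Chonkie's actual algorithm: the function `recursive_split`, the separator list, and the character-based size limit are all hypothetical choices made for the example.

```python
def recursive_split(text, separators, max_chars):
    """Split text into pieces of at most max_chars characters.

    Tries the coarsest separator first and falls back to finer ones
    only for parts that are still too long. Separators at piece
    boundaries are dropped. (Hypothetical helper for illustration.)
    """
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur; try the next finer one.
        return recursive_split(text, finer, max_chars)

    pieces = []
    current = ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            # Greedily merge neighbouring parts while they still fit.
            current = candidate
            continue
        if current:
            pieces.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # The part alone is too long: recurse with finer separators.
            pieces.extend(recursive_split(part, finer, max_chars))
    if current:
        pieces.append(current)
    return pieces


text = "alpha beta\n\ngamma delta epsilon\nzeta"
pieces = recursive_split(text, ["\n\n", "\n", " "], max_chars=12)
print(pieces)  # ['alpha beta', 'gamma delta', 'epsilon', 'zeta']
```

Paragraph breaks are tried first, then line breaks, then spaces, so a piece is only cut at a finer boundary when a coarser one cannot keep it within the size limit.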
Advanced Configuration
You can pass any keyword arguments accepted by the underlying Chonkie chunker directly to Chunker:
chunker = Chunker(
    chunker_type="semantic",
    chunk_size=512,
    embedding_model="all-MiniLM-L6-v2",
    threshold=0.5,
)
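One common way for a wrapper to support this kind of pass-through is to look up a class by name and forward the remaining keyword arguments to its constructor. The sketch below shows that pattern in isolation. Every name in it (`FixedSizeSplitter`, `SimilaritySplitter`, `REGISTRY`, `WrapperChunker`) is hypothetical, and it is not a description of how Chunker is actually implemented.

```python
class FixedSizeSplitter:
    """Hypothetical stand-in for a size-based chunking strategy."""

    def __init__(self, chunk_size=512):
        self.chunk_size = chunk_size


class SimilaritySplitter:
    """Hypothetical stand-in for a similarity-based chunking strategy."""

    def __init__(self, chunk_size=512, embedding_model="some-model", threshold=0.5):
        self.chunk_size = chunk_size
        self.embedding_model = embedding_model
        self.threshold = threshold


# Maps a strategy name to the class that implements it.
REGISTRY = {
    "fixed": FixedSizeSplitter,
    "similarity": SimilaritySplitter,
}


class WrapperChunker:
    """Selects a strategy by name and forwards all other kwargs to it."""

    def __init__(self, chunker_type="fixed", **kwargs):
        try:
            chunker_cls = REGISTRY[chunker_type]
        except KeyError:
            raise ValueError(f"Unknown chunker_type: {chunker_type!r}") from None
        # Unknown keyword arguments surface as a TypeError raised by
        # the selected strategy's own constructor.
        self.chunker = chunker_cls(**kwargs)


wrapper = WrapperChunker(chunker_type="similarity", chunk_size=256, threshold=0.7)
print(type(wrapper.chunker).__name__, wrapper.chunker.threshold)
```

In this sketch, an argument that the selected strategy does not accept (for example `threshold` with `"fixed"`) fails at construction time, which is why the keyword arguments you pass need to match the strategy you choose.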
Integration with IngestionPipeline
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.ingestion.chonkie import Chunker
pipeline = IngestionPipeline(
    transformations=[
        Chunker(chunker_type="recursive", chunk_size=512),
        # ... other transformations
    ]
)
nodes = pipeline.run(documents=[Document.example()])