llama-index Chonkie integration

LlamaIndex Node Parser Chonkie Integration

This package provides an integration between LlamaIndex and Chonkie, a powerful and flexible chunking library.

Installation

pip install llama-index-node_parser-chonkie

Quick Start

from llama_index.core import Document
from llama_index.node_parser.chonkie import Chunker

# Create a chunker (defaults to 'recursive')
chunker = Chunker(chunk_size=512)

# Create a document
doc = Document(text="Your long text here...")

# Get nodes
nodes = chunker.get_nodes_from_documents([doc])

Supported Chunkers

The Chunker acts as a wrapper for various Chonkie chunking strategies. You can specify the strategy using the chunker parameter:

chunker      Description
recursive    (Default) Recursively splits text based on a hierarchy of separators.
sentence     Splits text into sentences.
token        Splits text into chunks based on token counts.
word         Splits text based on word counts.
semantic     Splits text based on semantic similarity.
late         Late chunking strategy.
neural       Neural-based chunking.
code         Optimized for source code.
fast         High-performance basic chunking.
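
To make the "hierarchy of separators" idea behind the recursive strategy concrete, here is a minimal pure-Python sketch. It is an illustration only and not Chonkie's implementation: the function name recursive_split, the separator list, and the use of character counts (rather than tokens) for chunk_size are all assumptions made for this example.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split `text` into pieces of at most `chunk_size` characters.

    Hypothetical helper: tries the coarsest separator first and falls
    back to finer separators only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut at the size limit.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(head):
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


sample = "Intro paragraph.\n\nSecond paragraph with a few more words in it.\n\nEnd."
chunks = recursive_split(sample, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Short paragraphs survive intact, while the long middle paragraph is broken at word boundaries because no coarser separator fits within the limit.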

Run the following code to see the full list of valid aliases:

from llama_index.node_parser.chonkie import Chunker

print(Chunker.valid_chunkers)

Advanced Configuration

You can pass any keyword arguments accepted by the underlying Chonkie chunker directly to Chunker:

chunker = Chunker(
    chunker="semantic",
    chunk_size=512,
    embedding_model="all-MiniLM-L6-v2",
    threshold=0.5,
)
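
One common way for a wrapper to support this kind of pass-through is to look up a strategy by its alias and forward the remaining keyword arguments unchanged. The sketch below shows that pattern in plain Python. Everything in it (ToyChunker, split_words, split_sentences, the _STRATEGIES mapping) is hypothetical and says nothing about how Chunker is actually implemented.

```python
from typing import Any, Callable, Dict, List


def split_words(text: str, chunk_size: int = 3) -> List[str]:
    """Hypothetical strategy: group every `chunk_size` words into a chunk."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def split_sentences(text: str, delimiter: str = ". ") -> List[str]:
    """Hypothetical strategy: one chunk per delimiter-separated sentence."""
    return [part for part in text.split(delimiter) if part]


class ToyChunker:
    """Illustrative wrapper only; not the real Chunker class."""

    # Maps each alias to the callable that implements it.
    _STRATEGIES: Dict[str, Callable[..., List[str]]] = {
        "word": split_words,
        "sentence": split_sentences,
    }

    def __init__(self, chunker: str = "word", **kwargs: Any) -> None:
        if chunker not in self._STRATEGIES:
            raise ValueError(
                f"Unknown chunker {chunker!r}; expected one of {sorted(self._STRATEGIES)}"
            )
        self._strategy = self._STRATEGIES[chunker]
        # Stored untouched and handed to the selected strategy on each call.
        self._kwargs = kwargs

    def chunk(self, text: str) -> List[str]:
        return self._strategy(text, **self._kwargs)


word_chunks = ToyChunker("word", chunk_size=2).chunk("one two three four five")
sentence_chunks = ToyChunker("sentence").chunk("First one. Second one.")
print(word_chunks)
print(sentence_chunks)
```

In this toy version, a keyword argument that the selected strategy does not accept surfaces as a TypeError when chunk is called, so options only need to be valid for the strategy actually chosen.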

Integration with Node Parsing

You can use Chunker directly to parse documents into nodes:

from llama_index.core import Document
from llama_index.node_parser.chonkie import Chunker

chunker = Chunker(chunk_size=512)
doc = Document(text="Your long text here...")
nodes = chunker.get_nodes_from_documents([doc])

You can also use it as a component within an IngestionPipeline:

from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.node_parser.chonkie import Chunker

pipeline = IngestionPipeline(
    transformations=[
        Chunker("recursive", chunk_size=512),
        # ... other transformations
    ]
)

nodes = pipeline.run(documents=[Document.example()])
