llama-index Chonkie integration

LlamaIndex Node Parser Chonkie Integration

This package integrates LlamaIndex with Chonkie, a text chunking library that offers several splitting strategies, and exposes those strategies through a single Chunker node parser.

Installation

pip install llama-index-node_parser-chonkie

Quick Start

from llama_index.core import Document
from llama_index.node_parser.chonkie import Chunker

# Create a chunker (defaults to 'recursive')
chunker = Chunker(chunk_size=512)

# Create a document
doc = Document(text="Your long text here...")

# Get nodes
nodes = chunker.get_nodes_from_documents([doc])
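
To build intuition for chunk_size, here is a minimal pure-Python sketch of fixed-size chunking in the spirit of the token strategy. It is not Chonkie's implementation: whitespace-separated words stand in for real tokenizer tokens, and fixed_size_chunks and its overlap parameter are hypothetical names chosen for this illustration.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size tokens (hypothetical helper).

    Whitespace-separated words stand in for real tokenizer tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the last token
    return chunks


text = "one two three four five six seven"
print(fixed_size_chunks(text, chunk_size=3, overlap=1))
# ['one two three', 'three four five', 'five six seven']
```

Each window holds at most chunk_size tokens; a non-zero overlap repeats the last tokens of one chunk at the start of the next, which can help keep context across chunk boundaries.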

Supported Chunkers

The Chunker acts as a wrapper for various Chonkie chunking strategies. You can specify the strategy using the chunker parameter:

chunker     Description
recursive   (Default) Recursively splits text using a hierarchy of separators.
sentence    Splits text at sentence boundaries.
token       Splits text into chunks of a fixed number of tokens.
word        Splits text into chunks based on word counts.
semantic    Groups text into chunks based on semantic similarity between sentences.
late        Late chunking: embeds the full text first, then derives per-chunk embeddings.
neural      Uses a neural model to decide where to split.
code        Optimized for source code.
fast        High-performance basic chunking.
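
The recursive strategy can be pictured with the simplified sketch below. It assumes characters (not tokens) as the size unit and a hypothetical separator list of paragraph breaks, line breaks, sentence ends and spaces; it tries the coarsest separator first and falls back to finer ones only for pieces that are still too large. Chonkie's actual rules and defaults may differ, and recursive_split is a hypothetical name.

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split text into pieces of at most chunk_size characters (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-slice the text into fixed-size pieces.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, chunk_size, finer)
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces up to the limit
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # Piece is still too large: recurse with the finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("Alpha beta.\n\nGamma delta epsilon.\n\nZeta.", 25))
```

Merging adjacent pieces up to the limit keeps chunks close to chunk_size instead of producing one tiny chunk per separator.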

Run the following code to see the full list of valid aliases:

from llama_index.node_parser.chonkie import Chunker

print(Chunker.valid_chunkers)
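
Conceptually, a wrapper like Chunker has to map an alias string to a strategy and hand the remaining keyword arguments to it. The sketch below shows one way to do that in plain Python; the stub classes, REGISTRY and make_chunker are hypothetical and do not reflect how this package is actually implemented.

```python
from typing import Any, Callable


class RecursiveChunkerStub:
    """Hypothetical stand-in for an underlying chunker class."""

    def __init__(self, chunk_size: int = 512, **options: Any) -> None:
        self.chunk_size = chunk_size
        self.options = options  # any extra keyword arguments land here


class SentenceChunkerStub(RecursiveChunkerStub):
    """Hypothetical stand-in for a second strategy."""


# Hypothetical alias registry: alias string -> class to instantiate.
REGISTRY: dict[str, Callable[..., Any]] = {
    "recursive": RecursiveChunkerStub,
    "sentence": SentenceChunkerStub,
}


def make_chunker(chunker: str = "recursive", **kwargs: Any) -> Any:
    """Validate the alias, then forward all keyword arguments unchanged."""
    try:
        cls = REGISTRY[chunker]
    except KeyError:
        valid = ", ".join(sorted(REGISTRY))
        raise ValueError(f"Unknown chunker {chunker!r}; valid aliases: {valid}") from None
    return cls(**kwargs)


c = make_chunker("sentence", chunk_size=256, min_sentences=2)
print(type(c).__name__, c.chunk_size, c.options)
```

Validating the alias up front gives a clear error listing the accepted names, instead of a failure later inside the underlying chunker.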

Advanced Configuration

You can pass any keyword arguments accepted by the underlying Chonkie chunker directly to Chunker:

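from llama_index.node_parser.chonkie import Chunker

# Extra keyword arguments are passed through to Chonkie's semantic chunker
chunker = Chunker(
    chunker="semantic",
    chunk_size=512,
    embedding_model="all-MiniLM-L6-v2",
    threshold=0.5,
)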
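
As a rough picture of what threshold controls, the toy sketch below starts a new chunk whenever the cosine similarity between consecutive sentence vectors drops below the threshold. Hand-made 2-D vectors stand in for all-MiniLM-L6-v2 embeddings, the function names are hypothetical, and Chonkie's semantic chunker also takes chunk_size into account and may differ in other details, so treat this only as an illustration of the idea.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_groups(
    sentences: list[str], vectors: list[list[float]], threshold: float
) -> list[list[str]]:
    """Start a new group whenever consecutive sentences fall below threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            groups.append([sentence])  # similarity dropped: begin a new chunk
        else:
            groups[-1].append(sentence)  # similar enough: extend current chunk
    return groups


sentences = ["Cats purr.", "Kittens meow.", "Stocks fell.", "Markets slid."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy embeddings
print(semantic_groups(sentences, vectors, threshold=0.5))
```

A higher threshold demands closer similarity to stay in the same chunk, so it tends to produce more, smaller chunks.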

Integration with Node Parsing

You can use Chunker directly to parse documents into nodes:

from llama_index.core import Document
from llama_index.node_parser.chonkie import Chunker

chunker = Chunker(chunk_size=512)
doc = Document(text="Your long text here...")
nodes = chunker.get_nodes_from_documents([doc])
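
Each node produced this way corresponds to a span of the source document. The sketch below illustrates that idea by grouping words and recording character offsets so that document[start:end] recovers each chunk. LlamaIndex TextNode objects have start_char_idx and end_char_idx fields, but this README does not say whether the integration fills them in; Span, chunk_with_offsets and the word-based grouping are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    start: int  # inclusive character offset into the source document
    end: int    # exclusive character offset


def chunk_with_offsets(document: str, max_words: int) -> list[Span]:
    """Group whitespace-separated words into spans of at most max_words words."""
    # Locate each word's (start, end) offsets in the original string.
    words = []
    pos = 0
    for word in document.split():
        start = document.index(word, pos)
        words.append((start, start + len(word)))
        pos = start + len(word)
    spans = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0][0], group[-1][1]
        # Slice the original text so inner whitespace is preserved exactly.
        spans.append(Span(document[start:end], start, end))
    return spans


doc = "Nodes keep  track of where\nthey came from."
for span in chunk_with_offsets(doc, max_words=3):
    print(span.start, span.end, repr(span.text))
```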

Alternatively, you can use it as a transformation within an IngestionPipeline:

from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.node_parser.chonkie import Chunker

pipeline = IngestionPipeline(
    transformations=[
        Chunker(chunker="recursive", chunk_size=512),
        # ... other transformations
    ]
)

nodes = pipeline.run(documents=[Document.example()])
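
An IngestionPipeline applies its transformations in order, each receiving the previous one's output. The plain-Python sketch below mimics only that chaining behavior with hypothetical string-based stand-ins (run_pipeline, split_paragraphs, lowercase); the real pipeline works on Document and node objects and offers features this sketch ignores.

```python
from typing import Callable

Transform = Callable[[list[str]], list[str]]


def run_pipeline(items: list[str], transformations: list[Transform]) -> list[str]:
    """Feed the output of each transformation into the next, in list order."""
    for transform in transformations:
        items = transform(items)
    return items


def split_paragraphs(docs: list[str]) -> list[str]:
    # Stand-in for a chunker: one output item per non-empty paragraph.
    return [p.strip() for d in docs for p in d.split("\n\n") if p.strip()]


def lowercase(chunks: list[str]) -> list[str]:
    # Stand-in for a later transformation that post-processes each chunk.
    return [c.lower() for c in chunks]


print(run_pipeline(["First Para.\n\nSecond Para."], [split_paragraphs, lowercase]))
```

Placing the chunker first means every later transformation sees individual chunks rather than whole documents.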

Download files


Source Distribution

llama_index_node_parser_chonkie-0.1.2.tar.gz (6.2 kB)


Built Distribution

llama_index_node_parser_chonkie-0.1.2-py3-none-any.whl (4.8 kB)

File details

Details for the file llama_index_node_parser_chonkie-0.1.2.tar.gz.

File metadata

  • Download URL: llama_index_node_parser_chonkie-0.1.2.tar.gz
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7 (publish from CI, Ubuntu 24.04)

File hashes

Hashes for llama_index_node_parser_chonkie-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       9ea75e932c6bbb3cb14b307829a13a7ef9e2618ac5dda89a6ff5f3c52d558dda
MD5          4f037902fcb511fa81bec7fa2e7b9fcc
BLAKE2b-256  32fc1e599d63d5bb70aa617ed651d9321525e299682b5f48c1f266abe066bfe8


File details

Details for the file llama_index_node_parser_chonkie-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: llama_index_node_parser_chonkie-0.1.2-py3-none-any.whl
  • Size: 4.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.7 (publish from CI, Ubuntu 24.04)

File hashes

Hashes for llama_index_node_parser_chonkie-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       3911929cdcde36d2c2ea3bfe19045acfe210320a8785934ec52bc0874a2c831c
MD5          2c4b1ca808b3d54ac590d445bfc89bfb
BLAKE2b-256  b3adab40b42e91d75039a9a62b1383d50baa41225d8e32447dafa05c6c57d4f4

