llama-index Chonkie integration

LlamaIndex Ingestion Chonkie Integration

This package integrates LlamaIndex with Chonkie, a text chunking library, so that Chonkie's chunking strategies can be used to split LlamaIndex documents into nodes.

Installation

pip install llama-index-ingestion-chonkie

Quick Start

from llama_index.core import Document
from llama_index.ingestion.chonkie import Chunker

# Create a chunker (defaults to 'recursive')
chunker = Chunker(chunk_size=512, chunk_overlap=50)

# Create a document
doc = Document(text="Your long text here...")

# Get nodes
nodes = chunker.get_nodes_from_documents([doc])
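
To make the chunk_size and chunk_overlap parameters above concrete, here is a minimal plain-Python sketch of size-bounded chunks that share an overlap. It counts whitespace-separated words rather than tokens, and the function name sliding_window_chunks is hypothetical. It illustrates the general idea only; it is not how Chonkie or the Chunker wrapper is implemented, and the default recursive strategy works differently.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share chunk_overlap words. Words stand in for
    tokens here, which is an assumption made to keep the sketch
    dependency-free.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


chunks = sliding_window_chunks("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With a window of 4 words and an overlap of 1, each chunk starts with the last word of the previous one, so context at chunk boundaries is not lost entirely.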

Supported Chunkers

The Chunker acts as a wrapper for various Chonkie chunking strategies. You can specify the strategy using the chunker_type parameter:

chunker_type  Description
recursive     (Default) Recursively splits text based on a hierarchy of separators.
sentence      Splits text into sentences.
token         Splits text into chunks based on token counts.
word          Splits text based on word counts.
semantic      Splits text based on semantic similarity.
late          Late chunking strategy.
neural        Neural-based chunking.
code          Optimized for source code.
fast          High-performance basic chunking.
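
As an illustration of what "a hierarchy of separators" means for the recursive strategy, the following plain-Python sketch tries the coarsest separator first and falls back to finer ones only for pieces that are still too long. The function name recursive_split, the separator list, and the character-based size limit are all assumptions chosen for the example; Chonkie's actual recursive chunker has its own rules and configuration.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_chars characters.

    Separators are tried from coarsest to finest. Adjacent parts are
    merged greedily while they still fit within max_chars.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_chars:
            current = part
        else:
            # This part alone is too long: retry with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


pieces = recursive_split("alpha beta gamma\n\ndelta epsilon", max_chars=12)
print(pieces)
```

Paragraph breaks are honored first, and only the paragraphs that exceed the limit are broken down further, which is why this style of splitting tends to keep related text together.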

Advanced Configuration

You can pass any keyword arguments accepted by the underlying Chonkie chunker directly to Chunker:

chunker = Chunker(
    chunker_type="semantic",
    chunk_size=512,
    embedding_model="all-MiniLM-L6-v2",
    threshold=0.5,
)
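
The keyword-argument pass-through described above can be pictured with a small dispatch sketch. Everything in it is hypothetical (the STRATEGIES registry, the two strategy classes, and WrapperSketch); it shows one common way a wrapper can select a strategy by name and forward the remaining keyword arguments, and makes no claim about how Chunker is actually written.

```python
class WordStrategy:
    """Hypothetical strategy that only understands chunk_size."""

    def __init__(self, chunk_size: int = 512):
        self.chunk_size = chunk_size


class ThresholdStrategy:
    """Hypothetical strategy that also accepts a threshold."""

    def __init__(self, chunk_size: int = 512, threshold: float = 0.8):
        self.chunk_size = chunk_size
        self.threshold = threshold


# Hypothetical registry mapping a strategy name to its class.
STRATEGIES = {"word": WordStrategy, "threshold": ThresholdStrategy}


class WrapperSketch:
    """Selects a strategy by name and forwards all other keyword arguments."""

    def __init__(self, chunker_type: str = "word", **kwargs):
        try:
            strategy_cls = STRATEGIES[chunker_type]
        except KeyError:
            raise ValueError(f"unknown chunker_type: {chunker_type!r}") from None
        # The strategy's own constructor validates the forwarded arguments.
        self.strategy = strategy_cls(**kwargs)


wrapper = WrapperSketch(chunker_type="threshold", chunk_size=256, threshold=0.5)
print(type(wrapper.strategy).__name__, wrapper.strategy.threshold)
```

In this sketch, an argument that the selected strategy does not accept (for example threshold with the "word" strategy) raises a TypeError from the strategy's constructor. Whether the real Chunker behaves the same way is not stated in this README, so check each Chonkie chunker's accepted parameters before passing them.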

Integration with IngestionPipeline

from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.ingestion.chonkie import Chunker

pipeline = IngestionPipeline(
    transformations=[
        Chunker(chunker_type="recursive", chunk_size=512),
        # ... other transformations
    ]
)

nodes = pipeline.run(documents=[Document.example()])
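
The pipeline above applies its transformations in order, each one receiving the output of the previous one. As a rough mental model, the same idea can be sketched with plain functions over lists of strings. The names run_pipeline, split_on_blank_lines, and strip_and_drop_empty are hypothetical, and the real IngestionPipeline operates on documents and nodes and offers more than simple chaining.

```python
from functools import reduce
from typing import Callable, List

# A transformation takes a list of items and returns a new list of items.
Transformation = Callable[[List[str]], List[str]]


def run_pipeline(items: List[str], transformations: List[Transformation]) -> List[str]:
    """Apply each transformation in order, feeding its output to the next."""
    return reduce(lambda current, transform: transform(current), transformations, items)


def split_on_blank_lines(items: List[str]) -> List[str]:
    """Stand-in for a chunking step: split every item on blank lines."""
    return [part for item in items for part in item.split("\n\n")]


def strip_and_drop_empty(items: List[str]) -> List[str]:
    """Stand-in for a later step: trim whitespace and drop empty pieces."""
    return [item.strip() for item in items if item.strip()]


result = run_pipeline(["a\n\n b \n\n"], [split_on_blank_lines, strip_and_drop_empty])
print(result)
```

Because each step only sees the previous step's output, the chunking step normally comes first so that later transformations work on chunks rather than on whole documents.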
