LlamaIndex Node Parser Kreuzberg

Element-aware LlamaIndex node parser for kreuzberg-extracted documents.
Installation
pip install llama-index-node-parser-kreuzberg
Requires llama-index-core>=0.13.0,<0.15. This package does not depend on
kreuzberg directly — the kreuzberg package is a dependency of the reader
(llama-index-readers-kreuzberg), which is needed for producing documents with
element metadata.
Prerequisites
This parser requires documents with `_kreuzberg_elements` metadata. These are produced by `KreuzbergReader` configured with element-based extraction. Install `llama-index-readers-kreuzberg` (which brings in `kreuzberg`) to use the full workflow.
from kreuzberg import ExtractionConfig
from llama_index.readers.kreuzberg import KreuzbergReader
reader = KreuzbergReader(
    extraction_config=ExtractionConfig(result_format="element_based")
)
documents = reader.load_data("report.pdf")
Features
- Element-aware splitting — headings, paragraphs, tables, and code blocks each become a node
- Element type metadata preserved on each node (`element_type`, `page_number`, `element_index`)
- Source document relationships tracked via `NodeRelationship.SOURCE`
- Graceful degradation — documents without elements pass through with a warning
- Composes with other transformations (e.g., `SentenceSplitter` for further chunking)
- Async support via `aget_nodes_from_documents`
- Serialization support (`to_dict` / `from_dict`)
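To make the element-aware splitting concrete, here is a minimal pure-Python sketch of the idea. The dict shapes and helper name are illustrative, not the package's actual internals (the real parser consumes LlamaIndex `Document` objects and emits `TextNode`s): each extracted element becomes one chunk, whitespace-only elements are skipped, and element metadata travels with the text.

```python
# Illustrative sketch only; the real parser works on LlamaIndex
# Document/TextNode objects, not plain dicts.
def split_elements(elements):
    """Turn extracted elements into chunk dicts, one per element."""
    nodes = []
    for index, element in enumerate(elements):
        text = element.get("text", "")
        if not text.strip():  # whitespace-only elements are skipped
            continue
        nodes.append({
            "text": text,
            "metadata": {
                "element_type": element.get("type"),
                "page_number": element.get("page"),
                "element_index": index,
            },
        })
    return nodes

elements = [
    {"type": "heading", "page": 1, "text": "Quarterly Report"},
    {"type": "paragraph", "page": 1, "text": "Revenue grew 12%."},
    {"type": "paragraph", "page": 2, "text": "   "},  # skipped
]
nodes = split_elements(elements)
```

Because each chunk keeps its `element_type` and `page_number`, downstream retrieval can filter or cite by element kind and page.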
Usage
Basic
Full reader-to-nodes flow:
from kreuzberg import ExtractionConfig
from llama_index.readers.kreuzberg import KreuzbergReader
from llama_index.node_parser.kreuzberg import KreuzbergNodeParser
reader = KreuzbergReader(
    extraction_config=ExtractionConfig(result_format="element_based")
)
documents = reader.load_data("report.pdf")
parser = KreuzbergNodeParser()
nodes = parser.get_nodes_from_documents(documents)
IngestionPipeline
Chain with SentenceSplitter for further chunking of large elements:
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
pipeline = IngestionPipeline(
    transformations=[
        KreuzbergNodeParser(),
        SentenceSplitter(chunk_size=512),  # Further split large elements
    ]
)
nodes = pipeline.run(documents=documents)
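The pipeline above is essentially function composition over node lists: each transformation consumes the previous stage's output. A stdlib-only sketch of that contract (the function names and stand-in splitters here are illustrative, not the library's implementations):

```python
def run_pipeline(transformations, items):
    """Apply each transformation to the output of the previous one."""
    for transform in transformations:
        items = transform(items)
    return items

# Stage 1: one chunk per element (stand-in for KreuzbergNodeParser)
def element_split(docs):
    return [chunk for doc in docs for chunk in doc.split("\n\n")]

# Stage 2: cap chunk size (stand-in for SentenceSplitter)
def size_split(chunks, limit=20):
    return [c[i:i + limit] for c in chunks for i in range(0, len(c), limit)]

docs = ["Heading\n\nA paragraph that is fairly long."]
nodes = run_pipeline([element_split, size_split], docs)
```

Ordering matters: element splitting runs first so that the size-based splitter only subdivides individual elements, never merges text across element boundaries.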
VectorStoreIndex
Using the transformations parameter:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[KreuzbergNodeParser()],
)
Async
nodes = await parser.aget_nodes_from_documents(documents)
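`aget_nodes_from_documents` follows the usual asyncio pattern, so parsing can be awaited alongside other I/O. A stdlib-only sketch of that shape (the coroutine here is a stand-in, not the parser's implementation):

```python
import asyncio

async def aparse(doc):
    """Stand-in for an async parse step; yields one node per non-empty line."""
    await asyncio.sleep(0)  # cooperative yield, as real I/O would
    return [line for line in doc.splitlines() if line.strip()]

async def main(docs):
    # Parse all documents concurrently and flatten the results in order.
    results = await asyncio.gather(*(aparse(d) for d in docs))
    return [node for nodes in results for node in nodes]

nodes = asyncio.run(main(["a\nb", "c"]))
```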
Behavior Notes
- Documents without `_kreuzberg_elements` metadata pass through unchanged with a warning. This is intentional — silently falling back would prevent users from noticing they are not getting element-aware splitting.
- Empty or whitespace-only elements are automatically skipped.
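That pass-through-with-warning behavior can be sketched in plain Python (the function name and metadata layout here are illustrative; only the `_kreuzberg_elements` key comes from the package's documented behavior):

```python
import warnings

def parse_documents(documents):
    """Split documents that carry element metadata; warn and pass through otherwise."""
    nodes = []
    for doc in documents:
        elements = doc.get("metadata", {}).get("_kreuzberg_elements")
        if not elements:
            warnings.warn("document has no _kreuzberg_elements metadata; passing through")
            nodes.append(doc)  # unchanged pass-through
            continue
        for element in elements:
            if not element["text"].strip():
                continue  # whitespace-only elements are skipped
            nodes.append({"text": element["text"]})
    return nodes

docs = [
    {"text": "plain", "metadata": {}},
    {"text": "rich", "metadata": {"_kreuzberg_elements": [{"text": "h1"}, {"text": " "}]}},
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    nodes = parse_documents(docs)
```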
File details
Details for the file llama_index_node_parser_kreuzberg-0.1.0.tar.gz.
File metadata
- Download URL: llama_index_node_parser_kreuzberg-0.1.0.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6941aacb7e44ff8119ab6c8566984623b0eb5e1abff8b47b975bf5050ac39b86 |
| MD5 | c3af80166a425e9aa736604b7eb64849 |
| BLAKE2b-256 | 1be8debc70a91133ffaf5218a5c439c7a65188c5063134631ce94a79d88e50ed |
Provenance

The following attestation bundles were made for llama_index_node_parser_kreuzberg-0.1.0.tar.gz:

Publisher: publish-node-parser.yaml on kreuzberg-dev/llama-index-kreuzberg

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llama_index_node_parser_kreuzberg-0.1.0.tar.gz
- Subject digest: 6941aacb7e44ff8119ab6c8566984623b0eb5e1abff8b47b975bf5050ac39b86
- Sigstore transparency entry: 1154401024
- Sigstore integration time:
- Permalink: kreuzberg-dev/llama-index-kreuzberg@7084e3b62befa664f4c159f1951430c03d3b744f
- Branch / Tag: refs/tags/node-parser-v0.1.0
- Owner: https://github.com/kreuzberg-dev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-node-parser.yaml@7084e3b62befa664f4c159f1951430c03d3b744f
- Trigger Event: push
File details
Details for the file llama_index_node_parser_kreuzberg-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llama_index_node_parser_kreuzberg-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b768fe93b9fd0f456a780dedb733e89c01441753e815ed55017af981367a8fa |
| MD5 | de3521d6100ad6662afdc131cafb3c06 |
| BLAKE2b-256 | e2b2584f46cbafd0abaa99fbeb83559d9266398a6ad11543745ec903fbcaa84a |
Provenance

The following attestation bundles were made for llama_index_node_parser_kreuzberg-0.1.0-py3-none-any.whl:

Publisher: publish-node-parser.yaml on kreuzberg-dev/llama-index-kreuzberg

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llama_index_node_parser_kreuzberg-0.1.0-py3-none-any.whl
- Subject digest: 3b768fe93b9fd0f456a780dedb733e89c01441753e815ed55017af981367a8fa
- Sigstore transparency entry: 1154401026
- Sigstore integration time:
- Permalink: kreuzberg-dev/llama-index-kreuzberg@7084e3b62befa664f4c159f1951430c03d3b744f
- Branch / Tag: refs/tags/node-parser-v0.1.0
- Owner: https://github.com/kreuzberg-dev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-node-parser.yaml@7084e3b62befa664f4c159f1951430c03d3b744f
- Trigger Event: push