
llama-index node_parser docling integration


Docling Node Parser

Overview

Docling Node Parser parses Docling JSON output into LlamaIndex nodes with rich metadata for use in downstream pipelines such as RAG and question answering (QA).
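To make the idea concrete, here is a minimal sketch of how a Docling-style JSON document could be flattened into node-like records that carry the metadata keys shown in the usage example (`dl_doc_hash`, `path`, `heading`, `page`, `bbox`). The input schema used here (`file-info`, `document-hash`, `main-text`, `type`, `prov`) is a simplified assumption, and `to_nodes` is a hypothetical helper, not this package's implementation. In particular, the sketch assumes section headings are attached to the items that follow them as metadata rather than emitted as nodes of their own.

```python
from __future__ import annotations

import json
from typing import Any

# Simplified, hypothetical Docling-style JSON (assumed schema for illustration;
# real Docling output contains many more fields).
DOC_JSON = json.dumps({
    "file-info": {"document-hash": "556ad9e23b"},
    "main-text": [
        {"type": "subtitle-level-1", "text": "2 Getting Started",
         "prov": [{"page": 2, "bbox": [107.4, 510.0, 220.0, 522.0]}]},
        {"type": "paragraph",
         "text": "Docling provides an easy code interface to convert PDF documents ...",
         "prov": [{"page": 2, "bbox": [107.40, 456.93, 504.20, 499.65]}]},
    ],
})


def to_nodes(doc_json: str) -> list[dict[str, Any]]:
    """Hypothetical helper: flatten text items into node-like dicts with provenance."""
    doc = json.loads(doc_json)
    doc_hash = doc["file-info"]["document-hash"]
    heading = None  # most recent section heading seen so far
    nodes = []
    for idx, item in enumerate(doc["main-text"]):
        text = item.get("text")
        if not text:
            continue  # skip items without inline text (e.g. references to tables)
        if item.get("type", "").startswith("subtitle"):
            heading = text  # headings become context for later items, not nodes
            continue
        first = (item.get("prov") or [{}])[0]  # first provenance entry: page + bbox
        nodes.append({
            "text": text,
            "metadata": {
                "dl_doc_hash": doc_hash,
                "path": f"#/main-text/{idx}",  # JSON-pointer-style path to the item
                "heading": heading,
                "page": first.get("page"),
                "bbox": first.get("bbox"),
            },
        })
    return nodes


for node in to_nodes(DOC_JSON):
    md = node["metadata"]
    print(md["path"], md["heading"], md["page"])  # prints: #/main-text/1 2 Getting Started 2
```

Keeping the item index in `path` means each node can be traced back to the exact element of the source JSON, which is what makes the page and bounding-box metadata useful for citations later.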

Installation

```bash
pip install llama-index-node-parser-docling
```

Usage

Docling Node Parser parses LlamaIndex documents whose content is a Docling document serialized as JSON, such as those created by the Docling Reader in JSON mode.

Basic usage looks like this:

```python
# docs = ...  # e.g. created using Docling Reader in JSON mode

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
print(f"{nodes[6].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...

print(nodes[6].metadata)
# > {'dl_doc_hash': '556ad9e23b...',
# >  'path': '#/main-text/22',
# >  'heading': '2 Getting Started',
# >  'page': 2,
# >  'bbox': [107.40, 456.93, 504.20, 499.65]}
```
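Because each node carries its provenance in `metadata`, downstream code can, for example, group retrieved chunks by section or cite the source page. The sketch below assumes only that nodes expose `.text` and `.metadata` with the keys shown above; the `Node` dataclass is a hypothetical stand-in so the snippet runs without LlamaIndex installed, and `group_by_heading` and `citation` are illustrative helpers, not part of this package.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex node; only .text and .metadata are used."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def group_by_heading(nodes) -> dict[str | None, list]:
    """Group nodes by their 'heading' metadata, preserving input order."""
    groups: dict[str | None, list] = defaultdict(list)
    for node in nodes:
        groups[node.metadata.get("heading")].append(node)
    return dict(groups)


def citation(node) -> str:
    """Format a short source reference from the 'page' and 'bbox' metadata."""
    md = node.metadata
    return f"p. {md['page']}, bbox={tuple(md['bbox'])}"


demo_nodes = [
    Node("Docling provides an easy code interface ...",
         {"heading": "2 Getting Started", "page": 2,
          "bbox": [107.40, 456.93, 504.20, 499.65]}),
]
for heading, members in group_by_heading(demo_nodes).items():
    print(heading, [citation(n) for n in members])
```

With real parser output, the same helpers could be applied to the `nodes` list returned by `get_nodes_from_documents`, assuming every node carries `page` and `bbox` metadata.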



Download files


Source Distribution

llama_index_node_parser_docling-0.1.0.tar.gz (3.1 kB)
