llama-index node_parser docling integration

Docling Node Parser

Overview

Docling Node Parser parses Docling JSON output into LlamaIndex nodes with rich metadata, ready for use in downstream pipelines such as RAG and question answering.

Installation

pip install llama-index-node-parser-docling

Usage

Docling Node Parser parses LlamaIndex documents whose content is a JSON-serialized Docling document, as created by a Docling Reader in JSON mode.

Basic usage looks like this:

# docs = ...  # e.g. created using Docling Reader in JSON mode

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
print(f"{nodes[12].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...

print(nodes[12].metadata)
# > {'doc_items': [{
# >    'self_ref': '#/main-text/21',
# >    'prov': [{
# >      'page_no': 2,
# >      'bbox': {'l': 107.3, 't': 499.5, 'r': 504.0, 'b': 456.7, ...},
# >      ...
# >    }],
# >    ...
# >  }],
# >  'headings': ['2 Getting Started'],
# >  ...
# > }
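
The metadata shown above can be used to trace a node back to its place in the source document. The following is a minimal sketch, not part of the package: it assumes each entry in `doc_items` is a dict with a `prov` list, as the example output suggests, and it works on a hand-written sample dict so it runs without Docling or LlamaIndex installed. The helper name `summarize_provenance` is hypothetical.

```python
from typing import Any, Dict

# Hypothetical sample, shaped like the example output above.
# Fields elided with "..." in that output are simply left out here.
metadata: Dict[str, Any] = {
    "doc_items": [
        {
            "self_ref": "#/main-text/21",
            "prov": [
                {
                    "page_no": 2,
                    "bbox": {"l": 107.3, "t": 499.5, "r": 504.0, "b": 456.7},
                }
            ],
        }
    ],
    "headings": ["2 Getting Started"],
}


def summarize_provenance(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the page numbers and headings found in one node's metadata.

    Hypothetical helper: missing keys are tolerated and yield empty results.
    """
    pages = sorted(
        {
            prov["page_no"]
            for item in meta.get("doc_items", [])
            for prov in item.get("prov", [])
            if "page_no" in prov
        }
    )
    return {"pages": pages, "headings": list(meta.get("headings", []))}


summary = summarize_provenance(metadata)
print(summary)
# > {'pages': [2], 'headings': ['2 Getting Started']}
```

With real nodes, the same helper could be called as `summarize_provenance(nodes[12].metadata)`, assuming the metadata has the shape shown in the example output.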

Download files

Source Distribution

llama_index_node_parser_docling-0.4.1.tar.gz (4.5 kB)

Built Distribution

llama_index_node_parser_docling-0.4.1-py3-none-any.whl

File details

Details for the file llama_index_node_parser_docling-0.4.1.tar.gz.

File hashes

Hashes for llama_index_node_parser_docling-0.4.1.tar.gz
Algorithm    Hash digest
SHA256       d4d6aeaeef73a07694bc388043664fc065e3c0b423db3b41682a441ecaf761f3
MD5          d6a7ff89f3dc715d6742b391f0c9c0fd
BLAKE2b-256  51cd971cd3f1b4aa99ab743463d12d19338d8b4fdec069aa3e20e857b3be1204

File details

Details for the file llama_index_node_parser_docling-0.4.1-py3-none-any.whl.

File hashes

Hashes for llama_index_node_parser_docling-0.4.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       5a3e976d8b96e0e517be5775c454a5244cf09cd54ae0c44478440e7639f7749a
MD5          6cdc4ac5473faef2b96db5d420dc139f
BLAKE2b-256  2e84e1671271e06f795e2a6979e0c929bbd8a0d37cd94fbc93102057c877832d
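
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the example assumes the wheel has been saved to the current directory under its published file name.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)


if __name__ == "__main__":
    # Assumption: the wheel was downloaded into the current directory.
    wheel = Path("llama_index_node_parser_docling-0.4.1-py3-none-any.whl")
    published = "5a3e976d8b96e0e517be5775c454a5244cf09cd54ae0c44478440e7639f7749a"
    if wheel.exists():
        print("hash matches:", matches_published_hash(wheel, published))
    else:
        print(f"{wheel} not found; download it first")
```

SHA256 is used here because it is the strongest of the listed algorithms that `hashlib` exposes by that exact name; MD5 is listed as well but is not suitable for integrity checks against tampering.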
