llama-index node_parser docling integration

Project description

Docling Node Parser

Overview

Docling Node Parser parses Docling JSON output into LlamaIndex nodes with rich metadata, for use in downstream pipelines such as RAG and QA.

Installation

pip install llama-index-node-parser-docling

Usage

Docling Node Parser parses LlamaIndex documents that contain JSON-serialized Docling format, such as those created by a Docling Reader in JSON mode.
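One way to obtain such documents is the companion Docling Reader. The following is a minimal sketch, not part of this package: it assumes the separate `llama-index-readers-docling` package is installed and that its `DoclingReader` is configured with `export_type=DoclingReader.ExportType.JSON`. The helper name `load_docling_json_docs` and the example path are hypothetical.

```python
def load_docling_json_docs(file_path):
    """Load a file as LlamaIndex documents holding JSON-serialized Docling format.

    Assumption: `pip install llama-index-readers-docling` has been run.
    The import is deferred so that merely defining this helper does not
    require the reader package.
    """
    from llama_index.readers.docling import DoclingReader

    # JSON export keeps Docling's structured format, which the node parser expects.
    reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
    return reader.load_data(file_path=file_path)


# Hypothetical usage (path is a placeholder):
# docs = load_docling_json_docs("path/to/document.pdf")
```

The default Markdown export of the reader would not be suitable here, since the node parser works on the JSON-serialized format.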

Basic usage looks like this:

# docs = ...  # e.g. created using Docling Reader in JSON mode

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
print(f"{nodes[12].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...

print(nodes[12].metadata)
# > {'doc_items': [{
# >    'self_ref': '#/main-text/21',
# >    'prov': [{
# >      'page_no': 2,
# >      'bbox': {'l': 107.3, 't': 499.5, 'r': 504.0, 'b': 456.7, ...},
# >      ...
# >    }],
# >  }],
# >  'headings': ['2 Getting Started'],
# >  ...
# > }
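The metadata can then be used downstream, for example to cite where a retrieved chunk came from. The sketch below uses plain Python only and assumes that `doc_items` is a list of dicts, each with a `prov` list of dicts carrying `page_no`, as the output above suggests. The helper name `summarize_provenance` and the sample dict are illustrative, not part of the library.

```python
def summarize_provenance(metadata):
    """Return (headings, sorted page numbers) found in a node's metadata dict."""
    headings = list(metadata.get("headings") or [])
    pages = set()
    for item in metadata.get("doc_items", []):
        for prov in item.get("prov", []):
            page_no = prov.get("page_no")
            if page_no is not None:
                pages.add(page_no)
    return headings, sorted(pages)


# Sample dict mirroring the shape of the printed metadata (values abridged).
sample_metadata = {
    "doc_items": [
        {
            "self_ref": "#/main-text/21",
            "prov": [
                {"page_no": 2, "bbox": {"l": 107.3, "t": 499.5, "r": 504.0, "b": 456.7}},
            ],
        }
    ],
    "headings": ["2 Getting Started"],
}

headings, pages = summarize_provenance(sample_metadata)
print(headings, pages)
```

With real nodes, the same helper could be called as `summarize_provenance(nodes[12].metadata)`, provided the metadata follows the assumed shape.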

Download files

Source Distribution

llama_index_node_parser_docling-0.4.0.tar.gz (4.5 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file llama_index_node_parser_docling-0.4.0.tar.gz.

File hashes

Hashes for llama_index_node_parser_docling-0.4.0.tar.gz
Algorithm Hash digest
SHA256 dc3d97e6c2806b07891b9c39a9578210c82f28f65551d238fc6b12c01f737d0e
MD5 462cc70fab62d1a16c4d592f618fa58c
BLAKE2b-256 0bcd91d74e5f5e7cd3ffa38847f42bc853ec15015e714a8a83668f615b02c443

File details

Details for the file llama_index_node_parser_docling-0.4.0-py3-none-any.whl.

File hashes

Hashes for llama_index_node_parser_docling-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a89db90010f2dfbe13aae91c8cb81e9832b2a722f22068cd21189b83fc4bc7a9
MD5 f9dfbd6b5a794c967b22488cd4f86cc1
BLAKE2b-256 47cede7fdb0a3028e72599b1c7e445023ccf7c246cf6699e1dd416cb9d96aeba
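
A downloaded file can be checked against the SHA256 digests listed above using only the Python standard library. The following is a minimal sketch. The function names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage comment is a placeholder.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF so large files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage with the source distribution digest listed above:
# ok = matches_published_hash(
#     "llama_index_node_parser_docling-0.4.0.tar.gz",
#     "dc3d97e6c2806b07891b9c39a9578210c82f28f65551d238fc6b12c01f737d0e",
# )
```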
