llama-index node_parser docling integration

Docling Node Parser

Overview

Docling Node Parser parses Docling JSON output into LlamaIndex nodes with rich metadata, for use in downstream pipelines such as RAG and QA.

Installation

pip install llama-index-node-parser-docling

Usage

Docling Node Parser parses LlamaIndex documents that contain the JSON-serialized Docling format, as created by a Docling Reader.
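For context, the input documents could be produced along the following lines. This is a minimal sketch, not the definitive pipeline: it assumes the separate llama-index-readers-docling package is installed and that its DoclingReader accepts an export_type of DoclingReader.ExportType.JSON. The function name load_and_parse and its source argument are hypothetical and used only for illustration.

```python
def load_and_parse(source):
    """Read `source` with Docling Reader in JSON mode, then parse into nodes.

    Hypothetical helper: `source` is a path or URL to a document that
    Docling can convert (an assumption, not stated in this README).
    """
    # Imported lazily so this sketch can be defined without the packages
    # installed. The reader package is an assumed extra dependency.
    from llama_index.node_parser.docling import DoclingNodeParser
    from llama_index.readers.docling import DoclingReader

    # JSON export keeps the Docling document structure, which the node
    # parser expects as input.
    reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
    docs = reader.load_data(file_path=source)

    node_parser = DoclingNodeParser()
    return node_parser.get_nodes_from_documents(documents=docs)
```

Calling load_and_parse with a document path should return a list of nodes like the ones used in the usage example.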

Basic usage looks like this:

# docs = ...  # e.g. created using Docling Reader in JSON mode

from llama_index.node_parser.docling import DoclingNodeParser

node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
print(f"{nodes[12].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...

print(nodes[12].metadata)
# > {'doc_items': [{
# >    'self_ref': '#/main-text/21',
# >    'prov': [{
# >      'page_no': 2,
# >      'bbox': {'l': 107.3, 't': 499.5, 'r': 504.0, 'b': 456.7, ...},
# >      ...
# >    }],
# >  }],
# >  'headings': ['2 Getting Started'],
# >  ...
# > }
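
The metadata printed above can be reduced to the fields a RAG pipeline typically needs for citations. The sketch below assumes the layout suggested by that output (a 'doc_items' list whose entries carry 'self_ref' and a 'prov' list with 'page_no', plus a top-level 'headings' list). The exact keys may differ between versions, and summarize_metadata is a hypothetical helper, not part of the package.

```python
from typing import Any


def summarize_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Collect headings, page numbers and item references from node metadata."""
    pages = set()
    refs = []
    # Missing keys are treated as empty, since the layout is an assumption.
    for item in metadata.get("doc_items", []):
        refs.append(item.get("self_ref"))
        for prov in item.get("prov", []):
            if "page_no" in prov:
                pages.add(prov["page_no"])
    return {
        "headings": list(metadata.get("headings", [])),
        "pages": sorted(pages),
        "refs": refs,
    }


# Sample dict mirroring the fields visible in the output above.
sample = {
    "doc_items": [
        {
            "self_ref": "#/main-text/21",
            "prov": [
                {
                    "page_no": 2,
                    "bbox": {"l": 107.3, "t": 499.5, "r": 504.0, "b": 456.7},
                },
            ],
        }
    ],
    "headings": ["2 Getting Started"],
}

summary = summarize_metadata(sample)
print(summary)
```

With real nodes, the same helper could be applied as summarize_metadata(nodes[12].metadata).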
