llama-index node_parser docling integration
Docling Node Parser
Overview
Docling Node Parser parses Docling JSON output into LlamaIndex nodes with rich metadata, for use in downstream pipelines such as RAG and QA.
Installation
pip install llama-index-node-parser-docling
Usage
Docling Node Parser operates on LlamaIndex documents that contain JSON-serialized Docling output, such as those created by a Docling Reader.
Basic usage looks like this:
# docs = ... # e.g. created using Docling Reader in JSON mode
from llama_index.node_parser.docling import DoclingNodeParser
node_parser = DoclingNodeParser()
nodes = node_parser.get_nodes_from_documents(documents=docs)
print(f"{nodes[6].text[:70]}...")
# > Docling provides an easy code interface to convert PDF documents from ...
print(nodes[6].metadata)
# > {'dl_doc_hash': '556ad9e23b...',
# > 'path': '#/main-text/22',
# > 'heading': '2 Getting Started',
# > 'page': 2,
# > 'bbox': [107.40, 456.93, 504.20, 499.65]}
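The metadata attached to each node can be used to label where a chunk came from. The following is a minimal sketch rather than part of the package: it uses plain dictionaries as stand-ins for `node.metadata`, assumes the keys shown in the sample output above (`path`, `heading`, `page`, `bbox`), and the helper names `format_source` and `group_paths_by_heading` as well as the sample values are hypothetical.

```python
from collections import defaultdict

# Stand-ins for `node.metadata` dicts. The keys mirror the sample output
# above; the values (including "Hypothetical Section") are illustrative only.
sample_metadata = [
    {"path": "#/main-text/21", "heading": "2 Getting Started", "page": 2,
     "bbox": [107.40, 500.10, 504.20, 530.00]},
    {"path": "#/main-text/22", "heading": "2 Getting Started", "page": 2,
     "bbox": [107.40, 456.93, 504.20, 499.65]},
    {"path": "#/main-text/30", "heading": "Hypothetical Section", "page": 3,
     "bbox": [107.40, 300.00, 504.20, 350.00]},
]


def format_source(metadata):
    """Build a short, human-readable source label from one metadata dict."""
    heading = metadata.get("heading", "(no heading)")
    page = metadata.get("page", "?")
    path = metadata.get("path", "")
    return f"{heading}, p. {page} [{path}]"


def group_paths_by_heading(metadata_items):
    """Collect the document paths of all chunks under each heading."""
    grouped = defaultdict(list)
    for item in metadata_items:
        grouped[item.get("heading")].append(item.get("path"))
    return dict(grouped)


if __name__ == "__main__":
    for item in sample_metadata:
        print(format_source(item))
    print(group_paths_by_heading(sample_metadata))
```

With real nodes, the same helpers could be applied to `node.metadata` for each node returned by `get_nodes_from_documents`, assuming the metadata keys match those shown above.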
Hashes for llama_index_node_parser_docling-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 52b1b18e2980e2626e12629fd06b3c282193d2d53d69df5678e243fa978c8bcd
MD5 | 7437215414b55271f96d53a9245e6784
BLAKE2b-256 | 719dbca3ed005089d25409d5b6acbdead3950f826119703a640960788ec50685
Hashes for llama_index_node_parser_docling-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b24bec1737b7fdb60ae81b5c92e07c65a0427c265e74f31be1f4716f9cc8bc57
MD5 | b793d5840c6721e73d27eec23d18d477
BLAKE2b-256 | 88f6452294fa9e0e1b9fd7fa9e9cb5c9e39bd4cfe68ccac3dc25102c3a75957f