llama-index node_parser slide node parser integration
LlamaIndex Node_Parser Integration: SlideNodeParser
This package implements the SLIDE node parser described in the paper "SLIDE: Sliding Localized Information for Document Extraction", which introduces a chunking strategy that enriches segments with localized context from neighboring text. This improves downstream retrieval and question-answering tasks by preserving contextual signals that naive splitting tends to lose.
SlideNodeParser implements a faithful adaptation of this technique using LLMs to generate a short context for each chunk based on its surrounding window.
Here's a summary of the method from the paper:
Traditional document chunking methods often truncate local context, weakening the semantic integrity of each chunk.
SLIDE introduces a sliding window approach that augments each chunk with a compact, LLM-generated summary of its surrounding context.
The process begins by greedily grouping sentences into base chunks based on a target token limit.
Then, for each chunk, a sliding window of neighboring chunks is selected (e.g., 5 before and after),
and the LLM is prompted to generate a brief context that situates the chunk within the overall document.
This context is then attached as metadata to each chunk, improving the quality of downstream retrieval and generation tasks, especially in Graph Retrieval-Augmented Generation (GraphRAG) systems.
Results from the research paper show that a single glean of the SlideNodeParser with default values (chunk_size=1200 tokens, window_size=11) identifies 37% more entities and relationships than standard node parsers.
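The grouping and windowing steps summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper names (count_tokens, group_sentences, window_bounds) are hypothetical, tokens are approximated by whitespace-separated words, and window_size is assumed to count the chunk itself, so window_size=11 means 5 chunks on each side.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude token count: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def group_sentences(sentences: List[str], chunk_size: int) -> List[str]:
    """Greedily pack consecutive sentences into base chunks.

    A chunk stays within chunk_size tokens unless a single sentence
    is itself longer than the limit, in which case it forms its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk when this sentence would push it over the limit.
        if current and current_tokens + n > chunk_size:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def window_bounds(index: int, total: int, window_size: int) -> Tuple[int, int]:
    """Return the [start, end) range of chunks around `index`, clipped at document edges."""
    half = window_size // 2  # assumed: window_size includes the chunk itself
    start = max(0, index - half)
    end = min(total, index + half + 1)
    return start, end


if __name__ == "__main__":
    base_chunks = group_sentences(["a b c", "d e", "f g h i", "j"], chunk_size=5)
    for i, chunk in enumerate(base_chunks):
        start, end = window_bounds(i, len(base_chunks), window_size=3)
        # In SLIDE, the chunk plus this window text would be sent to the LLM
        # to produce the short context stored in the chunk's metadata.
        print(i, repr(chunk), "window:", base_chunks[start:end])
```

Chunks near the start or end of a document simply get a shorter window in this sketch; how the real parser treats those edges is not specified in the text above.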
Installation
pip install llama-index-node-parser-slide
Usage
import asyncio

from llama_index.core import Document
from llama_index.llms.openai import OpenAI  # example LLM only; requires llama-index-llms-openai
from llama_index.node_parser.slide import SlideNodeParser

# Any LlamaIndex LLM can be passed in; OpenAI is used here purely as an example.
llm = OpenAI(model="gpt-4o-mini")

documents = [
    Document(text="document text 1"),
    Document(text="document text 2"),
]

# --- Synchronous usage ---
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
)
nodes = parser.get_nodes_from_documents(documents)

# --- Asynchronous usage (for parallel LLM calls) ---
# Specify llm_workers > 1 to run multiple LLM calls concurrently
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
    llm_workers=2,
)


async def main():
    # `await` must run inside a coroutine (or a notebook cell that supports top-level await)
    return await parser.aget_nodes_from_documents(documents)


nodes = asyncio.run(main())
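To illustrate what running multiple LLM calls concurrently with a worker limit can look like, here is a small self-contained sketch that bounds concurrency with asyncio.Semaphore. It is an assumption about the general pattern, not the package's actual code: fake_llm_context and generate_contexts are hypothetical names, and the LLM call is replaced by a short sleep.

```python
import asyncio
from typing import List


async def fake_llm_context(chunk: str, window: str) -> str:
    """Stand-in for an LLM call (hypothetical); a real call would query a model."""
    await asyncio.sleep(0.01)
    return f"context for: {chunk}"


async def generate_contexts(
    chunks: List[str], windows: List[str], llm_workers: int
) -> List[str]:
    """Generate one context per chunk, with at most llm_workers calls in flight."""
    semaphore = asyncio.Semaphore(llm_workers)

    async def worker(chunk: str, window: str) -> str:
        # The semaphore caps how many calls run at the same time.
        async with semaphore:
            return await fake_llm_context(chunk, window)

    # gather preserves input order, so each context lines up with its chunk.
    results = await asyncio.gather(
        *(worker(c, w) for c, w in zip(chunks, windows))
    )
    return list(results)


if __name__ == "__main__":
    contexts = asyncio.run(
        generate_contexts(["c0", "c1", "c2"], ["w0", "w1", "w2"], llm_workers=2)
    )
    print(contexts)
```

A higher worker count shortens wall-clock time only until the LLM provider's rate limits become the bottleneck, so a small value such as 2 to 4 is a cautious starting point.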