llama-index node_parser slide node parser integration

LlamaIndex Node_Parser Integration: SlideNodeParser

Implements the SLIDE node parser described in the paper "SLIDE: Sliding Localized Information for Document Extraction", which introduces a chunking strategy that enriches each segment with localized context from neighboring text. This improves downstream retrieval and question answering by preserving contextual signals that naive splitting can lose.

SlideNodeParser is a faithful adaptation of this technique: it uses an LLM to generate a short context for each chunk based on its surrounding window.

Here's a summary of the method from the paper:

Traditional document chunking methods often truncate local context, weakening the semantic integrity of each chunk.
SLIDE introduces a sliding window approach that augments each chunk with a compact, LLM-generated summary of its surrounding context.

The process begins by greedily grouping sentences into base chunks based on a target token limit.
Then, for each chunk, a sliding window of neighboring chunks is selected (e.g., 5 before and after),
and the LLM is prompted to generate a brief context that situates the chunk within the overall document.
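
The grouping and window-selection steps above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the helper names (`count_tokens`, `greedy_chunks`, `window_around`) are hypothetical, and whitespace-separated words stand in for a real tokenizer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return len(text.split())


def greedy_chunks(sentences: List[str], chunk_size: int) -> List[str]:
    """Greedily pack sentences into chunks of at most chunk_size tokens.

    A single sentence longer than chunk_size becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would overflow it.
        if current and current_tokens + n > chunk_size:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def window_around(chunks: List[str], index: int, neighbors: int) -> List[str]:
    """Return the chunk at index plus up to `neighbors` chunks on each side."""
    start = max(0, index - neighbors)
    end = min(len(chunks), index + neighbors + 1)
    return chunks[start:end]
```

Near the start or end of a document the window is simply clipped, so edge chunks see fewer neighbors than chunks in the middle.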

This context is then attached as metadata to each chunk, improving the quality of downstream retrieval and generation tasks, especially in graph retrieval-augmented generation (GraphRAG) systems.
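
To make the context-attachment step concrete, here is a rough sketch that uses a stand-in callable instead of a real LLM. The prompt wording, the `attach_context` helper, and the `local_context` metadata key are all assumptions made for illustration; the package may use different names and a different prompt.

```python
from typing import Callable, Dict, List

# Hypothetical prompt; the wording used by the paper or package may differ.
PROMPT = (
    "Here is a window of text from a document:\n{window}\n\n"
    "Here is the chunk to situate within it:\n{chunk}\n\n"
    "Write a short context that situates the chunk within the document."
)


def attach_context(
    chunks: List[str],
    neighbors: int,
    generate: Callable[[str], str],
) -> List[Dict]:
    """Attach a generated context to each chunk as metadata.

    `generate` is any function mapping a prompt to text, such as a thin
    wrapper around an LLM completion call.
    """
    nodes = []
    for i, chunk in enumerate(chunks):
        # Window: the chunk plus up to `neighbors` chunks on each side.
        start = max(0, i - neighbors)
        end = min(len(chunks), i + neighbors + 1)
        window = "\n".join(chunks[start:end])
        prompt = PROMPT.format(window=window, chunk=chunk)
        nodes.append(
            {
                "text": chunk,
                # Assumed key name, chosen for this sketch only.
                "metadata": {"local_context": generate(prompt)},
            }
        )
    return nodes
```

The chunk text itself is left unchanged; only the metadata grows, so the context can be used or ignored at indexing time.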

The paper reports that a single glean with SlideNodeParser at its default values (chunk_size=1200 tokens, window_size=11) identifies 37% more entities and relationships than standard node parsers.

Installation

pip install llama-index-node-parser-slide

Usage

from llama_index.core import Document
from llama_index.node_parser.slide import SlideNodeParser

# Synchronous usage
# `llm` is assumed to be a LlamaIndex LLM instance you have already created.
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
)
nodes = parser.get_nodes_from_documents(
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ]
)

# Asynchronous usage (for parallel LLM calls)
# Specify llm_workers > 1 to run multiple LLM calls concurrently.
# Note: `await` must run inside an async function or a notebook cell.
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
    llm_workers=2,
)
nodes = await parser.aget_nodes_from_documents(
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ]
)
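
As a rough mental model of what running several LLM calls concurrently with a worker limit looks like, the sketch below bounds concurrency with an `asyncio.Semaphore`. It is a generic pattern, not the package's code: `run_bounded` and `call_llm` are hypothetical names, and how `llm_workers` is implemented internally is not stated here.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    prompts: List[str],
    call_llm: Callable[[str], Awaitable[str]],
    workers: int,
) -> List[str]:
    """Run call_llm over all prompts with at most `workers` calls in flight."""
    semaphore = asyncio.Semaphore(workers)

    async def guarded(prompt: str) -> str:
        # Each call waits here until one of the `workers` slots is free.
        async with semaphore:
            return await call_llm(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```

From a regular script, a coroutine like this is started with `asyncio.run(...)`; the same applies to `aget_nodes_from_documents`, which needs to be awaited from inside a coroutine.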
