llama-index node_parser slide node parser integration

LlamaIndex Node_Parser Integration: SlideNodeParser

Implements the SLIDE node parser described in the paper SLIDE: Sliding Localized Information for Document Extraction, which introduces a chunking strategy that enriches segments with localized context from neighboring text. This improves downstream retrieval and question-answering tasks by preserving important contextual signals that might be lost with naive splitting.

SlideNodeParser implements a faithful adaptation of this technique using LLMs to generate a short context for each chunk based on its surrounding window.

Here's a summary of the method from the paper:

Traditional document chunking methods often truncate local context, weakening the semantic integrity of each chunk.
SLIDE introduces a sliding window approach that augments each chunk with a compact, LLM-generated summary of its surrounding context.

The process begins by greedily grouping sentences into base chunks based on a target token limit.
Then, for each chunk, a sliding window of neighboring chunks is selected (e.g., 5 before and after),
and the LLM is prompted to generate a brief context that situates the chunk within the overall document.

This context is then attached as metadata to each chunk, improving the quality of downstream retrieval and generation tasks, especially in Graph Retrieval-Augmented Generation systems.
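The chunking and windowing steps above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: `count_tokens`, `greedy_chunks`, and `window_for` are hypothetical helpers, tokens are approximated by whitespace-separated words, sentences are split with a naive regex, and the window is assumed to count the center chunk (so a size of 11 would mean 5 chunks on each side).

```python
import re


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def greedy_chunks(text: str, chunk_size: int) -> list[str]:
    """Greedily pack sentences into chunks of at most chunk_size tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_tokens = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and current_tokens + n > chunk_size:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def window_for(chunks: list[str], index: int, window_size: int) -> list[str]:
    """Return the chunks surrounding chunks[index], clipped at document edges."""
    half = window_size // 2
    start = max(0, index - half)
    end = min(len(chunks), index + half + 1)
    return chunks[start:end]


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = greedy_chunks(text, chunk_size=3)
# Each window would be passed to the LLM together with its center chunk.
windows = [window_for(chunks, i, window_size=3) for i in range(len(chunks))]
```

In this sketch a sentence longer than `chunk_size` still becomes its own (oversized) chunk, and windows near the start or end of the document are simply shorter.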

Results from the research paper show that a single glean of SlideNodeParser with default values (chunk_size=1200 tokens, window_size=11) identifies 37% more entities and relationships than standard node parsers.

Installation

pip install llama-index-node-parser-slide

Usage

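from llama_index.core import Document
from llama_index.llms.openai import OpenAI  # example only; needs llama-index-llms-openai
from llama_index.node_parser.slide import SlideNodeParser

# Any LlamaIndex LLM can be passed as `llm`; OpenAI is shown only as an example.
llm = OpenAI(model="gpt-4o-mini")

# Synchronous usage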
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
)
nodes = parser.get_nodes_from_documents(
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ]
)

# Asynchronous usage (for parallel LLM calls)
# Specify llm_workers > 1 to run multiple LLM calls concurrently.
# Note: `await` must run inside an async function (or a notebook cell).
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
    llm_workers=2,
)
nodes = await parser.aget_nodes_from_documents(
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ]
)
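
To give a feel for what running several LLM calls concurrently means, here is a small self-contained sketch of bounded concurrency with `asyncio`. It is not the package's code: `fake_llm_call` and `generate_contexts` are hypothetical names, and the "LLM" is simulated with a short sleep. The only idea borrowed from the usage above is that a worker count such as `llm_workers` caps how many calls are in flight at once.

```python
import asyncio


async def fake_llm_call(chunk: str) -> str:
    """Hypothetical stand-in for an LLM request; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return f"context for: {chunk}"


async def generate_contexts(chunks: list[str], llm_workers: int) -> list[str]:
    """Run one call per chunk, with at most llm_workers in flight at once."""
    semaphore = asyncio.Semaphore(llm_workers)

    async def worker(chunk: str) -> str:
        async with semaphore:
            return await fake_llm_call(chunk)

    # gather preserves input order, so contexts line up with chunks.
    return await asyncio.gather(*(worker(c) for c in chunks))


# From a plain script, asyncio.run drives the coroutine to completion.
contexts = asyncio.run(generate_contexts(["a", "b", "c"], llm_workers=2))
```

The same `asyncio.run(...)` pattern is one way to call an async method such as `aget_nodes_from_documents` from a regular script, where top-level `await` is not available.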
