
llama-index node_parser slide node parser integration


LlamaIndex Node_Parser Integration: SlideNodeParser

Implements the SLIDE node parser described in the paper "SLIDE: Sliding Localized Information for Document Extraction", which introduces a chunking strategy that enriches each segment with localized context from neighboring text. This improves downstream retrieval and question answering by preserving contextual signals that naive splitting tends to lose.

SlideNodeParser implements a faithful adaptation of this technique using LLMs to generate a short context for each chunk based on its surrounding window.

Here's a summary of the method from the paper:

Traditional document chunking methods often truncate local context, weakening the semantic integrity of each chunk.
SLIDE introduces a sliding window approach that augments each chunk with a compact, LLM-generated summary of its surrounding context.

The process begins by greedily grouping sentences into base chunks based on a target token limit.
Then, for each chunk, a sliding window of neighboring chunks is selected (e.g., 5 before and after),
and the LLM is prompted to generate a brief context that situates the chunk within the overall document.
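
The two steps above (greedy grouping, then window selection) can be sketched in plain Python. This is an illustration only, not the package's implementation: `count_tokens`, `greedy_chunks`, and `window_around` are hypothetical helpers, whitespace-separated words stand in for real tokenizer tokens, and the window is assumed to be centered on the chunk.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def greedy_chunks(sentences: List[str], chunk_size: int) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most chunk_size tokens."""
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the limit.
        # A single sentence longer than chunk_size becomes its own oversized chunk.
        if current and current_tokens + n > chunk_size:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def window_around(chunks: List[str], index: int, window_size: int) -> List[str]:
    """Return the chunk at index plus up to window_size // 2 neighbors on each side."""
    half = window_size // 2
    start = max(0, index - half)
    end = min(len(chunks), index + half + 1)
    return chunks[start:end]


sentences = ["One two three.", "Four five.", "Six seven eight nine.", "Ten."]
chunks = greedy_chunks(sentences, chunk_size=5)
window = window_around(chunks, index=1, window_size=3)
```

With these inputs, the four sentences pack into two chunks of five tokens each, and the window for the second chunk is clipped at the end of the document.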

This context is then attached as metadata to each chunk, improving the quality of downstream retrieval and generation tasks, especially in graph retrieval-augmented generation (GraphRAG) systems.
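
As a rough sketch of the context-attachment step, the snippet below prompts a stand-in LLM for each chunk and stores the result as metadata. Everything here is an assumption made for illustration: the prompt wording, the `local_context` metadata key, the plain-dict node shape, and the `fake_llm` callable are hypothetical and may not match what the package does.

```python
from typing import Callable, Dict, List


def build_prompt(chunk: str, window: List[str]) -> str:
    # Hypothetical prompt wording; the package's actual prompt may differ.
    window_text = "\n\n".join(window)
    return (
        "Here is a window of text from a document:\n"
        f"{window_text}\n\n"
        "Here is one chunk from that window:\n"
        f"{chunk}\n\n"
        "Write a brief context that situates the chunk within the window."
    )


def attach_context(
    chunks: List[str], window_size: int, generate: Callable[[str], str]
) -> List[Dict]:
    """Generate a short context for each chunk and store it as metadata."""
    half = window_size // 2
    nodes: List[Dict] = []
    for i, chunk in enumerate(chunks):
        # Neighboring chunks on both sides, clipped at the document edges.
        window = chunks[max(0, i - half) : i + half + 1]
        context = generate(build_prompt(chunk, window))
        # "local_context" is an illustrative key name, not a documented one.
        nodes.append({"text": chunk, "metadata": {"local_context": context}})
    return nodes


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return f"context derived from a {len(prompt)}-character prompt"


nodes = attach_context(
    ["Chunk A.", "Chunk B.", "Chunk C."], window_size=3, generate=fake_llm
)
```

The chunk text itself is left unchanged; only the metadata grows, so a retriever or graph extractor can choose whether to include the generated context.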

According to the paper, a single glean with SlideNodeParser at its default values (chunk_size=1200 tokens, window_size=11) identifies 37% more entities and relationships than standard node parsers.
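
Those defaults have a cost: each chunk's context is generated from a window of neighboring chunks. The back-of-the-envelope estimate below assumes one LLM call per chunk, a window that counts the chunk itself (so window_size=11 means 5 neighbors on each side), and no prompt overhead. `estimate_llm_cost` is a hypothetical helper, not part of the package.

```python
import math
from typing import Dict


def estimate_llm_cost(
    document_tokens: int, chunk_size: int = 1200, window_size: int = 11
) -> Dict[str, int]:
    """Upper-bound estimate of LLM calls and input tokens per call."""
    # Assumption: one context-generation call per base chunk.
    num_chunks = math.ceil(document_tokens / chunk_size)
    # Windows are clipped at document edges, so this is an upper bound.
    max_window_tokens = min(window_size, num_chunks) * chunk_size
    return {
        "llm_calls": num_chunks,
        "max_input_tokens_per_call": max_window_tokens,
    }


estimate = estimate_llm_cost(document_tokens=60_000)
# estimate == {"llm_calls": 50, "max_input_tokens_per_call": 13200}
```

Under these assumptions, a 60,000-token document needs about 50 calls of up to 13,200 input tokens each, which is why the smaller chunk_size and window_size values used in the usage example can be a reasonable starting point.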

Installation

pip install llama-index-node-parser-slide

Usage

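from llama_index.core import Document
from llama_index.node_parser.slide import SlideNodeParser

# `llm` can be any LlamaIndex LLM instance. OpenAI is shown only as an example
# and assumes the llama-index-llms-openai package is installed.
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")

# Synchronous usage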
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
)
nodes = parser.get_nodes_from_documents(
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ]
)

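# Asynchronous usage (for parallel LLM calls)
# Set llm_workers > 1 to run multiple LLM calls concurrently.
parser = SlideNodeParser.from_defaults(
    llm=llm,
    chunk_size=800,
    window_size=5,
    llm_workers=2,
)
# `await` needs a running event loop (e.g., a notebook or an async function);
# in a plain script, wrap this call in a coroutine and run it with asyncio.run().
nodes = await parser.aget_nodes_from_documents(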
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ]
)
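
To give an intuition for what a setting like llm_workers controls, here is a minimal sketch of bounded concurrency with asyncio. It assumes the setting caps the number of LLM calls in flight at once; this is not the package's actual implementation, and `fake_llm_call` and `generate_contexts` are hypothetical names.

```python
import asyncio
from typing import List


async def fake_llm_call(prompt: str) -> str:
    # Stand-in for a network-bound LLM request.
    await asyncio.sleep(0.01)
    return f"context for: {prompt}"


async def generate_contexts(prompts: List[str], llm_workers: int) -> List[str]:
    # The semaphore caps how many calls are in flight at the same time.
    semaphore = asyncio.Semaphore(llm_workers)

    async def worker(prompt: str) -> str:
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather preserves input order, so each context lines up with its chunk.
    return await asyncio.gather(*(worker(p) for p in prompts))


contexts = asyncio.run(
    generate_contexts(["chunk 1", "chunk 2", "chunk 3"], llm_workers=2)
)
```

Because the calls are I/O-bound, raising the worker count mainly trades provider rate-limit pressure for lower wall-clock time.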
