llama-index node_parser topic node parser integration

LlamaIndex Node_Parser Integration: TopicNodeParser

Implements the topic node parser described in the MedGraphRAG paper. The paper aims to improve the capabilities of LLMs in the medical domain by generating evidence-based results through a graph-based Retrieval-Augmented Generation (RAG) framework, with the goal of improving safety and reliability when handling private medical data.

TopicNodeParser implements an approximate version of the chunking technique described in the paper.

Here is the technique as outlined in the paper:

Large medical documents often contain multiple themes or diverse content. To process these effectively, we first segment the document into data chunks that conform to the context limitations of Large Language Models (LLMs). Traditional methods such as chunking based on token size or fixed characters typically fail to detect subtle shifts in topics accurately. Consequently, these chunks may not fully capture the intended context, leading to a loss in the richness of meaning.

To enhance accuracy, we adopt a mixed method of character separation coupled with topic-based segmentation. Specifically, we utilize static characters (line break symbols) to isolate individual paragraphs within the document. Following this, we apply a derived form of the text for semantic chunking. Our approach includes the use of proposition transfer, which extracts standalone statements from a raw text (Chen et al., 2023). Through proposition transfer, each paragraph is transformed into self-sustaining statements. We then conduct a sequential analysis of the document to assess each proposition, deciding whether it should merge with an existing chunk or initiate a new one. This decision is made via a zero-shot approach by an LLM. To reduce noise generated by sequential processing, we implement a sliding window technique, managing five paragraphs at a time. We continuously adjust the window by removing the first paragraph and adding the next, maintaining focus on topic consistency. We set a hard threshold that the longest chunk cannot exceed the context length limitation of the LLM. After chunking the document, we construct a graph on each individual data chunk.
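
The sequential merge-or-start-new procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's implementation: `same_topic` is a hypothetical stand-in for the zero-shot LLM judgment (here a simple word-overlap heuristic), paragraphs stand in for extracted propositions, the window is approximated as the most recent paragraphs of the current chunk, and `max_chunk_chars` is an assumed character-based proxy for the context-length limit.

```python
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Isolate paragraphs using line breaks as static separators."""
    return [p.strip() for p in text.split("\n") if p.strip()]


def same_topic(window: List[str], candidate: str, threshold: float = 0.2) -> bool:
    """Hypothetical stand-in for the LLM decision: Jaccard word overlap."""
    window_words = set(" ".join(window).lower().split())
    candidate_words = set(candidate.lower().split())
    if not window_words or not candidate_words:
        return False
    shared = len(window_words & candidate_words)
    total = len(window_words | candidate_words)
    return shared / total >= threshold


def topic_chunks(
    text: str,
    window_size: int = 5,
    max_chunk_chars: int = 1000,
    decide: Callable[[List[str], str], bool] = same_topic,
) -> List[str]:
    """Walk paragraphs in order; merge each into the current chunk or start a new one."""
    chunks: List[List[str]] = []
    for para in split_paragraphs(text):
        if chunks:
            current = chunks[-1]
            # Only the most recent paragraphs of the chunk are compared.
            window = current[-window_size:]
            # Hard limit: the merged chunk must stay within the size budget.
            fits = len("\n".join(current + [para])) <= max_chunk_chars
            if fits and decide(window, para):
                current.append(para)
                continue
        # Topic shift, size limit reached, or first paragraph.
        chunks.append([para])
    return ["\n".join(c) for c in chunks]
```

Passing a different `decide` callable (for example, one that prompts an LLM) changes the merge criterion without touching the windowing or size-limit logic.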

Installation

pip install llama-index-node-parser-topic

Usage

from llama_index.core import Document
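from llama_index.node_parser.topic import TopicNodeParser

# `llm` must be a LlamaIndex LLM instance created beforehand, for example
# (assuming the llama-index-llms-openai package is installed):
#   from llama_index.llms.openai import OpenAI
#   llm = OpenAI(model="gpt-4o-mini")

node_parser = TopicNodeParser.from_defaults(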
    llm=llm,
    max_chunk_size=1000,
    similarity_method="llm",  # can be "llm" or "embedding"
    # embed_model=embed_model,  # used for "embedding" similarity_method
    # similarity_threshold=0.8,  # used for "embedding" similarity_method
    window_size=2,  # paper suggests window_size=5
)

nodes = node_parser(
    [
        Document(text="document text 1"),
        Document(text="document text 2"),
    ],
)
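
The commented-out `embed_model` and `similarity_threshold` options suggest that, with `similarity_method="embedding"`, an embedding similarity score is compared against a threshold. The sketch below illustrates that kind of decision with plain cosine similarity on hypothetical vectors. Whether the package uses exactly this metric or comparison is an assumption here, and `belongs_to_chunk` is an illustrative helper, not part of the library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def belongs_to_chunk(
    chunk_vec: Sequence[float],
    proposition_vec: Sequence[float],
    similarity_threshold: float = 0.8,
) -> bool:
    """Hypothetical merge rule: join the chunk when similarity meets the threshold."""
    return cosine_similarity(chunk_vec, proposition_vec) >= similarity_threshold
```

Under this assumed rule, a higher threshold makes merging stricter and so tends to produce more, smaller chunks.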

Download files

Source Distribution

llama_index_node_parser_topic-0.4.0.tar.gz (7.6 kB view details)

Uploaded Source

Built Distribution

llama_index_node_parser_topic-0.4.0-py3-none-any.whl (7.4 kB view details)

Uploaded Python 3

File details

Details for the file llama_index_node_parser_topic-0.4.0.tar.gz.

File metadata

  • Download URL: llama_index_node_parser_topic-0.4.0.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (publish subcommand; CI; Ubuntu 24.04 "noble")

File hashes

Hashes for llama_index_node_parser_topic-0.4.0.tar.gz
Algorithm     Hash digest
SHA256        7ce378773492fa14badad8a7a14dfb75919cb5a337e1d7c8d78430ae059190d6
MD5           44d080b3999c83b623519bb0c38d50c6
BLAKE2b-256   aab47fe338c646e5000db2ef4325589ec86f6b45f5465c7b202a2ae6e6872a3a
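
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_hash` are illustrative, and the file is assumed to have been downloaded locally already.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(Path("llama_index_node_parser_topic-0.4.0.tar.gz"), "7ce378773492fa14badad8a7a14dfb75919cb5a337e1d7c8d78430ae059190d6")` should return `True` for an unmodified copy of the source distribution.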

File details

Details for the file llama_index_node_parser_topic-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: llama_index_node_parser_topic-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (publish subcommand; CI; Ubuntu 24.04 "noble")

File hashes

Hashes for llama_index_node_parser_topic-0.4.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        c243a9f215867ba3451eac3ccaf7bd0c1fd762d329b2907b2d33bb0748ddb76c
MD5           da977f0cd0a03d18b51c69bdfdcf7a9a
BLAKE2b-256   2f32958fd0cdd3286d117cd69b91518e322ec9eab1b93cab8725ad6b11bb711a
