llama-index packs subdoc-summary implementation

Project description

LlamaIndex Packs Integration: Subdoc-Summary

This LlamaPack provides an advanced technique for injecting each chunk with "sub-document" metadata. This context augmentation technique helps both with retrieving relevant context and with synthesizing correct answers.

It goes a step beyond simply attaching a summary of the whole document as metadata to each chunk. A long document can contain multiple distinct themes, and we want each chunk to be grounded in context that is global but still relevant to that particular chunk.
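
To make the difference concrete, the following hypothetical example compares the two approaches on a document with two themes. The sections (standing in for parent chunks), the hand-written summaries, and the helper functions are illustrative assumptions; in practice an LLM would generate the summaries.

```python
# Hypothetical document with two distinct themes. Each section stands in for a
# parent chunk; the summaries are hand-written placeholders for LLM output.
sections = [
    {
        "summary": "Quarterly revenue grew on strong cloud sales.",
        "chunks": ["Cloud revenue rose 30 percent.", "Hardware sales were flat."],
    },
    {
        "summary": "The company is relocating its headquarters.",
        "chunks": ["The move is planned for spring.", "Staff will work remotely."],
    },
]
doc_summary = "An annual report covering finances and a headquarters move."


def attach_document_summary(sections: list[dict], doc_summary: str) -> list[dict]:
    """Baseline: every chunk receives the same whole-document summary."""
    return [
        {"text": chunk, "metadata": {"summary": doc_summary}}
        for section in sections
        for chunk in section["chunks"]
    ]


def attach_subdoc_summary(sections: list[dict]) -> list[dict]:
    """Sub-document variant: each chunk receives the summary of its own section."""
    return [
        {"text": chunk, "metadata": {"summary": section["summary"]}}
        for section in sections
        for chunk in section["chunks"]
    ]


baseline = attach_document_summary(sections, doc_summary)
subdoc = attach_subdoc_summary(sections)

# The chunk "The move is planned for spring." gets the relocation summary,
# not a summary diluted by the unrelated revenue discussion.
print(subdoc[2]["metadata"]["summary"])  # The company is relocating its headquarters.
```

With the baseline, all four chunks share one generic summary; with sub-document summaries, each chunk carries context specific to its part of the document.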

This technique was inspired by our "Practical Tips and Tricks" video: https://www.youtube.com/watch?v=ZP1F9z-S7T0.

Installation

pip install llama-index llama-index-packs-subdoc-summary

CLI Usage

You can download LlamaPacks directly using llamaindex-cli, which comes installed with the llama-index Python package:

llamaindex-cli download-llamapack SubDocSummaryPack --download-dir ./subdoc_summary_pack

You can then inspect the files at ./subdoc_summary_pack and use them as a template for your own project.

Code Usage

You can download the pack to the ./subdoc_summary_pack directory:

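from llama_index.core import SimpleDirectoryReader
from llama_index.core.llama_pack import download_llama_pack
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# download the pack and install its dependencies
SubDocSummaryPack = download_llama_pack(
    "SubDocSummaryPack", "./subdoc_summary_pack"
)

# You can use any llama-hub loader to get documents!
# (here: every file in an example local ./data directory)
documents = SimpleDirectoryReader("./data").load_data()

subdoc_summary_pack = SubDocSummaryPack(
    documents,
    parent_chunk_size=8192,  # default
    child_chunk_size=512,  # default
    llm=OpenAI(model="gpt-3.5-turbo"),
    embed_model=OpenAIEmbedding(),
)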

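Initializing the pack splits the documents into parent chunks and then splits each parent chunk into smaller child chunks. Each parent chunk is summarized, that summary is injected as metadata into every child chunk derived from it, and the child chunks are indexed.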
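
The initialization steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pack's implementation: chunk sizes are counted in whitespace-separated words for simplicity, chunks do not overlap, summarize() is a hypothetical stand-in for the LLM summarization call, and the metadata key parent_summary is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def split_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive, non-overlapping pieces of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def summarize(text: str) -> str:
    """Hypothetical stand-in for the LLM summary: the first five words plus '...'."""
    return " ".join(text.split()[:5]) + " ..."


def build_child_chunks(
    documents: list[str], parent_chunk_size: int, child_chunk_size: int
) -> list[Chunk]:
    """Split into parents, summarize each parent, and tag its children with that summary."""
    children = []
    for doc in documents:
        for parent_text in split_words(doc, parent_chunk_size):
            parent_summary = summarize(parent_text)  # one summary per parent chunk
            for child_text in split_words(parent_text, child_chunk_size):
                children.append(Chunk(child_text, {"parent_summary": parent_summary}))
    return children


# Usage: a 20-word document, parents of 10 words, children of 4 words.
doc = " ".join(f"w{i}" for i in range(20))
children = build_child_chunks([doc], parent_chunk_size=10, child_chunk_size=4)
print(len(children))  # 6: each 10-word parent yields children of 4, 4 and 2 words
print(children[3].metadata["parent_summary"])  # w10 w11 w12 w13 w14 ...
```

Because each summary covers only its own parent chunk, children from different parts of a long document carry different context, while the small child chunks stay precise for retrieval.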

Running the pack runs the query engine over the vector index of child chunks:

response = subdoc_summary_pack.run("<query>", similarity_top_k=2)
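
The retrieval step behind this query can be sketched with toy components. Everything here is an assumption for illustration: a bag-of-words cosine similarity stands in for OpenAIEmbedding and the vector index, render() assumes metadata is written as "key: value" lines above the chunk text before embedding, and retrieve() is a hypothetical helper whose similarity_top_k mirrors the argument in the call above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def render(chunk: dict) -> str:
    """Assumed format: metadata as 'key: value' lines above the chunk text."""
    meta = "\n".join(f"{key}: {value}" for key, value in chunk["metadata"].items())
    return f"{meta}\n\n{chunk['text']}"


def retrieve(query: str, chunks: list[dict], similarity_top_k: int = 2) -> list[dict]:
    """Rank child chunks by similarity between the query and their rendered text."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(render(c))), reverse=True)
    return ranked[:similarity_top_k]


# Child chunks carrying their parent chunk's summary as metadata.
chunks = [
    {"text": "The move is planned for spring.", "metadata": {"summary": "Headquarters relocation plan"}},
    {"text": "Cloud revenue rose 30 percent.", "metadata": {"summary": "Quarterly revenue results"}},
    {"text": "Hardware sales were flat.", "metadata": {"summary": "Quarterly revenue results"}},
]
top = retrieve("When is the headquarters relocation?", chunks, similarity_top_k=1)
print(top[0]["text"])  # The move is planned for spring.
```

The matching chunk never mentions "headquarters" or "relocation" itself; those terms reach the index only through the injected summary.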

Download files

File details

Details for the file llama_index_packs_subdoc_summary-0.3.0.tar.gz.

File hashes

Hashes for llama_index_packs_subdoc_summary-0.3.0.tar.gz

Algorithm     Hash digest
SHA256        a8c12196f25856737eaffde80ad6202eacab123c26198394b3c83756b09febe5
MD5           e668e5602e6b02704aecd870e009c11d
BLAKE2b-256   eb96e294cb0f033ad1821304dfcd46b5a114e093cd938030c38d6cd21dbc7788

File details

Details for the file llama_index_packs_subdoc_summary-0.3.0-py3-none-any.whl.

File hashes

Hashes for llama_index_packs_subdoc_summary-0.3.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        436ecad916fd8f522e5a3278022022a8872035b7f9800b46ffea769e49ae9312
MD5           8fbc53fa39aa697ffece58d37dac0cda
BLAKE2b-256   288ed1603d398379e386e53d1e2439eb882f47b36615df275b1e2bbc95328913