
Multi-Document AutoRetrieval (with Weaviate) Pack

This LlamaPack implements structured hierarchical retrieval over multiple documents, using multiple Weaviate collections.

CLI Usage

You can download LlamaPacks directly using llamaindex-cli, which comes installed with the llama-index Python package:

llamaindex-cli download-llamapack MultiDocAutoRetrieverPack --download-dir ./multidoc_autoretrieval_pack

You can then inspect the files at ./multidoc_autoretrieval_pack and use them as a template for your own project!

Code Usage

You can download the pack to the ./multidoc_autoretrieval_pack directory:

from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
MultiDocAutoRetrieverPack = download_llama_pack(
    "MultiDocAutoRetrieverPack", "./multidoc_autoretrieval_pack"
)

From here, you can use the pack. To initialize it, you need to define a few arguments (see below).

Then, you can set up the pack like so:

# setup pack arguments
from llama_index.core.schema import Document, TextNode
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo

import weaviate

# cloud
auth_config = weaviate.AuthApiKey(api_key="<api_key>")
client = weaviate.Client(
    "https://<cluster>.weaviate.network",
    auth_client_secret=auth_config,
)

vector_store_info = VectorStoreInfo(
    content_info="GitHub Issues",
    metadata_info=[
        MetadataInfo(
            name="state",
            description="Whether the issue is `open` or `closed`",
            type="string",
        ),
        ...,
    ],
)

# metadata_nodes is a list of nodes whose metadata describes each document
# docs is the list of source documents
# metadata_nodes and docs must be the same length
metadata_nodes = [TextNode(..., metadata={...}), ...]
docs = [Document(...), ...]

pack = MultiDocAutoRetrieverPack(
    client,
    "<metadata_index_name>",
    "<doc_chunks_index_name>",
    metadata_nodes,
    docs,
    vector_store_info,
    auto_retriever_kwargs={
        # any kwargs for the auto-retriever
        ...
    },
)
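To make the auto-retrieval step concrete: at query time the auto-retriever infers structured metadata filters (for example, on the `state` field declared in `vector_store_info`) and applies them before semantic search. The sketch below illustrates just the filtering step in plain Python with hypothetical issue records; it has no LlamaIndex or Weaviate dependency and is not the pack's actual implementation.

```python
# Hypothetical GitHub-issue records mirroring the `state` metadata
# field declared in the VectorStoreInfo above.
issues = [
    {"text": "Crash on startup when index is empty", "metadata": {"state": "open"}},
    {"text": "Fix typo in README", "metadata": {"state": "closed"}},
    {"text": "Memory leak in the retriever loop", "metadata": {"state": "open"}},
]

def apply_metadata_filter(records, key, value):
    """Keep only records whose metadata matches the inferred filter."""
    return [r for r in records if r["metadata"].get(key) == value]

# An auto-retriever might infer the filter state == "open" from a
# query like "What open issues mention crashes?" before vector search.
open_issues = apply_metadata_filter(issues, "state", "open")
print(len(open_issues))  # → 2
```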

The run() function is a light wrapper around query_engine.query().

response = pack.run("Tell me about a music celebrity.")

You can also use the modules individually.

# use the retriever
retriever = pack.retriever
nodes = retriever.retrieve("query_str")

# use the query engine
query_engine = pack.query_engine
response = query_engine.query("query_str")
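The retriever and query engine above implement the hierarchical, two-stage pattern described in the introduction: first retrieve matching metadata nodes, then search only the chunks of the corresponding documents. A minimal pure-Python sketch of that idea, using hypothetical data and naive substring matching in place of vector similarity:

```python
# Stage 1 data: one metadata record per document.
metadata_nodes = [
    {"doc_id": "doc-1", "summary": "issues about crashes"},
    {"doc_id": "doc-2", "summary": "issues about documentation"},
]
# Stage 2 data: document chunks keyed by their parent document.
chunks = {
    "doc-1": ["crash on startup", "segfault in retriever"],
    "doc-2": ["fix typo in README", "clarify install docs"],
}

def retrieve(query: str) -> list[str]:
    # Stage 1: pick documents whose metadata matches the query
    # (substring match stands in for vector similarity here).
    doc_ids = [m["doc_id"] for m in metadata_nodes if query in m["summary"]]
    # Stage 2: search only the chunks of the matched documents.
    return [c for d in doc_ids for c in chunks[d] if query in c]

print(retrieve("crash"))
```

The benefit of the two stages is that the chunk-level search never has to scan documents whose metadata already rules them out.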


File details

Details for the file llama_index_packs_multidoc_autoretrieval-0.3.1.tar.gz.

File hashes

Hashes for llama_index_packs_multidoc_autoretrieval-0.3.1.tar.gz
Algorithm Hash digest
SHA256 d70a9300cb6daab282cf319ed8962cf49dcc311086ed089db6ad2a486364a053
MD5 bf16d308e3cc408b20bd20ad7f3325ab
BLAKE2b-256 6a17f6ee94dd3c9e39ad60a883ddfbd3b7e7c6d28b5172768d1b1b2208d495bb


File details

Details for the file llama_index_packs_multidoc_autoretrieval-0.3.1-py3-none-any.whl.

File hashes

Hashes for llama_index_packs_multidoc_autoretrieval-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1a1b4c6f908b08b7da82b619ba352a5e47638b640942520e5a4eb920e712c710
MD5 dd0ff87e91f71a93957720b19b598da3
BLAKE2b-256 2da59bb99c8e47aadc7209e85f6ed215d8de274e40f7338f2c36ca947ca1fca3

