
llama-index readers lilac integration


Lilac reader

pip install llama-index-readers-lilac

The example below also uses the arXiv papers reader:

pip install llama-index-readers-papers

Lilac is an open-source product that helps you analyze, enrich, and clean unstructured data with AI.

It can be used to analyze, clean, structure, and label data for downstream LlamaIndex and LangChain applications.

Lilac projects

This assumes you've already run Lilac locally and have a project directory with a dataset. For more details on Lilac projects, see the Lilac Projects documentation.
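
If you don't yet have a project directory, a minimal sketch of creating one (this assumes Lilac itself is installed, e.g. via pip install lilac):

import lilac as ll

# Create a new Lilac project directory; skip this if one already exists.
ll.init(project_dir="./data")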

You can use any LlamaIndex loader to load data into Lilac, clean data, and then bring it back into LlamaIndex Documents.

Usage

LlamaIndex => Lilac

See this notebook for getting data into Lilac from LlamaHub.

import lilac as ll

# See: https://llamahub.ai/l/papers-arxiv
from llama_index.readers.papers import ArxivReader

loader = ArxivReader()
documents = loader.load_data(search_query="au:Karpathy")

# Set the project directory for Lilac.
ll.set_project_dir("./data")

# This assumes you already have a lilac project set up.
# If you don't, use ll.init(project_dir='./data')
ll.create_dataset(
    config=ll.DatasetConfig(
        namespace="local",
        name="arxiv-karpathy",
        source=ll.LlamaIndexDocsSource(
            # documents comes from the loader.load_data call above.
            documents=documents
        ),
    )
)

# Start a Lilac server to explore and clean the dataset in the UI.
# Once you've cleaned the dataset, you can load it back into LlamaIndex.
ll.start_server(project_dir="./data")

Lilac => LlamaIndex Documents

from llama_index.core import VectorStoreIndex

from llama_index.readers.lilac import LilacReader

loader = LilacReader()
documents = loader.load_data(
    # The Lilac project directory used above.
    project_dir="./data",
    # The name of your dataset in the project dir.
    dataset="local/arxiv-karpathy",
)

index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
query_engine.query("How are ImageNet labels validated?")

This loader is designed to be used as a way to load data into LlamaIndex and/or subsequently used in a LangChain agent.
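
For example, a minimal sketch of exposing the index to a LangChain agent as a tool (this assumes langchain is installed; the tool name and description are illustrative):

from langchain.agents import Tool

# Wrap the LlamaIndex query engine as a LangChain tool.
query_engine = index.as_query_engine()

lilac_tool = Tool(
    name="lilac_arxiv_papers",
    func=lambda q: str(query_engine.query(q)),
    description="Answers questions about the arXiv papers cleaned in Lilac.",
)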
