
llama-index readers lilac integration

Project description

Lilac reader

pip install llama-index-readers-lilac

The example below also uses the arXiv papers reader:

pip install llama-index-readers-papers
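
The example code also imports the lilac package itself; if it isn't already in your environment, install it as well:

pip install lilac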

Lilac is an open-source product that helps you analyze, enrich, and clean unstructured data with AI.

It can be used to analyze, clean, structure, and label data for downstream LlamaIndex and LangChain applications.

Lilac projects

This assumes you've already run Lilac locally and have a project directory with a dataset. For more details on Lilac projects, see the Lilac Projects documentation.
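
If you don't have a project yet, here is a minimal sketch of creating one, using the same ll.init call referenced in the example below ("./data" is just a placeholder directory):

import lilac as ll

# Create a new Lilac project directory (skip this if you already have one).
ll.init(project_dir="./data")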

You can use any LlamaIndex loader to load data into Lilac, clean it there, and then bring it back into LlamaIndex as Documents.

Usage

LlamaIndex => Lilac

See this notebook for getting data into Lilac from LlamaHub.

import lilac as ll

# See: https://llamahub.ai/l/papers-arxiv
from llama_index.readers.papers import ArxivReader

loader = ArxivReader()
documents = loader.load_data(search_query="au:Karpathy")

# Set the project directory for Lilac.
ll.set_project_dir("./data")

# This assumes you already have a lilac project set up.
# If you don't, use ll.init(project_dir='./data')
ll.create_dataset(
    config=ll.DatasetConfig(
        namespace="local",
        name="arxiv-karpathy",
        source=ll.LlamaIndexDocsSource(
            # `documents` comes from the loader.load_data call above.
            documents=documents
        ),
    )
)

# Start a Lilac server to explore and clean the dataset in the UI.
# Once you've cleaned it, you can bring the data back into LlamaIndex.
ll.start_server(project_dir="./data")
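
The same pattern works with any other LlamaIndex loader. For example, here is a minimal sketch using SimpleDirectoryReader on a local folder (the folder path and dataset name are placeholders):

import lilac as ll
from llama_index.core import SimpleDirectoryReader

# Load local files with any LlamaIndex reader...
documents = SimpleDirectoryReader("./my_docs").load_data()

# ...and ingest them into a new Lilac dataset, exactly as above.
ll.set_project_dir("./data")
ll.create_dataset(
    config=ll.DatasetConfig(
        namespace="local",
        name="my-local-docs",
        source=ll.LlamaIndexDocsSource(documents=documents),
    )
)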

Lilac => LlamaIndex Documents

from llama_index.core import VectorStoreIndex

from llama_index.readers.lilac import LilacReader

loader = LilacReader()
documents = loader.load_data(
    project_dir="~/my_project",
    # The name of your dataset in the project dir.
    dataset="local/arxiv-karpathy",
)

index = VectorStoreIndex.from_documents(documents)

# Query through a query engine rather than calling the index directly.
query_engine = index.as_query_engine()
response = query_engine.query("How are ImageNet labels validated?")

This loader is designed to load cleaned data from Lilac back into LlamaIndex, where it can be indexed directly or used downstream in a LangChain agent.
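
If you want to reuse the index in a separate downstream application, one option is to persist it with LlamaIndex's standard storage APIs and reload it later. A minimal sketch ("./storage" is a placeholder directory):

from llama_index.core import StorageContext, load_index_from_storage

# Persist the index built from the Lilac documents.
index.storage_context.persist(persist_dir="./storage")

# Later (e.g. in another process), reload it without rebuilding.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()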

