
A library of community-driven data loaders for LLMs. Use with LlamaIndex and/or LangChain.


LlamaHub 🦙

This is a simple library of all the data loaders, readers, and tools created by the community. The goal is to make it extremely easy to connect large language models to a wide variety of knowledge sources. These general-purpose utilities are meant to be used in LlamaIndex (e.g. when building an index) and LangChain (e.g. when building the tools an agent can use). For example, there are loaders to parse Google Docs, SQL databases, PDF files, PowerPoints, Notion, Slack, Obsidian, and many more. Because different loaders produce the same type of Document, you can easily use them together in the same index.

Check out our website here: https://llamahub.ai/.


Usage (Use llama-hub as PyPI package)

These general-purpose loaders are designed to load data into LlamaIndex, and the loaded data can subsequently be used in LangChain.

Installation

pip install llama-hub

LlamaIndex

from llama_index import GPTVectorStoreIndex
from llama_hub.google_docs.base import GoogleDocsReader

gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query('Where did the author go to school?')

LangChain

Note: Make sure you change the description of the Tool to match your use-case.

from llama_index import GPTVectorStoreIndex
from llama_hub.google_docs.base import GoogleDocsReader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

# load documents
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
langchain_documents = [d.to_langchain_format() for d in documents]

# initialize sample QA chain
llm = OpenAI(temperature=0)
qa_chain = load_qa_chain(llm)
question="<query here>"
answer = qa_chain.run(input_documents=langchain_documents, question=question)

Loaders vs Tools

This repo contains two main types of plugins for large language models: loaders and tools. Loaders are contained in the llama_hub folder, while tools are in the tools subfolder.

Loaders are intended for a human to load data into a large language model, while tools are data services that an LLM agent interacts with to load or modify data.

For examples of how to use tools, see the example notebooks in the repo.

Loader Usage (Use download_loader from LlamaIndex)

You can also use the loaders with download_loader from LlamaIndex in a single line of code.

For example, see the code snippets below using the Google Docs Loader.

from llama_index import GPTVectorStoreIndex, download_loader

GoogleDocsReader = download_loader('GoogleDocsReader')

gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query('Where did the author go to school?')

How to add a loader

Adding a loader simply requires forking this repo and making a Pull Request. The Loader Hub website will update automatically. However, please keep the following guidelines in mind when making your PR.

Step 0: Set up a virtual environment, install Poetry and dependencies

Create a new Python virtual environment. The command below creates an environment in .venv, and activates it:

python -m venv .venv
source .venv/bin/activate

If you are on Windows, use the following to activate your virtual environment:

.venv\scripts\activate

Install poetry:

pip install poetry

Install the required dependencies (this will also install llama_index):

poetry install

This will create an editable install of llama-hub in your venv.

Step 1: Create a new directory

In llama_hub, create a new directory for your new loader. It can be nested within another directory, but give it a unique name because the directory name becomes the identifier for your loader (e.g. google_docs). Inside your new directory, create an __init__.py file, which can be empty; a base.py file, which will contain your loader implementation; and, if needed, a requirements.txt file listing your loader's package dependencies. Those packages are installed automatically when your loader is used, so users do not need to install them by hand.

If you'd like, you can create the new directory and files by running the following script in the llama_hub directory. Just remember to put your dependencies into a requirements.txt file.

./add_loader.sh [NAME_OF_NEW_DIRECTORY]
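If you cannot run the shell script (for example on Windows), the layout described in Step 1 can be sketched in Python. This is not the repo's add_loader.sh, only a minimal illustration of the three files it is described as creating. The function name scaffold_loader, the class name MyReader, and the skeleton written into base.py are hypothetical, and the llama_index import paths inside the skeleton are an assumption about your installed version.

```python
import tempfile
from pathlib import Path

# Hypothetical skeleton for base.py. The llama_index import paths are an
# assumption; compare them with an existing loader in the repo.
BASE_PY_SKELETON = '''"""My loader (hypothetical example)."""
from typing import List

from llama_index.readers.base import BaseReader
from llama_index.readers.schema.base import Document


class MyReader(BaseReader):
    """Loads data from a hypothetical source."""

    def load_data(self, source: str) -> List[Document]:
        # Replace this with the real loading logic.
        return [Document(text=source)]
'''


def scaffold_loader(llama_hub_dir: Path, name: str) -> Path:
    """Create <llama_hub_dir>/<name> with the files described in Step 1."""
    loader_dir = llama_hub_dir / name
    # exist_ok=False: fail loudly if the directory name is already taken.
    loader_dir.mkdir(parents=True, exist_ok=False)
    (loader_dir / "__init__.py").touch()  # may stay empty
    (loader_dir / "base.py").write_text(BASE_PY_SKELETON)
    (loader_dir / "requirements.txt").touch()  # list dependencies here, if any
    return loader_dir


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing in a real checkout is touched.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_loader(Path(tmp) / "llama_hub", "my_source")
        print(sorted(p.name for p in created.iterdir()))
```

In a real checkout you would pass the path of the llama_hub directory instead of a temporary one, then fill in base.py and requirements.txt by hand.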

Step 2: Write your README

Inside your new directory, create a README.md that mirrors the existing ones. It should summarize what your loader does, describe its inputs, and show how it is used in the context of LlamaIndex and LangChain.

Step 3: Add your loader to the library.json file

Finally, add your loader to the llama_hub/library.json file so that others can use it. Following the existing entries in that file, add the class name of your loader along with its id, author, etc. This file is referenced by the Loader Hub website and by the download function within LlamaIndex.
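As a rough illustration of Step 3, the sketch below adds one entry to a library mapping. The entry shape (class name mapped to an object with "id" and "author" keys) is an assumption based on the description above, so copy the exact fields from an existing entry in llama_hub/library.json. The helper name add_library_entry and all entry values are hypothetical.

```python
import json


def add_library_entry(library: dict, class_name: str, loader_id: str, author: str) -> dict:
    """Return a copy of the library mapping with one extra loader entry."""
    if class_name in library:
        raise ValueError(f"{class_name} is already listed")
    updated = dict(library)
    # Assumed entry shape: class name -> {"id": ..., "author": ...}
    updated[class_name] = {"id": loader_id, "author": author}
    return updated


if __name__ == "__main__":
    # Hypothetical existing content; the real file has many more entries.
    library = json.loads('{"ExistingReader": {"id": "existing", "author": "someone"}}')
    library = add_library_entry(library, "MyReader", "my_source", "your-github-handle")
    print(json.dumps(library, indent=2))
```

The id should match the directory name you chose in Step 1, since that directory name is the loader's identifier.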

Step 4: Make a Pull Request!

Create a PR against the main branch. We typically review the PR within a day. To help expedite the process, it may be helpful to provide screenshots (either in the PR or in the README directly) showing your data loader in action!

Running tests

python3.9 -m venv .venv
source .venv/bin/activate 
pip3 install -r test_requirements.txt

poetry run pytest tests 

Changelog

If you want to track the latest version updates / see which loaders are added to each release, take a look at our full changelog here!

FAQ

How do I test my loader before it's merged?

There is an argument called loader_hub_url in download_loader that defaults to the main branch of this repo. You can set it to your branch or fork to test your new loader.
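A minimal sketch of that workflow might look like the following. The helper names fork_loader_hub_url and load_reader_from_fork are hypothetical, and the raw.githubusercontent.com URL layout is an assumption; check it against the default value of loader_hub_url in your installed LlamaIndex version before relying on it.

```python
def fork_loader_hub_url(user: str, branch: str, repo: str = "llama-hub") -> str:
    """Build a raw-content URL for the llama_hub folder of a fork or branch.

    The URL layout is an assumption, not something confirmed by this README.
    """
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/llama_hub"


def load_reader_from_fork(class_name: str, user: str, branch: str):
    """Download a loader class from a fork instead of the main branch."""
    # Imported lazily so the URL helper works without llama_index installed.
    from llama_index import download_loader

    return download_loader(class_name, loader_hub_url=fork_loader_hub_url(user, branch))


if __name__ == "__main__":
    # Hypothetical GitHub user and branch name.
    print(fork_loader_hub_url("your-github-handle", "my-new-loader"))
```

Calling load_reader_from_fork('MyReader', 'your-github-handle', 'my-new-loader') would then return the loader class from your branch, assuming the branch contains the loader directory and an updated library.json.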

Should I create a PR against LlamaHub or the LlamaIndex repo directly?

If you have a data loader PR, create it against LlamaHub by default. We will make exceptions in certain cases (for instance, if we think the data loader should be core to the LlamaIndex repo).

For all other PRs relevant to LlamaIndex, create them directly against the LlamaIndex repo.

Other questions?

Feel free to hop into the community Discord or tag the official Twitter account!


