Python client for colbertdb
Quickstart Guide for pycolbertdb
This quickstart guide shows how to use the pycolbertdb package to integrate ColbertDB with LlamaIndex, using an OpenAI GPT-4 model (gpt-4-turbo in the example below) to answer questions over the retrieved documents.
Prerequisites
Ensure you have the following installed and configured:
- Python 3.x
- An OpenAI API key
- Environment variables configured for ColbertDB
Installation
Install the necessary packages:
pip install pycolbertdb -U
pip install llama-index
pip install llama-index-readers-web
pip install requests
pip install python-dotenv
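After installing, it can help to confirm that the packages are importable before running the rest of the guide. The following is a minimal sketch using only the standard library; the `missing_modules` helper is hypothetical (not part of pycolbertdb), and the module names are the top-level import names used later in this guide.

```python
from importlib.util import find_spec

# Top-level import names (not pip names) for the packages installed above.
# llama-index-readers-web lives under the llama_index namespace, so it is
# not checked separately here.
MODULES = ["pycolbertdb", "llama_index", "dotenv", "requests"]


def missing_modules(names=MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing modules:", missing_modules() or "none")
```

If any name is reported as missing, re-run the corresponding pip install command in the same environment.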
Code Example
Below is an example of how to use the pycolbertdb package to fetch, process, and query documents.
Import Dependencies
Start by importing the necessary dependencies.
import os
from dotenv import load_dotenv
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import Document, PromptTemplate
from llama_index.llms.openai import OpenAI
from pycolbertdb.client import Colbertdb
from pycolbertdb.models import CreateCollectionDocument
from pycolbertdb.helpers import from_llama_index_documents
Load Environment Variables
Load your environment variables from a .env file.
load_dotenv()
URL = os.getenv('COLBERTDB_URL')
API_KEY = os.getenv('COLBERTDB_API_KEY')
STORE_NAME = os.getenv('COLBERTDB_STORE_NAME')
OPEN_AI_KEY = os.getenv('OPENAI_API_KEY')
URLS = ['https://en.wikipedia.org/wiki/Onigiri']
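Note that os.getenv returns None when a variable is missing, so a misconfigured .env file only surfaces later as a confusing client error. One way to fail early is sketched below; `require_env` is a hypothetical helper, not part of pycolbertdb or python-dotenv.

```python
import os


def require_env(name, environ=None):
    """Return the value of `name`, raising if it is unset or empty."""
    # `environ` is injectable so the helper can be exercised without
    # touching the real process environment.
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, a line such as URL = require_env('COLBERTDB_URL') could replace the corresponding os.getenv call.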
Initialize Clients
Initialize the ColbertDB and OpenAI clients, then define the prompt template used to generate answers.
client = Colbertdb(url=URL, api_key=API_KEY, store_name=STORE_NAME)
open_ai_client = OpenAI(model="gpt-4-turbo", api_key=OPEN_AI_KEY)
qa_prompt_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Please write the answer in the style of {tone_name}
Query: {query_str}
Answer: \
"""
prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
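To see what the model will actually receive, the template text can be rendered on its own. The sketch below uses plain str.format rather than PromptTemplate so that it runs without LlamaIndex installed; the context sentence and query are placeholders chosen for illustration.

```python
# Same template text as above, rendered with plain str.format for inspection.
qa_prompt_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Please write the answer in the style of {tone_name}
Query: {query_str}
Answer: \
"""

preview = qa_prompt_tmpl_str.format(
    context_str="Onigiri is a Japanese rice ball.",  # placeholder context
    tone_name="shakespeare",
    query_str="What is onigiri?",  # placeholder query
)
print(preview)
```

The trailing backslashes suppress the newline after the opening quotes and after "Answer: ", so the rendered prompt ends exactly where the model is expected to continue.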
Fetch and Process Documents
Fetch the pages listed in URLS as plain text and convert the resulting LlamaIndex documents for ColbertDB with from_llama_index_documents.
docs = from_llama_index_documents(SimpleWebPageReader(html_to_text=True).load_data(URLS))
Create a Collection in ColbertDB
Create a new collection in ColbertDB with the processed documents.
collection = client.create_collection(documents=docs, name='rice_ball_facts', options={"force_create": True})
Search the Collection
Perform a search query on the created collection.
result = collection.search(query="What are some popular fillings for onigiri?", k=3)
Generate a Response Using OpenAI
Format the retrieved documents and generate a response using OpenAI.
context = ''
for document in result.documents:
    print("Source: " + document.metadata['source'] + "\n", document.content)
    context += (document.content + "\n\n")
prompt = prompt_tmpl.format(context_str=context, tone_name="shakespeare", query_str="What are some typical onigiri fillings")
response = open_ai_client.complete(prompt)
print(response)
Add New Documents to the Collection
Fetch additional documents and add them to the existing collection.
new_docs = SimpleWebPageReader(html_to_text=True).load_data(["https://en.wikipedia.org/wiki/Kewpie_(mayonnaise)"])
new_formatted = [{"content": doc.text, "metadata": {"source": doc.id_}} for doc in new_docs[0:2]]
collection = collection.add_documents(documents=new_formatted)
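The list comprehension above builds dictionaries with a content string and a metadata dictionary holding the source. If that conversion is needed in more than one place, it could be wrapped in a small function. This is a sketch only: `to_collection_dicts` is a hypothetical helper, and the stand-in object below merely mimics the text and id_ attributes of a LlamaIndex document so the example runs without network access.

```python
from types import SimpleNamespace


def to_collection_dicts(docs, limit=None):
    """Convert objects with `.text` and `.id_` into content/metadata dicts."""
    selected = docs if limit is None else docs[:limit]
    return [{"content": d.text, "metadata": {"source": d.id_}} for d in selected]


# Stand-in for a loaded LlamaIndex document (assumption: only `.text` and
# `.id_` are needed, as in the list comprehension above).
fake_docs = [
    SimpleNamespace(
        text="Kewpie is a brand of mayonnaise.",
        id_="https://en.wikipedia.org/wiki/Kewpie_(mayonnaise)",
    )
]

formatted = to_collection_dicts(fake_docs, limit=2)
print(formatted)
```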
Search the Updated Collection
Perform a new search query on the updated collection.
new_result = collection.search(query="When was kewpie mayo founded?", k=3)
new_context = ''
for document in new_result.documents:
    print("Source: " + document.metadata['source'] + "\n", document.content)
    new_context += (document.content + "\n\n")
prompt = prompt_tmpl.format(context_str=new_context, tone_name="bruce springsteen", query_str="When and where was kewpie mayo founded")
new_response = open_ai_client.complete(prompt)
print(new_response)
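The search, format, and complete steps are repeated for each question, so they could be folded into one function. The sketch below assumes only what this guide already uses: a collection with search(query=..., k=...) returning an object with a documents list, documents with a content attribute, a template with a format method, and an LLM with complete(prompt). The `answer_query` name and the two stub classes are hypothetical and exist only so the example runs without a ColbertDB server or an OpenAI key.

```python
from types import SimpleNamespace


def answer_query(collection, llm, template, query, tone, k=3):
    """Search the collection, build the prompt, and return the LLM response."""
    result = collection.search(query=query, k=k)
    # Join retrieved passages with blank lines, as in the loops above.
    context = "".join(doc.content + "\n\n" for doc in result.documents)
    prompt = template.format(context_str=context, tone_name=tone, query_str=query)
    return llm.complete(prompt)


class FakeCollection:
    """Stub standing in for a ColbertDB collection (illustration only)."""

    def search(self, query, k):
        docs = [SimpleNamespace(content="Umeboshi is a common filling.",
                                metadata={"source": "stub"})]
        return SimpleNamespace(documents=docs[:k])


class FakeLLM:
    """Stub that echoes the prompt instead of calling a model."""

    def complete(self, prompt):
        return prompt


out = answer_query(FakeCollection(), FakeLLM(),
                   "C:{context_str}|T:{tone_name}|Q:{query_str}",
                   query="q", tone="plain")
print(out)
```

In the guide's setting, the call would instead pass the real objects, for example answer_query(collection, open_ai_client, prompt_tmpl, "When was kewpie mayo founded?", "bruce springsteen").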
Conclusion
This guide provides a quickstart overview of using the pycolbertdb package for document processing and querying. Customize the prompt and collection as needed for your specific use case.
Download files
File details
Details for the file pycolbertdb-0.2.8.tar.gz.
File metadata
- Download URL: pycolbertdb-0.2.8.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70788efef0b5338ec1bb820509349a3ff8fa2691889a163c7398c299d82146d0 |
| MD5 | 90d8ab811b111fd8d926b3829456f155 |
| BLAKE2b-256 | fac2c55931801e4a285c44665f36390ec8d8887dae90d018a5e55369ff4d5087 |
File details
Details for the file pycolbertdb-0.2.8-py3-none-any.whl.
File metadata
- Download URL: pycolbertdb-0.2.8-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a86d232d629b8b39a0004b3a3799cc54f889eedff38d1eea7175ef94bfe58a8 |
| MD5 | 49fc5dd6187d6f7c60c3556ac7ed3225 |
| BLAKE2b-256 | fa50400b4bf6a7d96fa3c6a0fbe63de751bba00a8065ed1171873048a3bbb805 |