Python client for colbertdb

Project description

Quickstart Guide for pycolbertdb

This quickstart guide shows how to use the pycolbertdb package to integrate ColbertDB with LlamaIndex, using OpenAI's GPT-4 Turbo model to answer questions over retrieved documents.

Prerequisites

Ensure you have the following installed and configured:

  • Python 3.x
  • An OpenAI API key
  • Environment variables configured for ColbertDB
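For example, a minimal .env file might look like the following. The variable names match those read later in this guide; the values here are placeholders, and the URL is only an assumption for a local deployment.

```shell
# .env -- placeholder values; replace with your own credentials.
# COLBERTDB_URL assumes a local instance; use your actual endpoint.
COLBERTDB_URL=http://localhost:8080
COLBERTDB_API_KEY=your-colbertdb-api-key
COLBERTDB_STORE_NAME=my-store
OPENAI_API_KEY=sk-your-openai-key
```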

Installation

  1. Install the necessary packages

    pip install pycolbertdb -U
    pip install llama-index
    pip install llama-index-readers-web
    pip install requests
    pip install python-dotenv
    

Code Example

Below is an example of how to use the pycolbertdb package to fetch, process, and query documents.

Import Dependencies

Start by importing the necessary dependencies.

import os
from dotenv import load_dotenv
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core import Document, PromptTemplate
from llama_index.llms.openai import OpenAI

from pycolbertdb.client import Colbertdb
from pycolbertdb.models import CreateCollectionDocument
from pycolbertdb.helpers import from_llama_index_documents

Load Environment Variables

Load your environment variables from a .env file.

load_dotenv()
URL = os.getenv('COLBERTDB_URL')
API_KEY = os.getenv('COLBERTDB_API_KEY')
STORE_NAME = os.getenv('COLBERTDB_STORE_NAME')
OPEN_AI_KEY = os.getenv('OPENAI_API_KEY')

URLS = ['https://en.wikipedia.org/wiki/Onigiri']

Initialize Clients

Initialize the ColbertDB and OpenAI clients.

client = Colbertdb(url=URL, api_key=API_KEY, store_name=STORE_NAME)
open_ai_client = OpenAI(model="gpt-4-turbo", api_key=OPEN_AI_KEY)

qa_prompt_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Please write the answer in the style of {tone_name}
Query: {query_str}
Answer: \
"""

prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
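PromptTemplate fills the named placeholders ({context_str}, {tone_name}, {query_str}) when formatted. The substitution can be illustrated with plain str.format, as a standalone sketch that needs no LlamaIndex install:

```python
# Plain-Python illustration of the placeholder substitution that
# PromptTemplate performs when formatted with keyword arguments.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Please write the answer in the style of {tone_name}\n"
    "Query: {query_str}\n"
    "Answer: "
)

filled = template.format(
    context_str="Onigiri is a Japanese rice ball.",
    tone_name="shakespeare",
    query_str="What is onigiri?",
)
print(filled)
```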

Fetch and Process Documents

Fetch and process HTML content from the specified URLs.

docs = from_llama_index_documents(SimpleWebPageReader(html_to_text=True).load_data(URLS))
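If your documents do not come from LlamaIndex, you can build the same structure by hand. Each document is a dict with a content string and a metadata dict carrying a source, the same shape used with add_documents later in this guide; the make_document helper below is purely illustrative.

```python
def make_document(text, source):
    """Build a document dict in the shape the collection APIs accept."""
    return {"content": text, "metadata": {"source": source}}

manual_docs = [
    make_document(
        "Onigiri is a Japanese rice ball, often wrapped in nori.",
        "https://en.wikipedia.org/wiki/Onigiri",
    )
]
print(manual_docs[0]["metadata"]["source"])
```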

Create a Collection in ColbertDB

Create a new collection in ColbertDB with the processed documents.

collection = client.create_collection(documents=docs, name='rice_ball_facts', options={"force_create": True})

Search the Collection

Perform a search query on the created collection.

result = collection.search(query="What are some popular fillings for onigiri?", k=3)

Generate a Response Using OpenAI

Format the retrieved documents and generate a response using OpenAI.

context = ''
for document in result.documents:
    print("Source: " + document.metadata['source'] + "\n", document.content)
    context += (document.content + "\n\n")

prompt = prompt_tmpl.format(context_str=context, tone_name="shakespeare", query_str="What are some typical onigiri fillings")
response = open_ai_client.complete(prompt)
print(response)

Add New Documents to the Collection

Fetch additional documents and add them to the existing collection.

new_docs = SimpleWebPageReader(html_to_text=True).load_data(["https://en.wikipedia.org/wiki/Kewpie_(mayonnaise)"])
new_formatted = [{"content": doc.text, "metadata": {"source": doc.id_}} for doc in new_docs]

collection = collection.add_documents(documents=new_formatted)

Search the Updated Collection

Perform a new search query on the updated collection.

new_result = collection.search(query="When was kewpie mayo founded?", k=3)
new_context = ''
for document in new_result.documents:
    print("Source: " + document.metadata['source'] + "\n", document.content)
    new_context += (document.content + "\n\n")

prompt = prompt_tmpl.format(context_str=new_context, tone_name="bruce springsteen", query_str="When and where was kewpie mayo founded")
new_response = open_ai_client.complete(prompt)
print(new_response)

Conclusion

This guide covered the basics of using the pycolbertdb package to ingest, index, and query documents. Customize the prompts and collections as needed for your specific use case.

Project details


Download files

Download the file for your platform.

Source Distribution

pycolbertdb-0.2.8.tar.gz (6.1 kB)

Uploaded Source

Built Distribution


pycolbertdb-0.2.8-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file pycolbertdb-0.2.8.tar.gz.

File metadata

  • Download URL: pycolbertdb-0.2.8.tar.gz
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure

File hashes

Hashes for pycolbertdb-0.2.8.tar.gz:

  • SHA256: 70788efef0b5338ec1bb820509349a3ff8fa2691889a163c7398c299d82146d0
  • MD5: 90d8ab811b111fd8d926b3829456f155
  • BLAKE2b-256: fac2c55931801e4a285c44665f36390ec8d8887dae90d018a5e55369ff4d5087


File details

Details for the file pycolbertdb-0.2.8-py3-none-any.whl.

File metadata

  • Download URL: pycolbertdb-0.2.8-py3-none-any.whl
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/6.5.0-1021-azure

File hashes

Hashes for pycolbertdb-0.2.8-py3-none-any.whl:

  • SHA256: 3a86d232d629b8b39a0004b3a3799cc54f889eedff38d1eea7175ef94bfe58a8
  • MD5: 49fc5dd6187d6f7c60c3556ac7ed3225
  • BLAKE2b-256: fa50400b4bf6a7d96fa3c6a0fbe63de751bba00a8065ed1171873048a3bbb805

