A package for extracting and querying knowledge using GPT models

knowledgegpt

knowledgegpt is designed to gather information from various sources, including the internet and local data, which can be used to create prompts. These prompts can then be utilized by OpenAI's GPT-3 model to generate answers that are subsequently stored in a database for future reference.

To accomplish this, the source text is first transformed into fixed-size vectors (embeddings) using either open-source or OpenAI models. When a query is submitted, it is transformed into a vector in the same way and compared against the stored knowledge embeddings. The most relevant passages are then selected and used to build the prompt context.
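The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not knowledgegpt's implementation: `toy_embed`, `cosine`, and `build_context` are hypothetical names, and the hashed bag-of-words vector is only a stand-in for the open-source or OpenAI embedding models the library actually uses.

```python
import math
import re
from collections import Counter

DIM = 64  # fixed vector size of the toy embedding (an arbitrary choice)


def _bucket(word: str) -> int:
    # Deterministic string hash; Python's built-in hash() is salted per process.
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % DIM
    return h


def toy_embed(text: str) -> list:
    """Map text to a fixed-size, L2-normalised vector (stand-in for a real model)."""
    vec = [0.0] * DIM
    words = re.findall(r"[a-z0-9]+", text.lower())
    for word, count in Counter(words).items():
        vec[_bucket(word)] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b) -> float:
    # Both vectors are already normalised, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def build_context(query: str, chunks: list, top_k: int = 2) -> str:
    """Rank stored chunks by similarity to the query and join the best ones."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return "\n".join(ranked[:top_k])


if __name__ == "__main__":
    knowledge = [
        "A bombard is a large cannon.",
        "Python is a programming language.",
        "Bananas are yellow fruit.",
    ]
    print(build_context("What is a bombard?", knowledge, top_k=1))
```

In a real pipeline the chunk embeddings would be computed once and stored, so only the query needs embedding at question time.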

knowledgegpt supports several information sources: websites, PDFs, PowerPoint files (PPTX), and Word documents (DOCX). It can also extract text from YouTube videos, either from their subtitles or from their audio via speech-to-text, and use it as a source of information. This allows a diverse range of material to be gathered and used for generating prompts and answers.

Installation

  1. Install from PyPI by running in a terminal: pip install knowledgegpt

  2. Or install the latest version from a clone of the repository: pip install -r requirements.txt and then pip install .

  3. Download the language model needed for parsing: python3 -m spacy download en_core_web_sm
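After the installation steps, a quick check like the following can confirm that everything is importable. It is only a sketch: `missing_packages` is a hypothetical helper, and the list of names assumes that spaCy is installed alongside knowledgegpt and that the downloaded model is importable as `en_core_web_sm`.

```python
from importlib.util import find_spec

# Names assumed to be importable once the installation steps have run.
REQUIRED = ("knowledgegpt", "spacy", "en_core_web_sm")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```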

Set Your API Key

  1. Go to OpenAI > Account > API Keys
  2. Create a new secret key and copy it
  3. Enter the key in example_config.py
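The usage example in the next section imports `SECRET_KEY` from `example_config`, so the file only needs to expose that name. A minimal sketch follows; reading the key from an environment variable named `OPENAI_API_KEY` is an assumption made here to keep the secret out of version control, not something the project prescribes.

```python
# example_config.py (sketch)
import os

# OPENAI_API_KEY is an assumed variable name. The empty-string fallback keeps
# the import from failing when the variable is unset; API calls would still
# need a real key.
SECRET_KEY = os.environ.get("OPENAI_API_KEY", "")
```

Pasting the key directly as a string literal works too, as long as the file is never committed.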

How to use the library

# Import the library
from knowledgegpt.extractors.web_scrape_extractor import WebScrapeExtractor

# Import OpenAI and Set the API Key
import openai
from example_config import SECRET_KEY 
openai.api_key = SECRET_KEY

# Define target website
url = "https://en.wikipedia.org/wiki/Bombard_(weapon)"

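# Initialize the WebScrapeExtractor
scrape_website = WebScrapeExtractor(url=url, embedding_extractor="hf", model_lang="en")

# Prompt the OpenAI model
# Note: `db` is not defined in this snippet; it must be a database handle created beforehand.
# Drop to_save and mongo_client if the answer should not be stored.
answer, prompt, messages = scrape_website.extract(query="What is a bombard?", max_tokens=300, to_save=True, mongo_client=db)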

# See the answer
print(answer)

# Output: 'A bombard is a type of large cannon used during the 14th to 15th centuries.'

Other examples can be found in the examples folder. To give a better idea of how to use the library, here is a short example for each extractor:

# The extractor classes below must first be imported from the knowledgegpt package,
# as WebScrapeExtractor is above. df, pdf_file_path, ppt_file_path, url and query
# are placeholders to define yourself.

# Basic usage
basic_extractor = BaseExtractor(df)
answer, prompt, messages = basic_extractor.extract("What is the title of this PDF?", max_tokens=300)

# PDF extraction
pdf_extractor = PDFExtractor(pdf_file_path, extraction_type="page", embedding_extractor="hf", model_lang="en")
answer, prompt, messages = pdf_extractor.extract(query, max_tokens=1500)

# PPTX extraction
ppt_extractor = PowerpointExtractor(file_path=ppt_file_path, embedding_extractor="hf", model_lang="en")
answer, prompt, messages = ppt_extractor.extract(query, max_tokens=500)

# DOCX extraction
docs_extractor = DocsExtractor(file_path="../example.docx", embedding_extractor="hf", model_lang="en", is_turbo=False)
answer, prompt, messages = docs_extractor.extract(query="What is an object detection system?", max_tokens=300)

# Extraction from a YouTube video (audio)
scrape_yt_audio = YoutubeAudioExtractor(video_id=url, model_lang='tr', embedding_extractor='hf')
answer, prompt, messages = scrape_yt_audio.extract(query=query, max_tokens=1200)

# Extraction from a YouTube video (transcript)
scrape_yt_subs = YTSubsExtractor(video_id=url, embedding_extractor='hf', model_lang='en')
answer, prompt, messages = scrape_yt_subs.extract(query=query, max_tokens=1200)
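The YouTube examples pass a variable named url to a parameter named video_id, and the source does not say which form the extractors expect. If a bare video ID turns out to be needed, a small helper along these lines could derive it from a link. `youtube_video_id` is a hypothetical function, not part of knowledgegpt, and it only handles the common watch and youtu.be link shapes.

```python
from urllib.parse import parse_qs, urlparse

# Hostnames treated as regular YouTube watch pages (not an exhaustive list).
_WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube link, or the input unchanged."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        return parsed.path.lstrip("/").split("/")[0]
    if host in _WATCH_HOSTS:
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    # Not a recognised link: assume the caller already passed a bare ID.
    return url_or_id
```

The result could then be passed as video_id in the two YouTube examples above.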

Docker Usage

docker build -t knowledgegptimage .
docker run -p 8888:8888 knowledgegptimage
