A package for extracting and querying knowledge using GPT models

knowledgegpt

Pip Lib

https://pypi.org/project/knowledgegpt/

  • To use the library, install it from PyPI:
  • pip install knowledgegpt

Before running the project locally

  • Check the config file and set your own OpenAI API key and your own MongoDB URI
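
One way to keep these credentials out of version control is to have the config module read them from environment variables. The sketch below is only an illustration: the names SECRET_KEY and MONGO_URI follow the imports used in the usage example further down, while the environment variable names, the load_settings helper, and the localhost fallback are assumptions, not the layout of the project's actual config file.

```python
import os


def load_settings(env=None):
    """Read the OpenAI key and MongoDB URI from environment variables.

    `load_settings` is a hypothetical helper, not part of knowledgegpt.
    """
    env = os.environ if env is None else env
    secret_key = env.get("OPENAI_API_KEY", "")  # assumed variable name
    # Fall back to MongoDB's default local address if no URI is set.
    mongo_uri = env.get("MONGO_URI", "mongodb://localhost:27017")
    if not secret_key:
        raise ValueError("Set OPENAI_API_KEY before running the project")
    return {"SECRET_KEY": secret_key, "MONGO_URI": mongo_uri}
```

A config module could then expose the values with `settings = load_settings()` followed by `SECRET_KEY = settings["SECRET_KEY"]` and `MONGO_URI = settings["MONGO_URI"]`.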

To run MongoDB locally

  • docker pull mongo:latest
  • sh sh/docker_mongo_local_run.sh
  • docker ps

knowledgegpt

knowledgegpt is designed to gather information from various sources, including the internet and local data, which can be used to create prompts. These prompts can then be utilized by OpenAI's GPT-3 model to generate answers that are subsequently stored in a database for future reference.

To accomplish this, the source text is first transformed into fixed-size vectors (embeddings) using either open-source or OpenAI models. When a query is submitted, the query text is also transformed into a vector and compared to the stored knowledge embeddings. The most relevant pieces of text are then selected and used to build the prompt context.
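
The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not knowledgegpt's implementation: the embed function is a toy stand-in that hashes words into a fixed-size vector (the library uses open-source or OpenAI embedding models instead), and the names embed, cosine, build_context, and DIM are hypothetical.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # fixed vector size for this toy embedding


def embed(text):
    """Toy stand-in for an embedding model: hash word counts into DIM buckets."""
    vec = [0.0] * DIM
    words = re.findall(r"[a-z0-9]+", text.lower())
    for word, count in Counter(words).items():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += count
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_context(query, chunks, top_k=1):
    """Rank stored chunks by similarity to the query and join the best ones."""
    stored = [(chunk, embed(chunk)) for chunk in chunks]
    query_vec = embed(query)
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return "\n".join(chunk for chunk, _ in ranked[:top_k])


chunks = [
    "A bombard is a large cannon used in medieval sieges",
    "MongoDB stores documents in collections",
    "Python packages are installed with pip",
]
context = build_context("What is a bombard?", chunks)
```

The selected context would then be placed in the prompt ahead of the question. A real embedding model captures meaning rather than word overlap, so it can match a query to a passage even when they share no words.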

knowledgegpt supports several kinds of information sources: websites, PDFs, PowerPoint files (PPTX), and Word documents (DOCX). It can also extract text from YouTube videos, either from their subtitles or from the audio track (using speech-to-text), and use it as a source of information. This allows a diverse range of material to be gathered and used for generating prompts and answers.
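
Since each source type has its own extractor class, a caller working with mixed inputs may want a small lookup from source to extractor. The sketch below is a hypothetical helper, not part of the library: it only returns the name of the extractor class used in this README for each source type, and the rule that any http(s) address maps to WebScrapeExtractor is a simplifying assumption (YouTube videos use their own extractors and are not handled here).

```python
import os

# Extractor class names as they appear in the usage examples of this README.
EXTRACTOR_BY_EXTENSION = {
    ".pdf": "PDFExtractor",
    ".pptx": "PowerpointExtractor",
    ".docx": "DocsExtractor",
}


def suggest_extractor(source):
    """Return the name of the extractor class that fits a URL or file path."""
    if source.lower().startswith(("http://", "https://")):
        return "WebScrapeExtractor"
    extension = os.path.splitext(source)[1].lower()
    if extension not in EXTRACTOR_BY_EXTENSION:
        raise ValueError(f"No extractor known for {source!r}")
    return EXTRACTOR_BY_EXTENSION[extension]
```

For example, `suggest_extractor("../example.docx")` returns `"DocsExtractor"`, which the caller would then import and instantiate as shown in the usage section.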

How to use

RESTful API

uvicorn server:app --reload

How to install the library

pip install knowledgegpt

or, to install from source:

git clone https://github.com/geeks-of-data/knowledge-gpt.git
cd knowledge-gpt
pip install .

Before running the library for the first time, download the required spaCy model:

python3 -m spacy download en_core_web_sm

How to use the library

# Import the library
from knowledgegpt.extractors.web_scrape_extractor import WebScrapeExtractor

# Import OpenAI and Set the API Key
import openai
from example_config import SECRET_KEY 
openai.api_key = SECRET_KEY


# If you want to use mongodb to store the data
from config import MONGO_URI
from pymongo import MongoClient

client = MongoClient(MONGO_URI)
db = client.openai_test

# Define target website
url = "https://en.wikipedia.org/wiki/Bombard_(weapon)"

# Initialize the WebScrapeExtractor
scrape_website = WebScrapeExtractor(url=url, embedding_extractor="hf", model_lang="en")

# Prompt the OpenAI Model
answer, prompt, messages = scrape_website.extract(query="What is a bombard?", max_tokens=300, to_save=True, mongo_client=db)

# See the answer
print(answer)

# Output: 'A bombard is a type of large cannon used during the 14th to 15th centuries.'

More examples can be found in the examples folder. As a quick overview, the other extractors are used in the same way:

# These snippets assume each extractor class has already been imported from the
# knowledgegpt package (see the examples folder for the import statements) and
# that df, pdf_file_path, ppt_file_path, url, query, and db are already defined.

# Basic Usage
basic_extractor = BasicExtractor(df)
answer, prompt, messages = basic_extractor.extract("What is the title of this PDF?", max_tokens=300)

# PDF Extraction
pdf_extractor = PDFExtractor(pdf_file_path, extraction_type="page", embedding_extractor="hf", model_lang="en")
answer, prompt, messages = pdf_extractor.extract(query, max_tokens=1500, to_save=True, mongo_client=db)

# PPTX Extraction
ppt_extractor = PowerpointExtractor(file_path=ppt_file_path, embedding_extractor="hf", model_lang="en")
answer, prompt, messages = ppt_extractor.extract(query, max_tokens=500, to_save=True, mongo_client=db)

# DOCX Extraction
docs_extractor = DocsExtractor(file_path="../example.docx", embedding_extractor="hf", model_lang="en", is_turbo=False)
answer, prompt, messages = docs_extractor.extract(
    query="What is an object detection system?", max_tokens=300, to_save=True, mongo_client=db)

# Extraction from YouTube video (audio)
scrape_yt_audio = YoutubeAudioExtractor(video_id=url, model_lang='tr', embedding_extractor='hf')
answer, prompt, messages = scrape_yt_audio.extract(query=query, max_tokens=1200, to_save=True, mongo_client=db)

# Extraction from YouTube video (transcript)
scrape_yt_subs = YTSubsExtractor(video_id=url, embedding_extractor='hf', model_lang='en')
answer, prompt, messages = scrape_yt_subs.extract(query=query, max_tokens=1200, to_save=True, mongo_client=db)

How to contribute

  1. Open an issue
  2. Fork the repo
  3. Create a new branch
  4. Make your changes
  5. Create a pull request

FEATURES

  • Extract knowledge from the internet (e.g. Wikipedia)
  • Extract knowledge from local data sources - PDF
  • Extract knowledge from local data sources - DOCX
  • Extract knowledge from local data sources - PPTX
  • Extract knowledge from YouTube audio (when captions are not available)
  • Extract knowledge from YouTube transcripts
  • Library implementation (partially done, initial release)

TODO

  • Add a database (partially done)
  • Add a vector database
  • Add Whisper Model
  • Add Whisper for audio longer than 25MB
  • Add a web interface
  • Migrate to Promptify
  • Add ChatGPT support (only in docs endpoint and experimental)
  • Add ChatGPT support with a better infrastructure and planning
  • Increase the number of prompts
  • Increase the number of supported knowledge sources
  • Increase the number of supported languages
  • Increase the number of open source models
  • Dockerize the project
  • Advanced web scraping
  • Prompt-Answer storage
  • Add better documentation
  • Check library functions to see if they are working properly
  • Add a better logging system
  • Add a better error handling system
  • Add a better testing system

(To be extended...)

System Architecture
