Use GPT to chat/search your large Cubox datasets

cuboxGPT

Use GPT to quickly search or chat with your large Cubox dataset.

Use

Install the package.

pip install cuboxGPT

Export your Cubox dataset as an HTML file.

Call the command line tool

# Set the OpenAI API key
export OPENAI_API_KEY=<your openai api key>

# Import all Cubox bookmarks and download all web contents.
# Note that the CLI prints the links that failed to download and the links that have too little content.
cuboxgpt import-data <cubox_export.html file location>

# Initialize the vector database: generate embeddings for all downloaded web contents and store them. The database is saved in the db/ folder.
cuboxgpt init-database

# Chat with or search the dataset
cuboxgpt search <query>

Development

python -m venv ./venv
source ./venv/bin/activate
pip install --editable .

cuboxGPT.py contains the implementation of all command-line tools.

chatFromDB.py reads from the database and implements the query function.

webPraser.py parses the exported HTML file and downloads the web contents.
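
As a rough illustration of the parsing step, here is a minimal sketch that pulls links out of a bookmark export using only the standard library. It is not the code in webPraser.py; it assumes the export stores each bookmark as an `<a href="...">` anchor, and the names `BookmarkLinkParser` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collect (url, title) pairs from every <a href="..."> in an HTML export."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (url, title) pairs
        self._href = None  # href of the anchor currently open, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: only http(s) links are worth downloading.
            if href and href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.links.append((self._href, title))
            self._href = None


def extract_links(html_text):
    """Return a list of (url, title) tuples found in html_text."""
    parser = BookmarkLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Each returned URL could then be passed to whatever download routine is in use; failed downloads would be reported separately, as the CLI notes above describe.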

db.py generates embeddings and saves the web contents to the database.
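
To make the embed-then-search idea concrete, the following is a minimal in-memory sketch, not the actual db.py or chatFromDB.py. It assumes documents are ranked by cosine similarity between embedding vectors. `TinyVectorStore` and `toy_embed` are hypothetical names, and `toy_embed` is a keyword counter standing in for a real embedding model so the example runs offline.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Keeps (vector, text, metadata) rows in memory and ranks them by similarity."""

    def __init__(self, embed):
        self.embed = embed  # any callable mapping text -> list of floats
        self.rows = []

    def add(self, text, metadata=None):
        self.rows.append((self.embed(text), text, metadata or {}))

    def search(self, query, k=3):
        """Return up to k (score, text, metadata) tuples, best match first."""
        q = self.embed(query)
        scored = [(cosine_similarity(q, vec), text, meta)
                  for vec, text, meta in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


def toy_embed(text):
    """Stand-in embedding: counts of a few fixed keywords. NOT a real model."""
    vocab = ["python", "database", "search"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]
```

In practice the `embed` callable would wrap a real embedding API and the rows would be persisted to disk rather than held in a list.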

pyproject.toml contains ruff lint configuration.

Roadmap

Goal: Enhance the search experience and make it easy to keep datasets up to date.

  • Better CRUD on the database. Users can update/delete single documents in the database.
  • Search documents with custom filters on metadata.
  • Better parsing rules for certain websites such as Twitter, YouTube with Chinese characters, and Weixin.
  • Better updating experience when the user supplies a new Cubox export file.
  • Pagination for search results.
  • Analyze the user's query to better hit keywords.
  • For links that failed to download, retry with Selenium.
  • Support multi-threading for downloading web contents.
  • Better titles by supporting Open Graph meta tags.
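
As one possible direction for the Open Graph item, here is a minimal sketch that prefers a page's `og:title` meta tag and falls back to the plain `<title>` element. It is only an illustration of the idea, not planned project code; `TitleParser` and `best_title` are hypothetical names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Record the first og:title meta tag and the first <title> element."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.html_title = None
        self._in_title = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title":
            if self.og_title is None and attrs.get("content"):
                self.og_title = attrs["content"].strip()
        elif tag == "title" and self.html_title is None:
            self._in_title = True
            self._buf = []

    def handle_data(self, data):
        if self._in_title:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.html_title = "".join(self._buf).strip()
            self._in_title = False


def best_title(html_text):
    """Return og:title if present, else the <title> text, else None."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.og_title or parser.html_title
```

The fallback order is a design choice: Open Graph titles are usually written for sharing and tend to omit site-name suffixes, so they are tried first.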

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cuboxgpt-0.1.1.tar.gz (5.8 kB)

Uploaded Source

Built Distribution

cuboxgpt-0.1.1-py3-none-any.whl (6.3 kB)

Uploaded Python 3
