
Use GPT to chat/search your large Cubox datasets

Project description

cuboxGPT

Use GPT to quickly search and chat with your large Cubox dataset.

Usage

Install the package.

pip install cuboxGPT

Export your Cubox dataset as an HTML file.
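The `import-data` command reads this exported file, but its exact layout is not documented here. The sketch below assumes the export lists each bookmark as an `<a href="...">` tag (as in the common Netscape bookmark format) and shows one way to pull out URL/title pairs with only the standard library. `BookmarkLinkParser` and `extract_links` are hypothetical names, not part of cuboxGPT.

```python
from __future__ import annotations

from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collect (url, title) pairs from <a href="..."> tags.

    ASSUMPTION: bookmarks in the Cubox export appear as anchor tags.
    """

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._current_href: str | None = None
        self._text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> matches too.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Accumulate link text, including text inside nested tags like <b>.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._text_parts).strip()
            self.links.append((self._current_href, title))
            self._current_href = None


def extract_links(html: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return every (url, title) pair in the export."""
    parser = BookmarkLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```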

Run the command-line tool:

# set openai api key
export OPENAI_API_KEY=<your openai api key>

# import all Cubox bookmarks and download all web contents.
# Note that the CLI prints links that failed to download and links without enough content.
cuboxgpt import-data <cubox_export.html file location>

# Initialize the vector database: load all downloaded web contents into it, generate embeddings, and save the database in the db/ folder.
cuboxgpt init-database

# chat/search with the dataset
cuboxgpt search <query>
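Beyond storing embeddings in a vector database under `db/`, the internals of `init-database` and `search` are not described here. To illustrate the general retrieval idea only, here is a minimal in-memory sketch: each document becomes a vector, and a query is answered by ranking documents by cosine similarity. `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model (the actual tool presumably calls an embedding API with your OpenAI key), and `ToyVectorStore` is not cuboxGPT's API.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter[str]:
    # HYPOTHETICAL stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    # Counter returns 0 for missing words, so the dot product only counts overlaps.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: (doc_id, vector) pairs searched by cosine similarity."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, Counter[str]]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._docs.append((doc_id, toy_embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = toy_embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, vec in self._docs]
        # Highest similarity first; keep only the top k results.
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]
```

A real vector database replaces the linear scan with an index, but the ranking principle is the same.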

Development

python -m venv ./venv
source ./venv/bin/activate
pip install --editable .

cuboxGPT.py contains the implementation of all command-line tools.

chatFromDB.py reads from the database and implements the query function.

webPraser.py parses the HTML export file and downloads the web contents.

db.py generates embeddings and saves web contents to the database.

pyproject.toml contains ruff lint configuration.
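The usage notes say `import-data` reports links whose pages have "not enough contents", and `webPraser.py` handles the downloading and parsing. The exact rule is not documented; assuming it means "too little visible text once markup is stripped", a minimal sketch of such a check could look like this. `page_text`, `has_enough_content`, and the `MIN_CHARS = 200` threshold are all hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser

# HYPOTHETICAL threshold; the real cutoff used by cuboxGPT is not documented.
MIN_CHARS = 200


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Drop script/style bodies and whitespace-only text nodes.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def page_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def has_enough_content(html: str, min_chars: int = MIN_CHARS) -> bool:
    """Flag pages whose visible text is shorter than min_chars."""
    return len(page_text(html)) >= min_chars
```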

Roadmap

Goal: Enhance the search experience and easily keep datasets up to date.

  • Better CRUD operations on the database, so users can update or delete individual documents.
  • Search documents with custom filters on metadata.
  • Better parsing rules for certain websites, such as Twitter, YouTube with Chinese characters, and Weixin.
  • Better update experience when the user provides a new Cubox export file.
  • Pagination for search results.
  • Analyze the user's query to better match keywords.
  • Retry links that failed to download using Selenium.
  • Support multi-threading when downloading web contents.
  • Better titles by supporting Open Graph meta tags.
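For the Open Graph item on the roadmap, one possible approach is to prefer a page's `og:title` meta tag, fall back to its `<title>` element, and finally fall back to something like the URL. This is a hypothetical sketch with the standard library; `TitleParser` and `best_title` are illustrative names, not part of cuboxGPT.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Record the og:title meta tag and the <title> element, if present."""

    def __init__(self) -> None:
        super().__init__()
        self.og_title: str | None = None
        self.html_title: str | None = None
        self._in_title = False
        self._title_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Open Graph titles look like <meta property="og:title" content="...">.
        if tag == "meta" and attr_map.get("property") == "og:title":
            content = (attr_map.get("content") or "").strip()
            if content and self.og_title is None:
                self.og_title = content
        elif tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.html_title = "".join(self._title_parts).strip() or None


def best_title(html: str, fallback: str) -> str:
    """Prefer og:title, then <title>, then the given fallback."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.og_title or parser.html_title or fallback
```

Open Graph titles usually omit the site-name suffix that many `<title>` elements carry, which is why they tend to make better document titles.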
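The roadmap also mentions multi-threaded downloading. Fetching pages is I/O-bound, so a thread pool is a natural fit. The sketch below is one hypothetical way to do it with `concurrent.futures` while still collecting the links that failed, which the CLI already reports. `download_all` and `fetch` are illustrative names, not cuboxGPT functions; the fetcher is injectable so the logic can be tested without network access.

```python
from __future__ import annotations

from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    # Plain urllib fetch; a real downloader would add headers, retries, and encoding detection.
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_all(
    urls: list[str],
    fetcher: Callable[[str], str] = fetch,
    max_workers: int = 8,
) -> tuple[dict[str, str], list[str]]:
    """Download URLs in parallel; return (contents by URL, failed URLs)."""
    contents: dict[str, str] = {}
    failed: list[str] = []

    def worker(url: str) -> tuple[str, str | None]:
        # Any exception counts as a failed download, so one bad link never stops the batch.
        try:
            return url, fetcher(url)
        except Exception:
            return url, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so `failed` keeps the original order.
        for url, body in pool.map(worker, urls):
            if body is None:
                failed.append(url)
            else:
                contents[url] = body
    return contents, failed
```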



Download files


Source Distribution

cuboxgpt-0.1.1.tar.gz (5.8 kB)

Uploaded Source

Built Distribution


cuboxgpt-0.1.1-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file cuboxgpt-0.1.1.tar.gz.

File metadata

  • Download URL: cuboxgpt-0.1.1.tar.gz
  • Upload date:
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for cuboxgpt-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c089e17a7198a6a16be116f6b5b2747a3a6660356ed4b6b20efb85cbf771c8d0
MD5 edf5610878fabb0522470e39f604122a
BLAKE2b-256 a4a36c8cf34f672631169324ef67dbed4e6a55fc0fadb42f9bcf1f5457c3c6db


File details

Details for the file cuboxgpt-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cuboxgpt-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for cuboxgpt-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a9d2380c49754c13c6ba6849e0a8203328f7aca9d574be4768b5861634cb19bf
MD5 1c0aad3e661a00ccf802ffa8e9d6ee50
BLAKE2b-256 cf8cbf1892f6d515bee1846bbb751ec71bd054cdd078979dc4a6c5b1f749aa1c

