BotRun Ask Folder

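This project provides a tool that downloads files from a Google Drive folder, converts them into embedding vectors, and uploads them to Qdrant. The instructions below explain how to use it.


Installation

Make sure Python and pip are installed. Then install the package with the following command: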

pip install botrun-ask-folder

Usage

Calling botrun_ask_folder

The botrun_ask_folder function downloads the files in a given Google Drive folder, processes them, and uploads the results to Qdrant.

from botrun_ask_folder import botrun_ask_folder

# Google Drive folder ID
google_drive_folder_id = "your_google_drive_folder_id"

botrun_ask_folder(google_drive_folder_id)

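Required environment variables

Set the following environment variables before running the tool:

Environment variable             Description
GOOGLE_APPLICATION_CREDENTIALS   Path to the Google service account credentials file
QDRANT_HOST                      Hostname of the Qdrant server (default: "qdrant")
QDRANT_PORT                      Port of the Qdrant server (default: 6333)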
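
As a rough illustration of how these variables and their documented defaults might be read, here is a minimal sketch. The helper name load_settings and the dictionary keys are hypothetical and are not part of botrun_ask_folder.

```python
import os


def load_settings(env=None):
    """Read the documented variables, applying the documented defaults.

    load_settings is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return {
        # No default is documented for the credentials path.
        "credentials_path": env.get("GOOGLE_APPLICATION_CREDENTIALS"),
        "qdrant_host": env.get("QDRANT_HOST", "qdrant"),
        "qdrant_port": int(env.get("QDRANT_PORT", "6333")),
    }
```

Passing a plain dictionary instead of os.environ makes the defaults easy to check without touching the real environment.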

Detailed usage of each function

drive_download

Downloads files from Google Drive.

from botrun_ask_folder.drive_download import drive_download

google_service_account_key_path = "/path/to/google_service_account_key.json"
google_drive_folder_id = "your_google_drive_folder_id"
max_results = 9999999
output_folder = "./data/your_google_drive_folder_id"

drive_download(google_service_account_key_path, google_drive_folder_id, max_results, output_folder)

run_split_txts

Splits the downloaded files into text chunks of a specified size.

from botrun_ask_folder.run_split_txts import run_split_txts

input_folder = "./data/your_google_drive_folder_id"
split_size = 2000  # maximum number of characters per text chunk
force = False

run_split_txts(input_folder, split_size, force)
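
To make the effect of split_size concrete, the sketch below cuts a string into pieces of at most split_size characters. It only illustrates the idea of fixed-size chunking; the function split_text is hypothetical, and run_split_txts itself may split differently (for example on page or paragraph boundaries).

```python
def split_text(text: str, split_size: int) -> list[str]:
    """Cut text into consecutive pieces of at most split_size characters.

    Hypothetical illustration; not the library's implementation.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [text[i:i + split_size] for i in range(0, len(text), split_size)]


chunks = split_text("a" * 4500, 2000)
print([len(c) for c in chunks])  # [2000, 2000, 500]
```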

embeddings_to_qdrant

Converts the text chunks into embedding vectors and uploads them to Qdrant.

import asyncio
from botrun_ask_folder.embeddings_to_qdrant import embeddings_to_qdrant

input_folder = "./data/your_google_drive_folder_id"
embedding_model_name = "openai/text-embedding-3-large"
dimension = 3072
max_tasks = 30
collection_name = "your_google_drive_folder_id"
qdrant_host = "qdrant"
qdrant_port = 6333

asyncio.run(embeddings_to_qdrant(input_folder, embedding_model_name, dimension, max_tasks, collection_name, qdrant_host, qdrant_port))
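
The source does not say what max_tasks controls. Assuming it caps the number of embedding tasks that run at the same time, a common way to implement such a cap is an asyncio.Semaphore, sketched below. The names run_bounded and fake_embed are hypothetical stand-ins, and no real embedding or Qdrant call is made.

```python
import asyncio


async def run_bounded(items, worker, max_tasks):
    """Run worker(item) for every item, with at most max_tasks in flight."""
    semaphore = asyncio.Semaphore(max_tasks)

    async def guarded(item):
        # Each task waits here until one of the max_tasks slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_embed(chunk):
    """Stand-in for an embedding request; returns the chunk length."""
    await asyncio.sleep(0)
    return len(chunk)


results = asyncio.run(run_bounded(["ab", "cde"], fake_embed, max_tasks=30))
```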

botrun_drive_manager

Manages and updates the templates and copies of .botrun prompt-engineering files.

from botrun_ask_folder.botrun_drive_manager import botrun_drive_manager

botrun_file_name = "your_botrun_file_name"
collection_name = "your_collection_name"

botrun_drive_manager(botrun_file_name, collection_name)

Starting the FastAPI server

Change to the botrun_ask_folder/fast_api directory and run:

fastapi dev main.py

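The API is then available at http://localhost:8000.

Deploying the Google Cloud Function

This project has used a Google Cloud Function; its code is main.py in the repository root. To deploy it, first give the gcloud CLI the permissions of the botrun-ask-folder-2@scoop-386004.iam.gserviceaccount.com service account. Download the key file from the console, or ask 阿杰 for it.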

gcloud auth activate-service-account \
    --key-file=/path/to/your/keyfile.json

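Development environment setup

Create a virtual environment

Create a virtual environment for the project to manage dependencies and avoid conflicts with other projects.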

python -m venv venv
source venv/bin/activate  # on Windows, use `venv\Scripts\activate`

Install dependencies

Install the required dependencies inside the virtual environment.

pip install -r requirements.txt

Run the unit tests

Run the project's unit tests to make sure everything works as expected.

python -m unittest discover tests

FAQ

Files fail to download with a permission error?

Make sure your Google service account has the correct permissions to access the Google Drive folder in question.
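
One quick check is to see which account the key file belongs to, so you can confirm that the Drive folder is shared with that address. The sketch below reads the client_email field of a service account key file; the helper name service_account_email is hypothetical.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a service account JSON key file.

    Hypothetical helper; returns None if the field is missing.
    """
    with open(key_path, encoding="utf-8") as f:
        key = json.load(f)
    return key.get("client_email")
```

For example, service_account_email(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]) (after importing os) would report the account used by the tool, assuming that variable points at the key file.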

Qdrant connection fails?

Check that the Qdrant server host and port are correct, and that the server is running and reachable.
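
A plain TCP connection test can tell "wrong host or port" apart from problems inside the application. This is a minimal sketch using only the standard library; can_connect is a hypothetical helper, and it checks only that the port accepts connections, not that Qdrant itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, can_connect("qdrant", 6333) tests the documented default host and port.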

How do I customize the number of characters per chunk?

Pass the split_size argument when calling run_split_txts to set the maximum number of characters per chunk.

Serving botrun_ask_folder with FastAPI

You need a .env.cloudrun file; ask 阿杰 for it.

Build for Cloud Run (dev version)

gcloud builds submit --config cloudbuild_fastapi_dev.yaml --project=scoop-386004

Deploy to Cloud Run (dev version)

gcloud run deploy botrun-ask-folder-fastapi-dev \
  --image asia-east1-docker.pkg.dev/scoop-386004/botrun-ask-folder/botrun-ask-folder-fastapi-dev \
  --port 8080 \
  --platform managed \
  --allow-unauthenticated \
  --project=scoop-386004 \
  --region=asia-east1 \
  --cpu 2 \
  --memory 4Gi \
  --min-instances 0 \
  --max-instances 5 \
  --timeout 3600s \
  --concurrency 300 \
  --cpu-boost

Build for Cloud Run (staging version)

gcloud builds submit --config cloudbuild_fastapi.yaml --project=scoop-386004

Deploy to Cloud Run (staging version)

gcloud run deploy botrun-ask-folder-fastapi \
  --image asia-east1-docker.pkg.dev/scoop-386004/botrun-ask-folder/botrun-ask-folder-fastapi \
  --port 8080 \
  --platform managed \
  --allow-unauthenticated \
  --project=scoop-386004 \
  --region=asia-east1 \
  --cpu 2 \
  --memory 8Gi \
  --min-instances 0 \
  --max-instances 5 \
  --timeout 3600s \
  --concurrency 300 \
  --cpu-boost

Build the Cloud Run job

gcloud builds submit --config cloudbuild_job.yaml --project=scoop-386004

Deploy the Cloud Run job

gcloud run jobs create process-folder-job \
--image asia-east1-docker.pkg.dev/scoop-386004/botrun-ask-folder/botrun-ask-folder-job \
--region asia-east1 \
--project scoop-386004 \
--cpu 2 \
--memory 8Gi \
--max-retries 3 \
--task-timeout 7200s 

Update the Cloud Run job

gcloud run jobs update process-folder-job \
--image asia-east1-docker.pkg.dev/scoop-386004/botrun-ask-folder/botrun-ask-folder-job \
--region asia-east1 \
--project scoop-386004 \
--cpu 2 \
--memory 8Gi \
--max-retries 3 \
--task-timeout 7200s

Cancel running job executions

gcloud run jobs executions list --job process-folder-job --format="value(name)" --region=asia-east1 --project=scoop-386004 | xargs -I {} gcloud run jobs executions cancel {} --region=asia-east1 --project=scoop-386004 --quiet

Qdrant needs an api_key

Local

docker run -d \
     -p 6333:6333 \
     -p 6334:6334 \
     -e QDRANT__SERVICE__GRPC_PORT=6334 \
     -e QDRANT__SERVICE__HTTP_PORT=6333 \
     -e QDRANT__SERVICE__API_KEY=your-own-api-key \
     -v ./qdrant_storage:/qdrant/storage \
     --name qdrant \
     qdrant/qdrant
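
Once the container runs with an API key, REST requests need to send that key in the api-key header. The sketch below builds such a request with the standard library and does not send it; the helper name build_collections_request is hypothetical, and the host, port, and key are the placeholder values from the command above.

```python
import urllib.request


def build_collections_request(host="localhost", port=6333, api_key="your-own-api-key"):
    """Build a GET request for Qdrant's collection list, with the API key header.

    Hypothetical helper; the defaults mirror the docker run example.
    """
    url = f"http://{host}:{port}/collections"
    return urllib.request.Request(url, headers={"api-key": api_key})
```

Calling urllib.request.urlopen(build_collections_request()) would then send the request to the local container.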

Server side: docker-compose

Dapr

Run

dapr run -f dapr.yaml

Stop

dapr stop -f dapr.yaml

Testing Dapr

青創貸款 (youth entrepreneurship loan) folder

curl -X POST http://localhost:8000/api/botrun/botrun_ask_folder/process-folder \
-H "Content-Type: application/json" \
-d '{"folder_id": "1qk5maEqbxtTcr1tsAHawVduonPedpHV0", "force":true}'
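
The same test request can be built from Python. This is a minimal sketch with the standard library that constructs the request without sending it; the helper name build_process_folder_request is hypothetical, while the URL path and JSON fields are taken from the curl command above.

```python
import json
import urllib.request


def build_process_folder_request(folder_id, force=True, base_url="http://localhost:8000"):
    """Build the POST request equivalent to the curl command above.

    Hypothetical helper for illustration only.
    """
    payload = json.dumps({"folder_id": folder_id, "force": force}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/botrun/botrun_ask_folder/process-folder",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it to the locally running service.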

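Deploying Dapr to Cloud Run (still experimental; not yet working)

Run the following from the project directory.

  • Do not use the project's venv; install gcloud yourself on your local machine
  • Use a different service account; ask 阿杰 for it

Build the Docker images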

gcloud builds submit --tag gcr.io/scoop-386004/botrun-ask-folder ./botrun_ask_folder/fast_api --project=scoop-386004
gcloud builds submit --config ./botrun_ask_folder/fast_api/cloudbuild.yaml --project=scoop-386004
gcloud builds submit --tag gcr.io/scoop-386004/subscriber ./botrun_ask_folder/subscribers --project=scoop-386004
gcloud builds submit --config cloudbuild_dapr.yaml --project=scoop-386004

Deploy

gcloud run services replace botrun-ask-folder-service.yaml --platform managed --region asia-east1 --project=scoop-386004
gcloud run services replace subscriber-service.yaml --platform managed --region asia-east1 --project=scoop-386004

Setting environment variables (kept for reference)

gcloud run services update botrun-ask-folder --set-env-vars KEY1=VALUE1,KEY2=VALUE2
gcloud run services update subscriber --set-env-vars KEY1=VALUE1,KEY2=VALUE2
