BotRun Ask Folder
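This project provides a tool that downloads files from a Google Drive folder, converts them into embedding vectors, and uploads the vectors to Qdrant. The sections below explain how to use it.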
Installation
Make sure Python and pip are installed. Then install the package and its dependencies with the following command:
pip install botrun-ask-folder
Usage
Calling botrun_ask_folder
The botrun_ask_folder function downloads the files in the specified Google Drive folder, processes them, and uploads them to Qdrant.
from botrun_ask_folder import botrun_ask_folder
# Google Drive folder ID
google_drive_folder_id = "your_google_drive_folder_id"
botrun_ask_folder(google_drive_folder_id)
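Required environment variables
Set the following environment variables before running the tool:
Environment variable | Description |
---|---|
GOOGLE_APPLICATION_CREDENTIALS | Path to the Google service account credentials file |
QDRANT_HOST | Hostname of the Qdrant server (default: "qdrant") |
QDRANT_PORT | Port of the Qdrant server (default: 6333) |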
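The defaults in the table above can be illustrated with a small sketch. The helper name `load_qdrant_settings` is hypothetical and is not part of botrun_ask_folder; it only shows how the two Qdrant variables might be read with their documented fallbacks.

```python
import os


def load_qdrant_settings(env=None):
    """Return (host, port), falling back to the defaults listed in the table.

    Hypothetical helper for illustration only; not part of botrun_ask_folder.
    """
    env = os.environ if env is None else env
    host = env.get("QDRANT_HOST", "qdrant")
    # Environment variables are strings, so the port is converted explicitly.
    port = int(env.get("QDRANT_PORT", "6333"))
    return host, port


host, port = load_qdrant_settings()
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without touching the real environment.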
Detailed usage of each function
drive_download
Downloads files from Google Drive.
from botrun_ask_folder.drive_download import drive_download
google_service_account_key_path = "/path/to/google_service_account_key.json"
google_drive_folder_id = "your_google_drive_folder_id"
max_results = 9999999
output_folder = "./data/your_google_drive_folder_id"
drive_download(google_service_account_key_path, google_drive_folder_id, max_results, output_folder)
run_split_txts
Splits the downloaded files into text chunks of a specified size.
from botrun_ask_folder.run_split_txts import run_split_txts
input_folder = "./data/your_google_drive_folder_id"
split_size = 2000 # maximum number of characters per text chunk
force = False
run_split_txts(input_folder, split_size, force)
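To make the role of split_size concrete, here is a minimal sketch of a fixed-width split. This is not the implementation used by run_split_txts, which this README does not show; the function name `split_text` is hypothetical, and the real tool may split on different boundaries.

```python
def split_text(text, split_size):
    """Split text into consecutive chunks of at most split_size characters.

    Illustrative only; run_split_txts may use a different strategy.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [text[i:i + split_size] for i in range(0, len(text), split_size)]


chunks = split_text("a" * 4500, 2000)
print([len(c) for c in chunks])  # [2000, 2000, 500]
```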
embeddings_to_qdrant
Converts the text chunks into embedding vectors and uploads them to Qdrant.
import asyncio
from botrun_ask_folder.embeddings_to_qdrant import embeddings_to_qdrant
input_folder = "./data/your_google_drive_folder_id"
embedding_model_name = "openai/text-embedding-3-large"
dimension = 3072
max_tasks = 30
collection_name = "your_google_drive_folder_id"
qdrant_host = "qdrant"
qdrant_port = 6333
asyncio.run(embeddings_to_qdrant(input_folder, embedding_model_name, dimension, max_tasks, collection_name, qdrant_host, qdrant_port))
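The README does not say how max_tasks is used. Assuming it caps the number of embedding requests in flight at once, the usual asyncio pattern is a semaphore, sketched below. `run_bounded` and `fake_embed` are hypothetical names, and `fake_embed` only stands in for a real embedding call.

```python
import asyncio


async def run_bounded(items, worker, max_tasks):
    """Run worker(item) for every item with at most max_tasks running at once."""
    semaphore = asyncio.Semaphore(max_tasks)

    async def guarded(item):
        # Each task waits here until one of the max_tasks slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_embed(chunk):
    # Stand-in for a real embedding request; returns a dummy one-element vector.
    await asyncio.sleep(0.01)
    return [float(len(chunk))]


vectors = asyncio.run(run_bounded(["abc", "de"], fake_embed, max_tasks=30))
print(vectors)  # [[3.0], [2.0]]
```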
botrun_drive_manager
Manages and updates the templates and copies of .botrun prompt-engineering files.
from botrun_ask_folder.botrun_drive_manager import botrun_drive_manager
botrun_file_name = "your_botrun_file_name"
collection_name = "your_collection_name"
botrun_drive_manager(botrun_file_name, collection_name)
Starting the FastAPI server
From the botrun_ask_folder/fast_api directory, run the following command:
fastapi dev main.py
The API is then available at http://localhost:8000
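Deploying Google Cloud Functions
The project also has Google Cloud Functions; their code is in main.py in the project root. To deploy them, the gcloud CLI must first be authorized as the botrun-ask-folder-2@scoop-386004.iam.gserviceaccount.com service account. Download the key file from the console, or ask 阿杰 for it.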
gcloud auth activate-service-account \
--key-file=/path/to/your/keyfile.json
deploy
gcloud functions deploy cf_pdf_page_to_image \
--project=scoop-386004 \
--region=asia-east1 \
--gen2 \
--runtime python311 \
--trigger-http \
--allow-unauthenticated \
--memory 512MB \
--timeout 540s \
--service-account=botrun-ask-folder-2@scoop-386004.iam.gserviceaccount.com \
--ignore-file=.gcloudignore
After deployment you get a URL through which the API can be called. The functions currently supported are:
deploy cf_query_qdrant_and_llm
gcloud functions deploy cf_query_qdrant_and_llm \
--project=scoop-386004 \
--region=asia-east1 \
--gen2 \
--runtime python311 \
--trigger-http \
--allow-unauthenticated \
--memory 8192MB \
--timeout 540s \
--service-account=botrun-ask-folder-2@scoop-386004.iam.gserviceaccount.com \
--ignore-file=.gcloudignore
How to call it
curl -X POST -N --no-buffer "https://asia-east1-scoop-386004.cloudfunctions.net/cf_query_qdrant_and_llm" \
-H "Content-Type: application/json" \
-d '{
"qdrant_host": "dev.botrun.ai",
"collection_name": "1qk5maEqbxtTcr1tsAHawVduonPedpHV0",
"user_input": "青創貸款怎麼樣申請最快速?",
"qdrant_port": 6333,
"embedding_model": "openai/text-embedding-3-large",
"top_k": 6,
"notice_prompt": "",
"chat_model": "openai/gpt-4o-mini",
"hnsw_ef": 256
}'
curl -X POST -N --no-buffer "https://asia-east1-scoop-386004.cloudfunctions.net/cf_query_qdrant_and_llm" \
-H "Content-Type: application/json" \
-d '{
"qdrant_host": "dev.botrun.ai",
"collection_name": "1qk5maEqbxtTcr1tsAHawVduonPedpHV0",
"user_input": "青創貸款怎麼樣申請最快速?",
"qdrant_port": 6333,
"embedding_model": "openai/text-embedding-3-large",
"top_k": 6,
"notice_prompt": "",
"chat_model": "openai/gpt-4o-mini",
"hnsw_ef": 256,
"stream": false
}'
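The same request can be prepared from Python with the standard library. The sketch below only builds the request object with the fields from the curl examples and does not send it; `build_query_request` is a hypothetical helper name, and the shape of the response is not documented here, so no parsing is shown.

```python
import json
import urllib.request

URL = "https://asia-east1-scoop-386004.cloudfunctions.net/cf_query_qdrant_and_llm"


def build_query_request(user_input, collection_name, stream=False):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "qdrant_host": "dev.botrun.ai",
        "collection_name": collection_name,
        "user_input": user_input,
        "qdrant_port": 6333,
        "embedding_model": "openai/text-embedding-3-large",
        "top_k": 6,
        "notice_prompt": "",
        "chat_model": "openai/gpt-4o-mini",
        "hnsw_ef": 256,
        "stream": stream,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "青創貸款怎麼樣申請最快速?", "1qk5maEqbxtTcr1tsAHawVduonPedpHV0"
)
# To actually send it: urllib.request.urlopen(request).read()
```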
deploy cf_query_qdrant_and_llm_from_botrun
gcloud functions deploy cf_query_qdrant_and_llm_from_botrun \
--project=scoop-386004 \
--region=asia-east1 \
--gen2 \
--runtime python311 \
--trigger-http \
--allow-unauthenticated \
--memory 8192MB \
--timeout 540s \
--service-account=botrun-ask-folder-2@scoop-386004.iam.gserviceaccount.com \
--ignore-file=.gcloudignore
How to call it
curl -X POST \
https://asia-east1-scoop-386004.cloudfunctions.net/cf_query_qdrant_and_llm_from_botrun \
-H "Content-Type: application/json" \
-d '{
"qdrant_host": "dev.botrun.ai",
"botrun_name": "波創價學會",
"folder_id": "1dqIGPK-hbyfrKbetQiWy3JW_jXGKG2YF",
"user_input": "創價學會的宗旨為何?"
}'
curl -X POST \
https://asia-east1-scoop-386004.cloudfunctions.net/cf_query_qdrant_and_llm_from_botrun \
-H "Content-Type: application/json" \
-d '{
"qdrant_host": "dev.botrun.ai",
"botrun_name": "波創價學會",
"folder_id": "1dqIGPK-hbyfrKbetQiWy3JW_jXGKG2YF",
"user_input": "創價學會的宗旨為何?",
"stream": false
}'
deploy cf_get_latest_timestamp
gcloud functions deploy cf_get_latest_timestamp \
--project=scoop-386004 \
--region=asia-east1 \
--gen2 \
--runtime python311 \
--trigger-http \
--allow-unauthenticated \
--memory 8192MB \
--timeout 540s \
--service-account=botrun-ask-folder-2@scoop-386004.iam.gserviceaccount.com \
--ignore-file=.gcloudignore
How to call it
curl -X POST \
https://asia-east1-scoop-386004.cloudfunctions.net/cf_get_latest_timestamp \
-H "Content-Type: application/json" \
-d '{
"botrun_name": "波創價學會",
"folder_id": "1dqIGPK-hbyfrKbetQiWy3JW_jXGKG2YF"
}'
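Development environment setup
Create a virtual environment
Create a virtual environment for the project to manage dependencies and avoid conflicts with other projects.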
python -m venv venv
source venv/bin/activate # on Windows, use `venv\Scripts\activate`
Install dependencies
Install the required dependencies inside the virtual environment.
pip install -r requirements.txt
Run the unit tests
Run the project's unit tests to confirm that everything works as expected.
python -m unittest discover tests
FAQ
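Files cannot be downloaded and a permission error appears?
Make sure your Google service account credentials have the correct permissions to access the Google Drive folder.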
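One common cause is that the Drive folder has not been shared with the service account. Assuming a standard JSON key file, the account's address is stored in its client_email field, and the sketch below reads it so the folder can be shared with that address. `service_account_email` is a hypothetical helper, not part of botrun_ask_folder.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a service account JSON key file."""
    with open(key_path, encoding="utf-8") as f:
        key = json.load(f)
    # Standard service account key files declare their type explicitly.
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key["client_email"]


# Example: print(service_account_email("/path/to/google_service_account_key.json"))
```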
Qdrant connection fails?
Check that the Qdrant server host and port are correct, and that the server is running and reachable.
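A quick way to tell a wrong host or port from an application-level problem is a plain TCP check. The sketch below uses only the standard library; `can_connect` is a hypothetical helper, and a successful connection only shows that something is listening on that port, not that it is Qdrant.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example: can_connect("qdrant", 6333)
```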
How do I customize the number of characters per page?
Pass the split_size argument when calling run_split_txts to set the maximum number of characters per page.
Running botrun_ask_folder as a FastAPI service
A .env.cloudrun file is required; ask 阿杰 for it.
Build for Cloud Run
gcloud builds submit --config cloudbuild_fastapi.yaml --project=scoop-386004
deploy cloud run
gcloud run deploy botrun-ask-folder-fastapi \
--image gcr.io/scoop-386004/botrun-ask-folder-fastapi \
--port 8080 \
--platform managed \
--allow-unauthenticated \
--project=scoop-386004 \
--region=asia-east1 \
--cpu 2 \
--memory 8Gi \
--min-instances 0 \
--max-instances 5 \
--timeout 3600s \
--concurrency 300 \
--cpu-boost
Build the Cloud Run job
gcloud builds submit --config cloudbuild_job.yaml --project=scoop-386004
deploy cloud run job
gcloud run jobs create process-folder-job \
--image gcr.io/scoop-386004/botrun-ask-folder-job \
--region asia-east1 \
--project scoop-386004 \
--cpu 2 \
--memory 8Gi \
--max-retries 3 \
--task-timeout 3600s
update cloud run job
gcloud run jobs update process-folder-job \
--image gcr.io/scoop-386004/botrun-ask-folder-job \
--region asia-east1 \
--project scoop-386004 \
--cpu 2 \
--memory 8Gi \
--max-retries 3 \
--task-timeout 3600s
Dapr
Run
dapr run -f dapr.yaml
Stop
dapr stop -f dapr.yaml
Testing Dapr
Example: the youth entrepreneurship loan (青創貸款) folder
curl -X POST http://localhost:8000/api/botrun/botrun_ask_folder/process-folder \
-H "Content-Type: application/json" \
-d '{"folder_id": "1qk5maEqbxtTcr1tsAHawVduonPedpHV0", "force":true}'
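Deploying Dapr to Cloud Run (still experimental; this does not work yet)
Run the following from the project directory:
- Do not use the project's venv; install gcloud yourself on your local machine
- Use a different service account; ask 阿杰 for it
Build the Docker images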
gcloud builds submit --tag gcr.io/scoop-386004/botrun-ask-folder ./botrun_ask_folder/fast_api --project=scoop-386004
gcloud builds submit --config ./botrun_ask_folder/fast_api/cloudbuild.yaml --project=scoop-386004
gcloud builds submit --tag gcr.io/scoop-386004/subscriber ./botrun_ask_folder/subscribers --project=scoop-386004
gcloud builds submit --config cloudbuild_dapr.yaml --project=scoop-386004
Deploy
gcloud run services replace botrun-ask-folder-service.yaml --platform managed --region asia-east1 --project=scoop-386004
gcloud run services replace subscriber-service.yaml --platform managed --region asia-east1 --project=scoop-386004
To set environment variables (kept for reference)
gcloud run services update botrun-ask-folder --set-env-vars KEY1=VALUE1,KEY2=VALUE2
gcloud run services update subscriber --set-env-vars KEY1=VALUE1,KEY2=VALUE2