Project description

積木塊 (Building Blocks) Similarity Ranker v1.0.0 by Bowen Chiu

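  • Finds the embedding vectors most similar to a given query sentence.
  • Compares similarity across arbitrary .txt and .pt files, recursing into subdirectories.
  • Writes the top 10 most similar results to a .json file.
  • Uses the paraphrase-multilingual-MiniLM-L12-v2 model from Hugging Face Transformers.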
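The description does not say which similarity metric is used. As a minimal sketch, assuming cosine similarity (the usual choice for sentence embeddings), ranking stored vectors against a query vector and keeping the top 10 could look like this. Plain Python lists stand in for the tensors stored in the .pt files, and cosine_similarity and top_k are hypothetical helpers, not part of the similarity-ranker API.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k (name, score) pairs with the highest similarity, best first."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in embeddings.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    query = [1.0, 0.0]
    embeddings = {
        "file1": [1.0, 0.0],  # same direction as the query
        "file2": [0.0, 1.0],  # orthogonal to the query
        "file3": [1.0, 1.0],  # 45 degrees away from the query
    }
    for name, score in top_k(query, embeddings, k=2):
        print(f"{name}: {score:.3f}")
```

heapq.nlargest avoids sorting every score when only the top k are needed, which matters once the embeddings folder holds many files.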

Setup

Before you begin, make sure the following Python package is installed:

python3 -m pip install similarity-ranker

Usage

As a command-line tool

You can use similarity_ranker.py by running the following command:

python3 -m similarity_ranker \
  --prompt "your query sentence" \
  --txt-folder "folder containing .txt files" \
  --embeddings-folder "folder containing embedding files" \
  --output-json "output JSON file name (optional)"
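The command above documents four flags but not their defaults or which are mandatory. The sketch below is a hypothetical argparse parser that mirrors those flag names, assuming every flag except --output-json is required; it only illustrates how the dashed flags map to Python attribute names and is not the tool's actual parser.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented similarity_ranker flags."""
    parser = argparse.ArgumentParser(prog="similarity_ranker")
    parser.add_argument("--prompt", required=True, help="query sentence")
    parser.add_argument("--txt-folder", required=True, help="folder containing .txt files")
    parser.add_argument("--embeddings-folder", required=True, help="folder containing .pt embedding files")
    # Documented as optional; the real default output name is not documented, so None is assumed here.
    parser.add_argument("--output-json", default=None, help="output JSON file name (optional)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse([
        "--prompt", "your query sentence",
        "--txt-folder", "data/txt",
        "--embeddings-folder", "data/embeddings",
    ])
    # argparse converts dashes to underscores: --txt-folder becomes args.txt_folder
    print(args.txt_folder, args.embeddings_folder, args.output_json)
```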

As an importable module

First, import from similarity_ranker:

from similarity_ranker import query_embeddings, save_ranking_to_json

Then call query_embeddings to get the similarity ranking for the query sentence:

prompt = "your query sentence"
embeddings_folder = "folder containing embedding files"
ranking = query_embeddings(prompt, embeddings_folder)

Next, call save_ranking_to_json to save the ranking to a JSON file:

txt_folder = "folder containing .txt files"
output_file = "output JSON file name"
save_ranking_to_json(prompt, ranking, txt_folder, output_file)
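The JSON layout written by save_ranking_to_json is not documented here. As a hypothetical sketch, assuming ranking is a list of (name, score) pairs ordered best first and that each name corresponds to <txt_folder>/<name>.txt, a function that attaches each matched text to its score could look like this. save_ranking_sketch and the prompt/results/rank/name/score/text keys are illustrative assumptions, not the package's actual schema.

```python
import json
from pathlib import Path
from typing import List, Tuple


def save_ranking_sketch(
    prompt: str,
    ranking: List[Tuple[str, float]],
    txt_folder: str,
    output_file: str,
) -> None:
    """Hypothetical stand-in for save_ranking_to_json; the schema below is assumed."""
    results = []
    for rank, (name, score) in enumerate(ranking, start=1):
        txt_path = Path(txt_folder) / f"{name}.txt"
        # Missing text files are recorded as null rather than raising.
        text = txt_path.read_text(encoding="utf-8") if txt_path.is_file() else None
        results.append({"rank": rank, "name": name, "score": float(score), "text": text})
    payload = {"prompt": prompt, "results": results}
    # ensure_ascii=False keeps Chinese and other non-ASCII text readable in the output file.
    Path(output_file).write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "file1.txt").write_text("今天天氣很好", encoding="utf-8")
        out = Path(tmp) / "top_similarity.json"
        save_ranking_sketch("your query sentence", [("file1", 0.91)], tmp, str(out))
        print(out.read_text(encoding="utf-8"))
```

Because the model is multilingual, writing with ensure_ascii=False is a reasonable default so that matched texts stay human-readable in the output file.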

Example

Suppose you have the following directory layout:

data/
  txt/
    file1.txt
    file2.txt
    ...
  embeddings/
    file1.pt
    file2.pt
    ...
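The layout above pairs each text file with an embedding file of the same name. A minimal sketch of that pairing, assuming each .pt file mirrors its .txt file's relative path with only the suffix changed (including files in nested subdirectories), could use pathlib.Path.rglob. pair_files is a hypothetical helper; how the package actually matches files is not stated.

```python
from pathlib import Path
from typing import Dict, Tuple


def pair_files(txt_folder: str, embeddings_folder: str) -> Dict[str, Tuple[Path, Path]]:
    """Hypothetical helper: match <txt_folder>/**/X.txt with <embeddings_folder>/**/X.pt."""
    txt_root, emb_root = Path(txt_folder), Path(embeddings_folder)
    pairs: Dict[str, Tuple[Path, Path]] = {}
    for txt_path in sorted(txt_root.rglob("*.txt")):  # rglob recurses into subdirectories
        rel = txt_path.relative_to(txt_root)
        pt_path = emb_root / rel.with_suffix(".pt")  # same relative path, .pt suffix
        if pt_path.is_file():
            # Key by relative path without suffix, e.g. "file1" or "sub/file2"
            pairs[rel.with_suffix("").as_posix()] = (txt_path, pt_path)
    return pairs
```

For the layout above, pair_files("data/txt", "data/embeddings") would map "file1" to the pair (data/txt/file1.txt, data/embeddings/file1.pt). Text files without a matching .pt file are skipped.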

Run the following command to find the embeddings most similar to your query sentence:

python3 -m similarity_ranker \
  --prompt "你的查詢句子" \
  --txt-folder "data/txt" \
  --embeddings-folder "data/embeddings" \
  --output-json "data/top_similarity.json"


Download files


Source Distribution

similarity-ranker-1.0.3.tar.gz (3.2 kB)

Uploaded Source

Built Distribution

similarity_ranker-1.0.3-py3-none-any.whl (3.6 kB)

Uploaded Python 3

File details

Details for the file similarity-ranker-1.0.3.tar.gz.

File metadata

  • Download URL: similarity-ranker-1.0.3.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for similarity-ranker-1.0.3.tar.gz
Algorithm Hash digest
SHA256 27dc3fa162407c0017bf2b569c62485f4fa386ef6b1b733ec4cbde55ff5bba25
MD5 1b73e5f67c8dc61135e68402e52507e2
BLAKE2b-256 56bf8e2ce5b45eb08dded383f4e6fec3ebb3ebc925fcac9976d0a2bbb60f32f7


File details

Details for the file similarity_ranker-1.0.3-py3-none-any.whl.

File hashes

Hashes for similarity_ranker-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1380cb4cf8e74253dab74d84c2c6764625d5a3f162a6d4d492939ad830938ff0
MD5 eb6dd279857cf5b94edd590c2d563e94
BLAKE2b-256 29cdeebdee3a6461dbe2f8577dcf23e04f205f8d3b3bdd59c854d42554011e30

