
Text to vector Tool, encode text

Project description

text2vec-onnx

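This project is the onnxruntime inference version of the text2vec project. It implements embedding retrieval and text-matching search. To keep the project lightweight, it depends on only three libraries: onnxruntime, tokenizers, and numpy.

It has been tested mainly on the GanymedeNil/text2vec-base-chinese-onnx model and should, in theory, support BERT-family models.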
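
To make the idea of turning BERT-style token outputs into one sentence vector more concrete, here is a minimal numpy sketch of masked mean pooling. It is an assumption that this package pools this way; the function name `mean_pool` and the toy arrays are hypothetical and are not taken from text2vec2onnx.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden) float array
    attention_mask:   (batch, seq_len) array, 1 for real tokens, 0 for padding
    """
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the hidden axis.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# Toy input (hypothetical): 2 sentences, 3 token slots, hidden size 2.
# The last slot of the second sentence is padding and must not affect the mean.
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # one vector per sentence
```

In a real pipeline the token embeddings would come from the ONNX model's output and the mask from the tokenizer; both are replaced by toy arrays here.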

Installation

CPU version

pip install text2vec2onnx[cpu]

GPU version

pip install text2vec2onnx[gpu]

Usage

Model download

Using GanymedeNil/text2vec-base-chinese-onnx as an example, download the model to a local directory.

  • Download from Hugging Face
huggingface-cli download --resume-download GanymedeNil/text2vec-base-chinese-onnx --local-dir text2vec-base-chinese-onnx

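Getting embeddings

from text2vec2onnx import SentenceModel

# model_dir_path is the local directory the model was downloaded to
embedder = SentenceModel(model_dir_path='local-dir')
# Encode a sentence ("你好" means "hello") into an embedding vector
emb = embedder.encode("你好")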
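
Once two embeddings are available, a common way to compare them is cosine similarity. The sketch below uses numpy only. The helper `cosine_similarity` and the toy vectors are hypothetical stand-ins for real `encode` outputs, and the `ravel()` call is there on the assumption that a single-sentence embedding may arrive as either a 1-D or a (1, hidden) array.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return float(np.dot(a, b) / denom)

# Toy vectors standing in for sentence embeddings (hypothetical values).
emb_a = np.array([1.0, 0.0, 1.0])
emb_b = np.array([2.0, 0.0, 2.0])   # same direction as emb_a
emb_c = np.array([0.0, 1.0, 0.0])   # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

Vectors pointing in the same direction score close to 1 regardless of length, and orthogonal vectors score 0.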

Text-matching search

from text2vec2onnx import SentenceModel, semantic_search

embedder = SentenceModel(model_dir_path='local-dir')

corpus = [
    "谢谢观看 下集再见",
    "感谢您的观看",
    "请勿模仿",
    "记得订阅我们的频道哦",
    "The following are sentences in English.",
    "Thank you. Bye-bye.",
    "It's true",
    "I don't know.",
    "Thank you for watching!",
]
corpus_embeddings = embedder.encode(corpus)

queries = [
    'Thank you. Bye.',
    '你干啥呢',
    '感谢您的收听']

for query in queries:
    query_embedding = embedder.encode(query)
    hits = semantic_search(query_embedding, corpus_embeddings, top_k=1)
    print("\n\n======================\n\n")
    print("Query:", query)
    print("\nTop 1 most similar sentence in corpus:")
    hits = hits[0]  # Get the hits for the first query
    for hit in hits:
        print(corpus[hit['corpus_id']], "(Score: {:.4f})".format(hit['score']))
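
For intuition about what a top-k embedding search does, the following is a minimal numpy sketch that ranks corpus vectors by cosine similarity to a single query. It is not the package's `semantic_search` implementation: `top_k_search` is a hypothetical helper, it handles one query and returns a flat list (the library call above returns one list of hits per query), and only the `corpus_id` and `score` keys are borrowed from the example above.

```python
import numpy as np

def top_k_search(query_embedding, corpus_embeddings, top_k=5):
    """Return the top_k corpus rows most similar to the query, best first."""
    query = np.asarray(query_embedding, dtype=np.float64).reshape(1, -1)
    corpus = np.asarray(corpus_embeddings, dtype=np.float64)

    # L2-normalize so that a dot product equals cosine similarity.
    query = query / np.clip(np.linalg.norm(query, axis=1, keepdims=True), 1e-12, None)
    corpus = corpus / np.clip(np.linalg.norm(corpus, axis=1, keepdims=True), 1e-12, None)

    scores = (corpus @ query.T).ravel()   # one score per corpus row
    k = min(top_k, len(scores))           # never ask for more hits than rows
    order = np.argsort(-scores)[:k]       # indices sorted by descending score
    return [{"corpus_id": int(i), "score": float(scores[i])} for i in order]

# Toy 2-D vectors standing in for corpus embeddings (hypothetical values).
corpus_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
hits = top_k_search([1.0, 0.1], corpus_vecs, top_k=2)
for hit in hits:
    print(hit["corpus_id"], "(Score: {:.4f})".format(hit["score"]))
```

Sorting every score with `argsort` is fine for a small corpus; for a large one, a partial sort such as `np.argpartition` would avoid ordering rows that are never returned.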

License

Apache License 2.0

