
Easy-to-use BERT with NVIDIA Triton Inference Server

Project description

This package makes it easy to serve BERT models with NVIDIA Triton Inference Server.

  1. triton_bert.py: an algorithm engineer only needs to write a proprocess (post-processing) function to make a model work.
  2. model_4_triton.py: a tool that converts a Hugging Face pytorch_model.bin checkpoint into TorchScript or ONNX for use in Triton Inference Server.
  3. pgvector_triton.py: a convenience tool for using PostgreSQL/pgvector as the vector database in semantic retrieval.

GitHub

https://github.com/yyw794/triton-bert
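
Install

The distribution on PyPI is named triton-bert (the file listing below shows version 0.3.3), so it can be installed with pip:

pip install triton-bert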

Usage

Example 0:

Embedding model (bi-encoder): the embedding output from the model can be used directly, so there is no need to override the proprocess function.

from triton_bert.triton_bert import TritonBert

if __name__ == "__main__":
    model = TritonBert(triton_host="127.0.0.1", model="sbert_onnx", 
                       vocab="/Users/xxx/.cache/torch/sentence_transformers/sentence-transformers_all-MiniLM-L6-v2")

    # batch inference (example queries in Chinese)
    vectors = model(["基金的收益率是多少?", "我有个朋友的股票天天涨停"])
    # or
    vectors = model.encode(["基金的收益率是多少?", "我有个朋友的股票天天涨停"])
    assert len(vectors) == 2

    # single inference
    vector = model.encode("基金的收益率是多少?")
    assert vectors[0] == vector

Example 1:

Embedding model (bi-encoder): when the embeddings need to be normalized, override the proprocess method.

from triton_bert.triton_bert import TritonBert
import numpy as np

class Biencoder(TritonBert):
    def __init__(self, triton_host:str, model: str, vocab:str, **kwargs):
        super().__init__(triton_host=triton_host, model=model, vocab=vocab, **kwargs)
        self.normalize_vector = True

    def proprocess(self, triton_output):
        if self.normalize_vector:
            # for inner-product (IP) search the vectors must be normalized, which makes IP equivalent to cosine similarity
            return [(x / np.linalg.norm(x)).tolist() for x in triton_output[0]]
        return triton_output[0].tolist()
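
A minimal usage sketch (the host, model name, and vocab path are placeholders):

if __name__ == "__main__":
    model = Biencoder(triton_host="127.0.0.1", model="sbert_onnx",
                      vocab="/path/to/vocab")
    # each returned vector is L2-normalized
    vectors = model(["基金的收益率是多少?", "我有个朋友的股票天天涨停"])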

Example 2:

Rank model (cross-encoder): the user query is compared against each of the top-N retrieved candidates to find the most similar one.

from triton_bert.triton_bert import TritonBert
import numpy as np

class CrossEncoder(TritonBert):
    '''
    rank with text similarity
    '''
    def __init__(self, triton_host:str, model: str, vocab:str, **kwargs):
        super().__init__(triton_host=triton_host, model=model, vocab=vocab, **kwargs)

    def proprocess(self, triton_output):
        # squeeze the (N, 1) score matrix into a flat list of similarity scores
        return np.squeeze(triton_output[0], axis=1).tolist()

    def __call__(self, query, text_pairs):
        # pair the query with every candidate text
        texts = len(text_pairs) * [query]
        return self.predict(texts, text_pairs)

if __name__ == "__main__":
    model = CrossEncoder(triton_host="xx", model="xx", vocab="xx")
    # one similarity score per candidate; the query means "Xiao Ming borrowed 500 yuan from Xiao Hong"
    scores = model("小明借了小红500元", ['小红借了小明500元', '小明还了小红500元', '小明借了小红400元'])

Example 3

Chit-chat intent detection.

from triton_bert.triton_bert import TritonBert
import torch.nn.functional as F
import torch

class ChitchatIntentDetection(TritonBert):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.label_list = ["闲聊", "问答", "扯淡"]  # chit-chat, Q&A, nonsense

    def proprocess(self, triton_output):
        logits = triton_output[0]
        label_ids = logits.argmax(axis=-1)
        logits = torch.tensor(logits)
        probs = F.softmax(logits, dim=1).numpy()
        ret = []
        for i, label_id in enumerate(label_ids):
            prob = probs[i][label_id]
            # low-confidence "nonsense" predictions fall back to "chit-chat"
            if label_id == 2 and prob < 0.8:
                label_id = 0
            ret.append({"category": self.label_list[label_id], "confidence": float(prob)})
        return ret
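
A usage sketch (the connection values and model name are placeholders):

if __name__ == "__main__":
    model = ChitchatIntentDetection(triton_host="127.0.0.1", model="intent_onnx",
                                    vocab="/path/to/vocab")
    # e.g. [{"category": "问答", "confidence": 0.93}]
    results = model(["基金的收益率是多少?"])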

Run the examples

Run the Triton server

# for example
docker run -d --rm --name triton-server \
  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /home/xxxx/triton_models:/models \
  nvcr.io/nvidia/tritonserver:22.08-py3 \
  tritonserver --model-repository=/models --model-control-mode=poll \
  --exit-on-error=false --log-verbose 1
# then populate the Triton model folder (layout below)
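
Triton expects the model repository to follow its standard layout: one folder per model containing a config.pbtxt and numbered version subdirectories. A sketch for the sbert_onnx model used above (names are illustrative):

triton_models/
└── sbert_onnx/
    ├── config.pbtxt
    └── 1/
        └── model.onnx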

Prepare models for the Triton server

See the tests for more examples.

Example:

from triton_bert.model_4_triton import Model4TritonServer

if __name__ == "__main__":
    pretrained_model = "/Users/xxxxx/.cache/torch/sentence_transformers/simcse-chinese-roberta-wwm-ext"
    model = Model4TritonServer(pretrained_model=pretrained_model)
    model.save_torchscript("model/simcse_model.pt")
    model.save_onnx("model/simcse_model.onnx")
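
The exported model also needs a config.pbtxt in its repository folder. A minimal sketch for the ONNX export; the tensor names, output dimension, and batch size below are assumptions, so verify them against your exported model:

name: "simcse_model"
platform: "onnxruntime_onnx"
max_batch_size: 32
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]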

Semantic retrieval with PostgreSQL/pgvector

Example 1

from typing import List

from triton_bert.pgvector_triton import PgvectorTriton

if __name__ == "__main__":
    # table_model is your own ORM model; a sketch of Sentence follows this example
    instance = PgvectorTriton(db_user="xxx", db_password="xxxx",
                              db_instance="xxxx", db_port="3671",
                              db_schema="xxx", create_table=True,
                              triton_host="xxx", model="bge-m3",
                              vocab="/Users/xxx/Codes/pingan_health_rag/models/bge-m3",
                              table_model=Sentence)

    # insert
    qas = instance.load_texts("dataset/medical_qa.jsonl")
    answers = [qa['answers'][0] for qa in qas]
    instance.insert_vectors(answers)

    # retrieval (the Chinese query means "My throat is a bit dry")
    recalls: List[Sentence] = instance.retrieval_vectors("我喉咙有些干")

    print(recalls[0].sentence)
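
Sentence above is a user-defined table model. A minimal sketch of what it might look like, assuming SQLAlchemy 2.x with the pgvector extension (the field names and the 1024-dim vector size for bge-m3 are assumptions):

from pgvector.sqlalchemy import Vector
from sqlalchemy import Integer, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Sentence(Base):
    __tablename__ = "sentence"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    sentence: Mapped[str] = mapped_column(Text)  # raw text, read back via recalls[0].sentence
    embedding = mapped_column(Vector(1024))      # bge-m3 dense embeddings are 1024-dim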
