
Simple Hugging Face text generation utility package with support for multiple prompt formats and conversation history.


ouroboros

Unofficial Hugging Face Text Generation Utility Package.

[OUROBOROS VERSION 0.0.1]

Install ouroboros-hf-text-gen-utils

Simple installation from PyPI

pip install ouroboros_hf_text_gen_utils
Other installation options

Install directly from GitHub

pip install git+https://github.com/VINUK0/Ouroboros-HF-TXT-GEN-UTILS.git

[Supported Features]

ONNX model inference support via Hugging Face.

OpenVINO model inference support via Hugging Face.

8-bit and 4-bit (NF4) inference support via Hugging Face.

BetterTransformer inference support via Hugging Face.

⚠️ Hugging Face is slowly dropping BetterTransformer support since scaled_dot_product_attention is being added to PyTorch natively.

Flash Attention 2 inference support via Hugging Face.

⚠️ Your GPU architecture needs to be Ampere or newer for Flash Attention 2 to work (the first consumer Ampere cards were the RTX 30-series). A quick way to check is shown below.
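
A minimal way to check whether your GPU qualifies, using plain PyTorch (this snippet is generic and not part of this package):

import torch

# Ampere and newer GPUs report CUDA compute capability (8, 0) or higher.
if torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {capability[0]}.{capability[1]}")
    print("Flash Attention 2 supported:", capability >= (8, 0))
else:
    print("No CUDA GPU detected.")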


[Coming In The Future]

🚧 Save and Load Conversations.

🚧 Export conversations to multiple human-readable prompt formats to create datasets.

🚧 Print the total parameter count of the model (too busy to do it right now).


[Supported Prompting Formats]

Alpaca Version 1

Backend Style
You are a helpful AI assistant.

### Instruction:
What is a computer?

### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

Alpaca Version 2 (Self-Proclaimed)

Backend Style
### Instruction:
You are a helpful AI assistant.

### Input:
What is a computer?

### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

ChatML

Backend Style
<|im_start|>system:
You are a helpful AI assistant.<|im_end|>

<|im_start|>user:
What is a computer?<|im_end|>

<|im_start|>assistant:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!<|im_end|>

Ouroboros (Same as TinyLlama 1B format)

Backend Style
<|system|>
You are a helpful AI assistant.

<|user|>
What is a computer?

<|model|>
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

Mixtral (This format is confusing to me.)

Backend Style
[INST] What is a computer?

[/INST] A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
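
To make the formats above concrete, here is a hypothetical sketch of how a ChatML-style prompt string might be assembled from a history list. The real inplace_* helpers may build it differently, and the (user_turn, model_turn) tuple structure for history is an assumption:

# Hypothetical sketch only; not the package's actual implementation.
def build_chatml(system_prompt, history, prompt):
    parts = [f"<|im_start|>system:\n{system_prompt}<|im_end|>"]
    for user_turn, model_turn in history:  # assumed (user, model) tuples
        parts.append(f"<|im_start|>user:\n{user_turn}<|im_end|>")
        parts.append(f"<|im_start|>assistant:\n{model_turn}<|im_end|>")
    parts.append(f"<|im_start|>user:\n{prompt}<|im_end|>")
    parts.append("<|im_start|>assistant:\n")
    return "\n\n".join(parts)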

[How to basic]

⚠️ Available dtypes are F32, F16, and BF16.

⚠️ You can also run these scripts with accelerate launch your_file_name.py.

⚠️ max_sys_prompt_length, max_prompt_length, and max_hist_length are measured in characters, not tokens.
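
Because those limits count characters, trimming an over-long history amounts to plain string slicing; a minimal sketch of the idea (the package's internal truncation may differ):

# Illustrative character-based truncation; not the package's actual code.
max_hist_length = 8096
history_text = "user: ...\nassistant: ..."  # concatenated conversation so far
if len(history_text) > max_hist_length:
    history_text = history_text[-max_hist_length:]  # keep the newest characters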

🛠️ Simple Inference

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ 8-bit and 4-bit inference.

⚠️ When loading models in 8-bit or 4-bit, make sure the dtype is either F16 or BF16.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", load_in_4bit=True)

# load_in_8bit=True can also be used.

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ Flash Attention 1.

⚠️ You can use this together with load_in_4bit or load_in_8bit.

⚠️ I have run into an error when running a model with Flash Attention 1 on a T4 GPU in BF16 (the T4 is a Turing GPU without native BF16 support). If that happens, use F16 instead.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", flash_attention="flash_attention_1")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ Flash Attention 2.

⚠️ You can use this together with load_in_4bit or load_in_8bit.

⚠️ You need the flash-attn package to run FA2: pip install flash-attn --no-build-isolation
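
A quick sanity check that flash-attn is importable in your environment (generic snippet, not part of this package):

# Verify the flash-attn package is installed and importable.
try:
    import flash_attn
    print("flash-attn version:", flash_attn.__version__)
except ImportError:
    print("flash-attn is not installed; run the pip command above.")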

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", flash_attention="flash_attention_2")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ ONNX model.

⚠️ If you have a GPU, set onnx_execution_provider="CUDAExecutionProvider".
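
To see which execution providers your onnxruntime build actually supports (generic onnxruntime snippet, not part of this package):

import onnxruntime

# Lists the providers compiled into this onnxruntime build, e.g.
# ['CUDAExecutionProvider', 'CPUExecutionProvider'] on a GPU install.
print(onnxruntime.get_available_providers())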

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/onnx_model_1B",
                      model_name="example/onnx_model_1B",
                      onnx_model=True, onnx_execution_provider="CPUExecutionProvider")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ OpenVINO model.

⚠️ If a GPU is available, it will try to run on it.
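
You can check which devices the OpenVINO runtime sees with a generic snippet (not part of this package):

from openvino.runtime import Core

# Typical output: ['CPU'], or ['CPU', 'GPU'] when an Intel GPU is visible.
print(Core().available_devices)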

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/openvino_model_1B",
                      model_name="example/openvino_model_1B",
                      openvino_model=True)

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ BetterTransformer.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/openvino_model_1B",
                      model_name="example/openvino_model_1B",
                      better_transformers=True, dtype="BF16")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer.",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

[All the supported prompt functions]

inplace_alpaca_style()
inplace_alpaca_v2_style()
inplace_chatml_style()
inplace_ouroboros_style()
inplace_mixtral_style()
