Simple Hugging Face text generation utility package with support for multiple prompt formats and conversation history.
Project description
Unofficial Hugging Face Text Generation Utility Package.
[OUROBOROS VERSION 0.0.1]
Install ouroboros-hf-text-gen-utils
Simple installation from PyPI
pip install ouroboros_hf_text_gen_utils
Other installation options
Install directly from GitHub
pip install git+https://github.com/VINUK0/Ouroboros-HF-TXT-GEN-UTILS.git
[Supported Features]
✅ ONNX model inference support via Hugging Face.
✅ OpenVINO model inference support via Hugging Face.
✅ 8-bit and NF4 (4-bit) quantized inference support via Hugging Face.
✅ BetterTransformer inference support via Hugging Face.
⚠️ Hugging Face is gradually phasing out BetterTransformer support, since scaled_dot_product_attention is now built into PyTorch natively.
✅ Flash Attention 2 inference support via Hugging Face.
⚠️ Your GPU architecture needs to be Ampere or newer for Flash Attention 2 to work (e.g. the RTX 30-series or A100).
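Whether a GPU meets the Ampere requirement can be checked from its CUDA compute capability: Ampere corresponds to major version 8. A minimal sketch (the `is_ampere_or_newer` helper is illustrative, not part of this package); with PyTorch installed you would pass `torch.cuda.get_device_capability()`:

```python
def is_ampere_or_newer(compute_capability: tuple) -> bool:
    """Ampere GPUs (RTX 30-series, A100) report CUDA compute capability 8.x or higher."""
    major, _minor = compute_capability
    return major >= 8

print(is_ampere_or_newer((8, 6)))  # RTX 3080 -> True
print(is_ampere_or_newer((7, 5)))  # T4 (Turing) -> False
```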
[Coming In The Future]
🚧 Save and load conversations.
🚧 Export conversations to multiple human-readable prompt formats for dataset creation.
🚧 Print the total number of parameters in the model.
[Supported Prompting Formats]
✅ Alpaca Version 1
Backend Style
You are a helpful AI assistant.
### Instruction:
What is a computer?
### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
✅ Alpaca Version 2 (self-proclaimed)
Backend Style
### Instruction:
You are a helpful AI assistant.
### Input:
What is a computer?
### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
✅ ChatML
Backend Style
<|im_start|>system:
You are a helpful AI assistant.<|im_end|>
<|im_start|>user:
What is a computer?<|im_end|>
<|im_start|>assistant:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!<|im_end|>
✅ Ouroboros (Same as TinyLlama 1B format)
Backend Style
<|system|>
You are a helpful AI assistant.
<|user|>
What is a computer?
<|model|>
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
✅ Mixtral (Mistral-style [INST] format)
Backend Style
[INST] What is a computer?
[/INST] A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
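For illustration, a layout like the ChatML one above can be assembled with plain string formatting. This is a sketch of the format only, not the package's internal code; the `build_chatml` helper and the `(user, model)` history tuples are hypothetical:

```python
def build_chatml(system_prompt: str, history: list, prompt: str) -> str:
    """Assemble a ChatML-style prompt string from a system prompt,
    prior (user, model) turns, and the new user message."""
    parts = [f"<|im_start|>system:\n{system_prompt}<|im_end|>"]
    for user_msg, model_msg in history:
        parts.append(f"<|im_start|>user:\n{user_msg}<|im_end|>")
        parts.append(f"<|im_start|>assistant:\n{model_msg}<|im_end|>")
    parts.append(f"<|im_start|>user:\n{prompt}<|im_end|>")
    parts.append("<|im_start|>assistant:\n")  # the model continues from here
    return "\n".join(parts)

text = build_chatml("You are a helpful AI assistant.", [], "What is a computer?")
print(text.startswith("<|im_start|>system:"))  # True
```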
[How to basic]
⚠️ Available dtypes are F32, F16, and BF16.
⚠️ You can also run these scripts with accelerate launch your_file_name.py.
⚠️ max_sys_prompt_length, max_prompt_length, and max_hist_length are measured in characters, not tokens.
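Because these limits count characters, history that exceeds max_hist_length presumably has to be trimmed from the oldest turns. A sketch of character-based truncation (illustrative only; the `truncate_history` helper is hypothetical, assuming history is a list of turn strings):

```python
def truncate_history(history: list, max_hist_length: int) -> list:
    """Drop the oldest turns until the combined character count fits the limit."""
    kept = []
    total = 0
    for turn in reversed(history):  # walk from the most recent turn backwards
        if total + len(turn) > max_hist_length:
            break
        kept.append(turn)
        total += len(turn)
    return list(reversed(kept))    # restore chronological order

print(truncate_history(["a" * 50, "b" * 30, "c" * 40], 80))  # keeps the last two turns
```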
🛠️ Simple Inference
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
dtype="BF16")
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
🛠️ 8-bit and 4-bit inference.
⚠️ When loading models in 8-bit or 4-bit, make sure the dtype is either F16 or BF16.
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
dtype="BF16", load_in_4bit=True)
# load_in_8bit=True can be also used.
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
🛠️ Flash Attention 1.
⚠️ You can use this together with load_in_4bit or load_in_8bit.
⚠️ I have encountered an error when running a model with Flash Attention 1 in BF16 on a T4 GPU (Turing GPUs lack native BF16 support); if that happens, use F16 instead.
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
dtype="BF16", flash_attention="flash_attention_1")
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
🛠️ Flash Attention 2.
⚠️ You can use this together with load_in_4bit or load_in_8bit.
⚠️ Flash Attention 2 requires the flash-attn package: pip install flash-attn --no-build-isolation
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
dtype="BF16", flash_attention="flash_attention_2")
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
🛠️ ONNX model.
⚠️ If you have a GPU, set onnx_execution_provider="CUDAExecutionProvider".
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="example/onnx_model_1B",
model_name="example/onnx_model_1B",
onnx_model=True, onnx_execution_provider="CPUExecutionProvider")
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
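When you are not sure which providers a machine offers, you can select the first available one from a preference list. The `pick_provider` helper below is illustrative, not part of this package; with onnxruntime installed, the available list would come from `onnxruntime.get_available_providers()`:

```python
def pick_provider(available: list) -> str:
    """Prefer the CUDA execution provider when present; otherwise fall back to CPU."""
    preference = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    for provider in preference:
        if provider in available:
            return provider
    raise RuntimeError("no supported execution provider found")

print(pick_provider(["CPUExecutionProvider"]))                          # CPUExecutionProvider
print(pick_provider(["CUDAExecutionProvider", "CPUExecutionProvider"]))  # CUDAExecutionProvider
```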
🛠️ OpenVINO model.
⚠️ If a GPU is available, OpenVINO will try to run the model on it.
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="example/openvino_model_1B",
model_name="example/openvino_model_1B",
openvino_model=True)
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
🛠️ Better Transformer.
Code example
from ouroboros_text_gen_utils import text_generation
api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
better_transformers=True, dtype="BF16")
history = []
system_prompt = """### Instruction:
You are a helpful AI assistant."""
history, output = api.inplace_alpaca_v2_style(history=history,
system_prompt=system_prompt,
prompt="What is a computer?",
user_name="user",
character_name="assistant",
max_sys_prompt_length=2048,
max_prompt_length=1024,
max_hist_length=8096,
max_new_tokens=256,
min_new_tokens=10,
top_p=0.8,
top_k=50,
temperature=0.5,
repetition_penalty=1.1)
print(f"Model Generated Output: {output}")
[All the supported prompt functions]
inplace_alpaca_style()
inplace_alpaca_v2_style()
inplace_chatml_style()
inplace_ouroboros_style()
inplace_mixtral_style()
File details
Details for the file ouroboros_hf_text_generation-1.0.2.tar.gz.
File metadata
- Download URL: ouroboros_hf_text_generation-1.0.2.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11433d14a459dec802634082c9df4b1a1755629e1d7b489df4abd7b394e44bee
MD5 | e30e47f97ada403c8ef07101a59e8fd0
BLAKE2b-256 | cc5a288075033607304d6a41cc3673814db08ad3f89b866301d0fdb9cf4c6b62
File details
Details for the file ouroboros_hf_text_generation-1.0.2-py3-none-any.whl.
File metadata
- Download URL: ouroboros_hf_text_generation-1.0.2-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | ba71b358dd52053bf8fc197fafe2fb8f9b586de90e54eeb49d1fcda1357388e9
MD5 | f21f4e99a8a891e5e40a6e4f698eb2e9
BLAKE2b-256 | 27bc402eb04bf5f3449f81ab7cbf4cc51c3b73c2fcc19a9904de6c3d8971f2ed