
A simple Hugging Face text generation utility package with support for multiple prompt formats and conversation history.


ouroboros

Unofficial Hugging Face Text Generation Utility Package.

[OUROBOROS VERSION 0.0.1]

Install ouroboros-hf-text-gen-utils

Simple installation from PyPI

pip install ouroboros_hf_text_gen_utils
Other installation options

Install directly from GitHub

pip install git+https://github.com/VINUK0/Ouroboros-HF-TXT-GEN-UTILS.git

[Supported Features]

ONNX model inference support via Hugging Face.

OpenVINO model inference support via Hugging Face.

8-bit and 4-bit (NF4) quantized inference support via Hugging Face.

BetterTransformer inference support via Hugging Face.

⚠️ Hugging Face is gradually dropping BetterTransformer support, since scaled_dot_product_attention is now available natively in PyTorch.

Flash Attention 2 inference support via Hugging Face.

⚠️ Your GPU must be Ampere architecture or newer for Flash Attention 2 to work (e.g., an RTX 30-series card or an A100). You can check with the snippet below.
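
If you are unsure about your GPU, plain PyTorch can tell you its compute capability (Ampere is 8.0 or above):

Code example
import torch

# Flash Attention 2 requires an Ampere-or-newer GPU (compute capability >= 8.0).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: {major}.{minor}")
    print("Flash Attention 2 supported:", major >= 8)
else:
    print("No CUDA GPU detected.")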


[Coming In The Future]

🚧 Save and load conversations.

🚧 Export conversations to multiple human-readable prompt formats for creating datasets.

🚧 Print the total number of parameters in the model.


[Supported Prompting Formats]

Alpaca Version 1

Backend Style
You are a helpful AI assistant.

### Instruction:
What is a computer?

### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

Alpaca Version 2 (self-proclaimed name)

Backend Style
### Instruction:
You are a helpful AI assistant.

### Input:
What is a computer?

### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

ChatML

Backend Style
<|im_start|>system:
You are a helpful AI assistant.<|im_end|>

<|im_start|>user:
What is a computer?<|im_end|>

<|im_start|>assistant:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!<|im_end|>

Ouroboros (similar to the TinyLlama 1B format)

Backend Style
<|system|>
You are a helpful AI assistant.

<|user|>
What is a computer?

<|model|>
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

Mixtral (this format is a bit confusing; see the sketch after the example)

Backend Style
[INST] What is a computer?

[/INST] A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
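
For reference, here is a rough sketch of how a multi-turn prompt in this style is usually assembled. Note that build_mixtral_prompt is a hypothetical helper for illustration, not part of this package, and the exact BOS/EOS handling depends on the tokenizer:

Code example
def build_mixtral_prompt(turns, new_user_message):
    """Assemble a Mixtral/Mistral-instruct-style prompt string.

    turns: list of (user_message, model_reply) pairs from earlier turns.
    """
    prompt = ""
    for user_msg, model_reply in turns:
        # Each completed exchange is wrapped as [INST] ... [/INST] reply</s>
        prompt += f"[INST] {user_msg} [/INST] {model_reply}</s>"
    # The newest user message is left open for the model to complete.
    prompt += f"[INST] {new_user_message} [/INST]"
    return prompt

print(build_mixtral_prompt([], "What is a computer?"))
# [INST] What is a computer? [/INST]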

[How to basic]

⚠️ Available dtypes are F32, F16, and BF16.

⚠️ You can also run these scripts with Accelerate: accelerate launch your_file_name.py

⚠️ max_sys_prompt_length, max_prompt_length, and max_hist_length are measured in characters, not tokens (see the sketch below).
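
In other words, a history longer than max_hist_length is presumably cut down by character count, roughly like this (truncate_chars is a hypothetical helper shown only to illustrate the unit, not the package's actual code):

Code example
def truncate_chars(text, max_chars):
    # Keep only the most recent max_chars characters of the conversation.
    return text[-max_chars:]

history_text = "a" * 10000
print(len(truncate_chars(history_text, 8096)))  # 8096 characters, not tokens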

🛠️ Simple Inference

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ 8-bit and 4-bit inference.

⚠️ When loading models in 8-bit or 4-bit, make sure the dtype is either F16 or BF16.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", load_in_4bit=True)

# load_in_8bit=True can also be used.

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ Flash Attention 1.

⚠️ You can use this together with load_in_4bit or load_in_8bit.

⚠️ I have run into an error when running a model with Flash Attention 1 on a T4 GPU in BF16. If that happens, use F16 instead.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", flash_attention="flash_attention_1")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ Flash Attention 2.

⚠️ You can use this together with load_in_4bit or load_in_8bit.

⚠️ You need the flash-attn package to run FA2: pip install flash-attn --no-build-isolation

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", flash_attention="flash_attention_2")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ ONNX model.

⚠️ If you have a GPU, set onnx_execution_provider="CUDAExecutionProvider".

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/onnx_model_1B",
                      model_name="example/onnx_model_1B",
                      onnx_model=True, onnx_execution_provider="CPUExecutionProvider")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ OpenVINO model.

⚠️ If a GPU is available, OpenVINO will try to run on it.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/openvino_model_1B",
                      model_name="example/openvino_model_1B",
                      openvino_model=True)

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ BetterTransformer.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      better_transformers=True, dtype="BF16")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

[All the supported prompt functions]

inplace_alpaca_style()
inplace_alpaca_v2_style()
inplace_chatml_style()
inplace_ouroboros_style()
inplace_mixtral_style()
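
These appear to share the signature used throughout the examples above, so switching formats should only require swapping the function name. Continuing from the Simple Inference example (the shared signature is an assumption based on the examples in this README):

Code example
history, output = api.inplace_chatml_style(history=history,
                                     system_prompt="You are a helpful AI assistant.",
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)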
