
A simple Hugging Face text generation utility package with support for multiple prompt formats and conversation history.


ouroboros

Unofficial Hugging Face Text Generation Utility Package.

[OUROBOROS VERSION 0.0.1]

Install ouroboros-hf-text-gen-utils

Simple installation from PyPI

pip install ouroboros_hf_text_gen_utils
Other installation options

Install directly from GitHub

pip install git+https://github.com/VINUK0/Ouroboros-HF-TXT-GEN-UTILS.git

[Supported Features]

ONNX model inference support via Hugging Face.

OpenVINO model inference support via Hugging Face.

8-bit and 4-bit (NF4) quantized inference support via Hugging Face.

BetterTransformer inference support via Hugging Face.

⚠️ Hugging Face is gradually dropping BetterTransformer support, since scaled_dot_product_attention is now available natively in PyTorch.

Flash Attention 2 inference support via Hugging Face.

⚠️ Your GPU must be Ampere architecture or newer for Flash Attention 2 to work (e.g., an RTX 30-series card or an A100). You can check with the snippet below.
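
If you are unsure about your GPU, plain PyTorch can tell you its compute capability (Ampere is 8.0 or above):

Code example
import torch

# Flash Attention 2 requires an Ampere-or-newer GPU (compute capability >= 8.0).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: {major}.{minor}")
    print("Flash Attention 2 supported:", major >= 8)
else:
    print("No CUDA GPU detected.")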


[Coming In The Future]

🚧 Save and load conversations.

🚧 Export conversations to multiple human-readable prompt formats for creating datasets.

🚧 Print the total number of parameters in the model.


[Supported Prompting Formats]

Alpaca Version 1

Backend Style
You are a helpful AI assistant.

### Instruction:
What is a computer?

### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

Alpaca Version 2 (self-proclaimed name)

Backend Style
### Instruction:
You are a helpful AI assistant.

### Input:
What is a computer?

### Response:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

ChatML

Backend Style
<|im_start|>system:
You are a helpful AI assistant.<|im_end|>

<|im_start|>user:
What is a computer?<|im_end|>

<|im_start|>assistant:
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!<|im_end|>

Ouroboros (similar to the TinyLlama 1B format)

Backend Style
<|system|>
You are a helpful AI assistant.

<|user|>
What is a computer?

<|model|>
A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!

Mixtral (this format is a bit confusing; see the sketch after the example)

Backend Style
[INST] What is a computer?

[/INST] A computer is a programmable machine that manipulates information: it takes data, processes it, and stores it. Think of it as a powerful calculator that can follow instructions to do almost anything!
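
For reference, here is a rough sketch of how a multi-turn prompt in this style is usually assembled. Note that build_mixtral_prompt is a hypothetical helper for illustration, not part of this package, and the exact BOS/EOS handling depends on the tokenizer:

Code example
def build_mixtral_prompt(turns, new_user_message):
    """Assemble a Mixtral/Mistral-instruct-style prompt string.

    turns: list of (user_message, model_reply) pairs from earlier turns.
    """
    prompt = ""
    for user_msg, model_reply in turns:
        # Each completed exchange is wrapped as [INST] ... [/INST] reply</s>
        prompt += f"[INST] {user_msg} [/INST] {model_reply}</s>"
    # The newest user message is left open for the model to complete.
    prompt += f"[INST] {new_user_message} [/INST]"
    return prompt

print(build_mixtral_prompt([], "What is a computer?"))
# [INST] What is a computer? [/INST]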

[How to basic]

⚠️ Available dtypes are F32, F16, and BF16.

⚠️ You can also run these scripts with Accelerate: accelerate launch your_file_name.py

⚠️ max_sys_prompt_length, max_prompt_length, and max_hist_length are measured in characters, not tokens (see the sketch below).
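
In other words, a history longer than max_hist_length is presumably cut down by character count, roughly like this (truncate_chars is a hypothetical helper shown only to illustrate the unit, not the package's actual code):

Code example
def truncate_chars(text, max_chars):
    # Keep only the most recent max_chars characters of the conversation.
    return text[-max_chars:]

history_text = "a" * 10000
print(len(truncate_chars(history_text, 8096)))  # 8096 characters, not tokens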

🛠️ Simple Inference

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ 8-bit and 4-bit inference.

⚠️ When loading models in 8-bit or 4-bit, make sure the dtype is either F16 or BF16.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", load_in_4bit=True)

# load_in_8bit=True can also be used.

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ Flash Attention 1.

⚠️ You can use this together with load_in_4bit or load_in_8bit.

⚠️ I have run into an error when running a model with Flash Attention 1 on a T4 GPU in BF16. If that happens, use F16 instead.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", flash_attention="flash_attention_1")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ Flash Attention 2.

⚠️ You can use this together with load_in_4bit or load_in_8bit.

⚠️ You need the flash-attn package to run FA2: pip install flash-attn --no-build-isolation

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      dtype="BF16", flash_attention="flash_attention_2")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ ONNX model.

⚠️ If you have a GPU, set onnx_execution_provider="CUDAExecutionProvider".

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/onnx_model_1B",
                      model_name="example/onnx_model_1B",
                      onnx_model=True, onnx_execution_provider="CPUExecutionProvider")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ OpenVINO model.

⚠️ If a GPU is available, OpenVINO will try to run on it.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="example/openvino_model_1B",
                      model_name="example/openvino_model_1B",
                      openvino_model=True)

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

🛠️ BetterTransformer.

Code example
from ouroboros_text_gen_utils import text_generation

api = text_generation(tokenizer_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      model_name="Doctor-Shotgun/TinyLlama-1.1B-32k-Instruct",
                      better_transformers=True, dtype="BF16")

history = []

system_prompt = """### Instruction:
You are a helpful AI assistant."""

history, output = api.inplace_alpaca_v2_style(history=history,
                                     system_prompt=system_prompt,
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)

print(f"Model Generated Output: {output}")

[All the supported prompt functions]

inplace_alpaca_style()
inplace_alpaca_v2_style()
inplace_chatml_style()
inplace_ouroboros_style()
inplace_mixtral_style()
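
These appear to share the signature used throughout the examples above, so switching formats should only require swapping the function name. Continuing from the Simple Inference example (the shared signature is an assumption based on the examples in this README):

Code example
history, output = api.inplace_chatml_style(history=history,
                                     system_prompt="You are a helpful AI assistant.",
                                     prompt="What is a computer?",
                                     user_name="user",
                                     character_name="assistant",
                                     max_sys_prompt_length=2048,
                                     max_prompt_length=1024,
                                     max_hist_length=8096,
                                     max_new_tokens=256,
                                     min_new_tokens=10,
                                     top_p=0.8,
                                     top_k=50,
                                     temperature=0.5,
                                     repetition_penalty=1.1)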
