Generation Stopping Criteria for transformers Language Model
Project description
gstop
gstop is a Python library that provides generation stopping criteria for Transformers-based language models. It allows you to define custom stop tokens and criteria to control the generation process and prevent the model from generating unwanted or irrelevant content.
Features
- Define custom stop tokens and criteria for language model generation
- Supports various pre-defined stop token registries for popular language models
- Easy integration with the Transformers library
- Flexible and extensible architecture for adding new stop token registries
Installation
You can install gstop using pip:
pip install gstop
Usage
Here's a basic example of how to use gstop with the Transformers library:
from gstop import GenerationStopper, STOP_TOKENS_REGISTRY
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
stopper = GenerationStopper(STOP_TOKENS_REGISTRY["mistral"])
input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids
out = model.generate(input_ids, stopping_criteria=stopper.criteria)
print(stopper.format(tokenizer.decode(out[0])))
In this example, we create an instance of GenerationStopper using the pre-defined stop tokens registry for the "mistral" model. We then use the generate method of the language model to generate text, passing the stopping_criteria parameter with the stopper's criteria. Finally, we format the generated text using the format method of the stopper to remove any stop tokens.
Customization
You can customize the stop tokens and criteria by creating your own stop token registry or by modifying the existing ones. The stop token registries are defined in the common.py file.
To create a new stop token registry, you can add an entry to the STOP_TOKENS_REGISTRY dictionary with the desired stop tokens and their corresponding token IDs.
Contributing
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gstop-0.2.4.tar.gz.
File metadata
- Download URL: gstop-0.2.4.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
df557c37e9918ea28325356b9e732fe3b1ad6053c5614a90c1e93832ececf167
|
|
| MD5 |
d81f349963dfc6cb9e88ba9ce5b68d57
|
|
| BLAKE2b-256 |
92208e49d0814a5286094863b7dbf48e835ebee93ee2e6499c932b6d2f227bc4
|
File details
Details for the file gstop-0.2.4-py3-none-any.whl.
File metadata
- Download URL: gstop-0.2.4-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e23b59c925506e419f7a9cc0054596831205993c399881abba52cf873ecb4f8f
|
|
| MD5 |
d9a96395ebc8225be7153046d7550461
|
|
| BLAKE2b-256 |
35adb5a8ee46d518c2cf9a20cc06d74262e41b0549ad85ad14b6c5ee9f4d41ea
|