Generation Stopping Criteria for Transformers Language Models

gstop

gstop is a Python library that provides generation stopping criteria for Transformers-based language models. It allows you to define custom stop tokens and criteria to control the generation process and prevent the model from generating unwanted or irrelevant content.

Features

  • Define custom stop tokens and criteria for language model generation
  • Use pre-defined stop token registries for popular language models
  • Easy integration with the Transformers library
  • Flexible and extensible architecture for adding new stop token registries

Installation

You can install gstop using pip:

pip install gstop

Usage

Here's a basic example of how to use gstop with the Transformers library:

from gstop import GenerationStopper, STOP_TOKENS_REGISTRY
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer.
model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a stopper from the pre-defined "mistral" stop tokens.
stopper = GenerationStopper(STOP_TOKENS_REGISTRY["mistral"])

input_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids

# Generation halts once the stopper's criteria are met.
out = model.generate(input_ids, stopping_criteria=stopper.criteria)
# Decode the output and strip any stop tokens from the text.
print(stopper.format(tokenizer.decode(out[0])))

In this example, we create a GenerationStopper from the pre-defined stop token registry entry for "mistral". We then call the model's generate method, passing the stopper's criteria as the stopping_criteria argument so that generation halts when a stop token is produced. Finally, we pass the decoded text through the stopper's format method to remove any stop tokens from the output.
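
To make the final step more concrete, here is a minimal sketch of what removing stop tokens from decoded text could look like. It is not gstop's implementation: the helper name strip_stop_tokens and the example stop strings are assumptions chosen for illustration only.

```python
def strip_stop_tokens(text: str, stop_tokens: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop token (hypothetical helper)."""
    cut = len(text)
    for token in stop_tokens:
        if not token:
            continue  # an empty string would match at position 0
        idx = text.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Example stop strings chosen for illustration, not taken from gstop.
decoded = "Hello, world! How are you?</s>trailing text"
print(strip_stop_tokens(decoded, ["</s>", "\n\n"]))
```

Cutting at the earliest match, rather than removing each token wherever it appears, also discards anything the model emitted after it should have stopped.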

Customization

You can customize the stop tokens and criteria by creating your own stop token registry or by modifying the existing ones. The stop token registries are defined in the common.py file.

To create a new stop token registry, you can add an entry to the STOP_TOKENS_REGISTRY dictionary with the desired stop tokens and their corresponding token IDs.
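
The exact shape of a registry entry is defined in common.py and is not reproduced here. As a rough sketch, assuming an entry maps each stop string to its list of token IDs, the check that a stopping criterion performs could look like the following. The dictionary EXAMPLE_REGISTRY, the key "my-model", the token IDs, and the helper should_stop are all hypothetical placeholders.

```python
# Hypothetical stand-in for a registry; the real STOP_TOKENS_REGISTRY layout may differ.
EXAMPLE_REGISTRY = {
    "my-model": {
        "</s>": [2],             # placeholder ID; look up the real one with your tokenizer
        "###": [100, 100, 100],  # placeholder multi-token stop sequence
    }
}


def should_stop(generated_ids: list[int], stop_sequences: dict[str, list[int]]) -> bool:
    """Return True when the generated IDs end with any stop sequence."""
    for ids in stop_sequences.values():
        if ids and generated_ids[-len(ids):] == ids:
            return True
    return False


stops = EXAMPLE_REGISTRY["my-model"]
print(should_stop([5, 17, 2], stops))     # ends with the placeholder ID for "</s>"
print(should_stop([5, 100, 100], stops))  # only a partial match of "###"
```

Storing token IDs alongside the stop strings lets the check run on the generated IDs directly, without decoding the output at every generation step.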

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
