
Type-hinted interface to use several decoders on text-generation models


Decoder_Ring

pip install decoder-ring

from decoder_ring import ContrastiveSearch

Concept

The fluency and usefulness of a text-generation model depend on the decoder used to select tokens from the model's predicted probabilities and build the text output.

Two examples: greedy decoding always selects the single most probable token; random sampling draws the next token from all possible tokens, each in proportion to its probability.
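The difference between these two strategies can be sketched on a toy distribution. This is a minimal illustration in plain Python, not decoder_ring code: the token names, the probabilities, and the helpers greedy_pick and random_sample are invented for the example.

```python
import random

# Toy next-token distribution; tokens and probabilities are invented for illustration.
probs = {"Alice": 0.5, "Bob": 0.3, "Carol": 0.15, "Zed": 0.05}


def greedy_pick(probs):
    """Return the single most probable token."""
    return max(probs, key=probs.get)


def random_sample(probs, rng):
    """Draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(603)  # fixed seed so the draws are repeatable
greedy_token = greedy_pick(probs)  # always the same token for this distribution
sampled_tokens = [random_sample(probs, rng) for _ in range(5)]
print(greedy_token, sampled_tokens)
```

Greedy decoding returns the same token every time, while the sampled tokens vary from draw to draw and only repeat exactly when the generator is re-created with the same seed.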

The goal of decoder_ring is a common API with type hints, helpful error messages and logs, parameter restrictions, encouragement of random seeds, etc. to make text decoding clear and reproducible. In the future this should support many more decoder types.

Documentation

I would like to expand the documentation for every decoder option, with links to the relevant papers, to make this library and the overall decoder concept accessible to new users.

Supported methods

  • ContrastiveSearch (params: random_seed, penalty_alpha, top_k)
  • GreedyDecoder
  • RandomSampling (params: random_seed)
  • TypicalDecoder (params: random_seed, typical_p)
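For readers who know the Transformers generate() arguments, the sketch below shows one way these methods could correspond to them. It is an assumption for orientation only, not a description of decoder_ring's internals: generate_kwargs is a hypothetical helper, and the random_seed parameter is left out because seeding happens separately from these arguments.

```python
def generate_kwargs(method, **params):
    """Hypothetical mapping from a decoder name to Transformers generate() arguments.

    Not part of decoder_ring; the keys are standard generate() options.
    """
    if method == "GreedyDecoder":
        return {"do_sample": False, "num_beams": 1}
    if method == "RandomSampling":
        # top_k=0 turns off top-k filtering, so every token stays eligible
        return {"do_sample": True, "top_k": 0}
    if method == "TypicalDecoder":
        return {"do_sample": True, "top_k": 0, "typical_p": params["typical_p"]}
    if method == "ContrastiveSearch":
        return {
            "do_sample": False,
            "penalty_alpha": params["penalty_alpha"],
            "top_k": params["top_k"],
        }
    raise ValueError(f"unknown decoder method: {method}")


typical_args = generate_kwargs("TypicalDecoder", typical_p=0.4)
print(typical_args)
```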

Writer Examples (text input and output)

from decoder_ring import BasicWriter, RandomSampling

basic = BasicWriter('gpt2', RandomSampling)
writer_output = basic.write_text(
    prompt="Hello, my name is", max_length=20, early_stopping=True
)

Decoder Examples (with customization)

Start with a HuggingFace Transformers / PyTorch model and tokenized text:

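from transformers import AutoModelForCausalLM, AutoTokenizer

# Load GPT-2 and encode the prompt as a PyTorch tensor of token IDs
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
content = tokenizer.encode("Hello, my name is", return_tensors="pt")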

Example with Transformers' default greedy decoder:

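# Import path assumed to match the ContrastiveSearch import shown earlier
from decoder_ring import GreedyDecoder

decoder1 = GreedyDecoder(model)
greedy_output = decoder1.generate_text(
    prompt=content, max_length=20, early_stopping=True
)
# Convert the generated token IDs back into text
tokenizer.decode(greedy_output[0], skip_special_tokens=True)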

Example with typical decoding, which requires a random_seed to be set before generating text and a typical_p between 0 and 1:

decoder3 = TypicalDecoder(model, random_seed=603, typical_p=0.4)
typical_output = decoder3.generate_text(
    prompt=content, max_length=20, early_stopping=True
)

# new random seed
decoder3.set_random_seed(101)
typical_output_2 = decoder3.generate_text(
    prompt=content, max_length=20, early_stopping=True
)
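To show what typical_p controls, here is a minimal sketch of the typical-decoding filter on a toy distribution. It is written in plain Python under stated assumptions and is not decoder_ring's implementation: typical_filter and sample are hypothetical helpers, the probabilities are invented, and rejecting typical_p values of exactly 0 or 1 is a choice made for this sketch.

```python
import math
import random


def typical_filter(probs, typical_p):
    """Keep the tokens whose surprisal is closest to the distribution's entropy
    until their cumulative probability reaches typical_p, then renormalize."""
    if not 0.0 < typical_p < 1.0:
        raise ValueError("typical_p must be between 0 and 1")
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    # Rank tokens by how far their surprisal (-log p) is from the entropy.
    ranked = sorted(
        (t for t in probs if probs[t] > 0),
        key=lambda t: abs(-math.log(probs[t]) - entropy),
    )
    kept, mass = {}, 0.0
    for token in ranked:
        kept[token] = probs[token]
        mass += probs[token]
        if mass >= typical_p:
            break
    return {t: p / mass for t, p in kept.items()}


def sample(probs, seed):
    """Draw one token using a generator created from the given seed."""
    rng = random.Random(seed)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy distribution, invented for illustration.
probs = {"Alice": 0.5, "Bob": 0.3, "Carol": 0.15, "Zed": 0.05}
filtered = typical_filter(probs, typical_p=0.4)
print(filtered, sample(filtered, seed=603))
```

Sampling from the filtered distribution with the same seed returns the same token each time, which is why the decoder asks for a random seed up front; changing the seed, as set_random_seed does in the example above, can change the result.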

License

Apache license for compatibility with the Transformers library
