An educational module for experimenting with unsupervised learning in large language modeling
Project description
Consult the module API page at
https://engineering.purdue.edu/kak/distBabyGPT/babyGPT-1.1.1.html
for all information about this module, including the latest changes to the code. That page lists all of the module functionality you can invoke in your own code.
Creating an instance of babyGPT:
baby_gpt = babyGPT(
    max_seq_length = max_seq_length,
    batch_size = batch_size,
    embedding_size = embedding_size,
    num_basic_decoders = num_basic_decoders,
    num_atten_heads = num_atten_heads,
    optimizer_params = optimizer_params,
    num_warmup_steps = num_warmup_steps,
    masking = masking,
    verify_text_corpus = False,
    path_saved_model = {"decoder" : "./saved_decoder",
                        "embedding_generator" : "./saved_embedding_generator",
                       },
)
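The constructor call above assumes that names such as max_seq_length and optimizer_params are already defined. As a minimal sketch, the following shows one way to collect them and sanity-check them before building the model. Every value here is illustrative rather than taken from the module's documentation, the key names inside optimizer_params are an assumption, and check_hyperparams is a hypothetical helper that is not part of babyGPT.

```python
# Illustrative values only; consult the API page for the settings the module expects.
hyperparams = {
    "max_seq_length": 64,        # tokens per training sequence
    "batch_size": 4,
    "embedding_size": 256,
    "num_basic_decoders": 4,     # number of stacked decoder blocks
    "num_atten_heads": 4,
    "num_warmup_steps": 4000,
    "masking": True,
    # Assumed key names for Adam-style optimizer settings (hypothetical).
    "optimizer_params": {"beta1": 0.9, "beta2": 0.98, "epsilon": 1e-6},
}

def check_hyperparams(hp):
    """Hypothetical helper: return a list of problems found in the settings."""
    problems = []
    # Multi-head attention commonly splits the embedding evenly across heads.
    if hp["embedding_size"] % hp["num_atten_heads"] != 0:
        problems.append("embedding_size should be divisible by num_atten_heads")
    for name in ("max_seq_length", "batch_size", "num_basic_decoders", "num_warmup_steps"):
        if hp[name] <= 0:
            problems.append(f"{name} must be positive")
    return problems

problems = check_hyperparams(hyperparams)
```

The divisibility check reflects a common constraint in multi-head attention; whether babyGPT enforces it is not stated on this page.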
Since babyGPT calls on TransformerFG for language modeling, you must also construct an instance of that class:
xformer = baby_gpt.TransformerFG(
    max_seq_length = max_seq_length,
    embedding_size = embedding_size,
    tokenizer_json = tokenizer_json,
    num_warmup_steps = num_warmup_steps,
    optimizer_params = optimizer_params,
)
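The num_warmup_steps parameter suggests a learning rate that ramps up before decaying. A minimal sketch of one widely used warmup schedule for transformers (linear ramp, then inverse square-root decay) follows. Whether babyGPT uses exactly this formula is an assumption, and warmup_lr is a hypothetical name, not a function from the module.

```python
def warmup_lr(step, embedding_size, num_warmup_steps):
    """Learning rate at a given training step under the assumed schedule.

    The rate grows linearly for num_warmup_steps steps, peaks there,
    and then decays in proportion to 1 / sqrt(step).
    """
    step = max(step, 1)  # avoid raising zero to a negative power
    ramp = step * num_warmup_steps ** -1.5   # linear warmup branch
    decay = step ** -0.5                     # inverse square-root branch
    return embedding_size ** -0.5 * min(ramp, decay)

# Sample the schedule at a few steps (illustrative settings).
samples = {s: warmup_lr(s, embedding_size=256, num_warmup_steps=4000)
           for s in (1, 2000, 4000, 8000)}
```

With these settings the rate peaks at step 4000, which is where the two branches of the formula meet.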
For next-token prediction, which is what drives the self-supervised learning, you need the MasterDecoderWithMasking class, constructed around the TransformerFG instance:
master_decoder = baby_gpt.MasterDecoderWithMasking(
    xformer,
    num_basic_decoders = num_basic_decoders,
    num_atten_heads = num_atten_heads,
    masking = masking
)
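To illustrate why masking matters for next-token prediction, here is a small sketch of causal attention masking in plain Python. It assumes the masking flag controls a look-back-only mask, which is the usual arrangement for this task; the function names are hypothetical and do not reflect the module's internals.

```python
import math

def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask):
    """Row-wise softmax over allowed positions; masked positions get weight 0."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        top = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform raw scores for a 4-token sequence: each token spreads its
# attention evenly over itself and the tokens before it.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))
```

Because no position can attend to a later one, the prediction for token i+1 cannot peek at the token it is supposed to predict.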
Finally, here is an instance of the dataloader you're going to need:
dataloader = baby_gpt.ArticleDatasetWithBufferedContext(
    gpt = baby_gpt,
    tokenizer_json = tokenizer_json,
    context_window_size = context_window_size,
    context_buffer_size = context_buffer_size,
    articles_dir = articles_dir,
)
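The two context parameters can be pictured with a short sketch. It assumes that context_buffer_size is the number of tokens carried over from the end of one context window to the start of the next; the module's API page is the authority on the actual semantics. The function windows_with_buffer is hypothetical and is not part of babyGPT.

```python
def windows_with_buffer(tokens, context_window_size, context_buffer_size):
    """Split a token sequence into fixed-size windows that overlap by the buffer size."""
    if not 0 <= context_buffer_size < context_window_size:
        raise ValueError("context_buffer_size must be in [0, context_window_size)")
    stride = context_window_size - context_buffer_size
    last_start = len(tokens) - context_window_size
    return [tokens[start:start + context_window_size]
            for start in range(0, last_start + 1, stride)]

tokens = list(range(10))                 # stand-in for token ids
windows = windows_with_buffer(tokens, context_window_size=4, context_buffer_size=1)

# For next-token prediction, the target is the input shifted left by one token.
pairs = [(w[:-1], w[1:]) for w in windows]
```

Under this assumption, each window begins with the last token of the previous one, so some context survives across window boundaries.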
Download files
Source Distribution
babyGPT-1.1.1.tar.gz (648.5 kB)
File details
Details for the file babyGPT-1.1.1.tar.gz.
File metadata
- Download URL: babyGPT-1.1.1.tar.gz
- Upload date:
- Size: 648.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5999ffd913373165cd9dfffaf489954bb12c7f3afef78a5a99401e6a5a56d392 |
| MD5 | 65a0c815882f083bb96ff8753866f171 |
| BLAKE2b-256 | 88cae66dd8ff13e41dea2f5ded0e0e3831a2e4b762f5c54f55586c66ff3ad608 |
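The digests above can be used to check a downloaded archive. The following is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and the local file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest listed above for babyGPT-1.1.1.tar.gz.
EXPECTED_SHA256 = "5999ffd913373165cd9dfffaf489954bb12c7f3afef78a5a99401e6a5a56d392"

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def archive_matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

For example, archive_matches("babyGPT-1.1.1.tar.gz") would return True only if the file in the current directory has the listed SHA256 digest.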