engawa
NOT YET FULLY TESTED
A simple implementation for pre-training BART from scratch on your own corpus.
Usage
Soon, I will make this pip-installable with CLI commands, but at the moment you need to run it as a repository.
Installation
pip install engawa
Build tokenizer
engawa train-tokenizer --data-path /path/to/train.txt --save-dir /path/to/save
# Check out other options with
engawa train-tokenizer --help
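`train-model` takes separate `--train-file` and `--val-file` arguments, and it is common practice to train the tokenizer on the training portion only. If your corpus is a single file, the hypothetical `split_corpus` helper below (not part of engawa) is one minimal way to hold out a validation set. It assumes one example per line and a trailing newline at the end of the file. engawa's expected input format is not specified here, so check it before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of engawa): hold out the last N lines of a
# corpus as a validation set. Assumes one example per line and a trailing newline.
split_corpus() {
  local corpus="$1" train_out="$2" val_out="$3" n_val="$4"
  local total
  # Arithmetic expansion strips the leading spaces some `wc` versions print.
  total=$(( $(wc -l < "$corpus") ))
  if [ "$n_val" -le 0 ] || [ "$n_val" -ge "$total" ]; then
    echo "n_val must be between 1 and $((total - 1)), got $n_val" >&2
    return 1
  fi
  # The first (total - n_val) lines become the training file...
  head -n "$((total - n_val))" "$corpus" > "$train_out"
  # ...and the last n_val lines become the validation file.
  tail -n "$n_val" "$corpus" > "$val_out"
}
```

For example, `split_corpus corpus.txt train.txt val.txt 1000` keeps the last 1,000 lines for validation. The split is not shuffled, so shuffle the corpus first if its line order is meaningful (for example, sorted by date or source).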
Pre-train BART
engawa train-model --tokenizer-file /path/to/tokenizer.json --train-file /path/to/train.txt --val-file /path/to/val.txt --default-root-dir /path/to/save/things
# Check out other options with
engawa train-model --help
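The two commands can be chained in one script. The hypothetical `run_pipeline` wrapper below uses only the flags from the commands in this README. It assumes, without confirmation from the docs, that `train-tokenizer` writes `tokenizer.json` directly into `--save-dir`, which is why it checks for that file before starting pre-training; adjust the path if your install saves it elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper chaining the two engawa commands.
# Assumption: `train-tokenizer` writes tokenizer.json directly into --save-dir.
run_pipeline() {
  local train_txt="$1" val_txt="$2" out_dir="$3"
  local tok_dir="$out_dir/tokenizer"
  mkdir -p "$tok_dir" || return 1

  # Step 1: build the tokenizer from the training text only.
  engawa train-tokenizer --data-path "$train_txt" --save-dir "$tok_dir" || return 1

  # Stop early if the tokenizer file is not where we assumed it would be.
  if [ ! -f "$tok_dir/tokenizer.json" ]; then
    echo "tokenizer.json not found in $tok_dir" >&2
    return 1
  fi

  # Step 2: pre-train BART with that tokenizer; outputs go under $out_dir/model.
  engawa train-model \
    --tokenizer-file "$tok_dir/tokenizer.json" \
    --train-file "$train_txt" \
    --val-file "$val_txt" \
    --default-root-dir "$out_dir/model"
}
```

For example, `run_pipeline train.txt val.txt runs/exp1`. Any further options listed by `engawa train-model --help` can be added to the second command.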
Download files
Source Distribution
engawa-0.1.4.tar.gz (8.2 kB)
Built Distribution
engawa-0.1.4-py3-none-any.whl (9.6 kB)