engawa
NOT YET FULLY TESTED
A simple implementation to pre-train BART from scratch with your own corpus.
Usage
Install the package with pip, then run the engawa CLI commands shown below.
Installation
pip install engawa
Build tokenizer
engawa train-tokenizer --data-path /path/to/train.txt --save-dir /path/to/save
# Check out the other options with
engawa train-tokenizer --help
Pre-train BART
engawa train-model \
--tokenizer-file /path/to/tokenizer.json \
--train-file /path/to/train.txt \
--val-file /path/to/val.txt \
--default-root-dir /path/to/save/things
# Check out the other options with
engawa train-model --help
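The train-model command above expects separate training and validation text files. If you only have a single corpus file, the following is a minimal sketch of one way to split it. It is not part of engawa: split_corpus is a hypothetical helper, and it assumes the corpus holds one example per line, which you should confirm against the input format engawa actually expects.

```python
import random
from pathlib import Path


def split_corpus(corpus_path, train_path, val_path, val_fraction=0.1, seed=0):
    """Shuffle the non-empty lines of a corpus and write train/val files.

    Hypothetical helper, not part of engawa.
    Assumption: the corpus stores one example per line.
    Returns the number of lines written to (train, val).
    """
    text = Path(corpus_path).read_text(encoding="utf-8")
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(lines)

    # Hold out at least one line for validation when there is more than one line.
    n_val = max(1, int(len(lines) * val_fraction)) if len(lines) > 1 else 0
    val, train = lines[:n_val], lines[n_val:]

    Path(train_path).write_text("".join(l + "\n" for l in train), encoding="utf-8")
    Path(val_path).write_text("".join(l + "\n" for l in val), encoding="utf-8")
    return len(train), len(val)
```

The two output paths can then be passed to --train-file and --val-file. The fixed seed makes the split repeatable, so later runs are validated against the same held-out lines.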