engawa
NOT YET FULLY TESTED
A simple implementation to pre-train BART from scratch with your own corpus.
Usage
Install the package with pip, then run it through the engawa CLI commands shown below.
Installation
pip install engawa
Build tokenizer
engawa train-tokenizer --data-path /path/to/train.txt --save-dir /path/to/save
# Check out other options with
engawa train-tokenizer --help
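The tokenizer is trained on train.txt, and the pre-training step below also takes a separate validation file. If you start from a single corpus file, a minimal sketch of a split could look like the following. It assumes one text unit per line, which this README does not confirm. The function name split_corpus and its parameters are hypothetical and not part of engawa.

```python
import random
from pathlib import Path


def split_corpus(corpus_path, train_path, val_path, val_ratio=0.05, seed=0):
    """Shuffle the non-empty lines of a corpus and write train/val files.

    Assumption: the corpus holds one text unit (sentence or document) per line.
    Returns the number of lines written to each file.
    """
    text = Path(corpus_path).read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(lines)

    # Hold out at least one line whenever the corpus has two or more lines.
    n_val = max(1, int(len(lines) * val_ratio)) if len(lines) > 1 else 0
    val_lines, train_lines = lines[:n_val], lines[n_val:]

    Path(train_path).write_text(
        "".join(line + "\n" for line in train_lines), encoding="utf-8"
    )
    Path(val_path).write_text(
        "".join(line + "\n" for line in val_lines), encoding="utf-8"
    )
    return len(train_lines), len(val_lines)
```

Calling split_corpus("corpus.txt", "train.txt", "val.txt") would produce the two files that the commands in this README take as input.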
Pre-train BART
engawa train-model \
--tokenizer-file /path/to/tokenizer.json \
--train-file /path/to/train.txt \
--val-file /path/to/val.txt \
--default-root-dir /path/to/save/things
# Check out other options with
engawa train-model --help
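To run both steps from a script, the two commands above could be wrapped roughly as follows. This is only a sketch: build_commands and run_pipeline are hypothetical helpers, not part of engawa, and they use only the flags shown in this README. The tokenizer file path is passed explicitly because the README does not say under which file name train-tokenizer saves its output.

```python
import shlex
import subprocess


def build_commands(train_file, val_file, tokenizer_dir, tokenizer_file, output_dir):
    """Return the two engawa invocations from this README as argument lists."""
    train_tokenizer = [
        "engawa", "train-tokenizer",
        "--data-path", str(train_file),
        "--save-dir", str(tokenizer_dir),
    ]
    train_model = [
        "engawa", "train-model",
        "--tokenizer-file", str(tokenizer_file),
        "--train-file", str(train_file),
        "--val-file", str(val_file),
        "--default-root-dir", str(output_dir),
    ]
    return [train_tokenizer, train_model]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = build_commands(
        train_file="/path/to/train.txt",
        val_file="/path/to/val.txt",
        tokenizer_dir="/path/to/save",
        tokenizer_file="/path/to/tokenizer.json",
        output_dir="/path/to/save/things",
    )
    run_pipeline(commands, dry_run=True)
```

With dry_run=True the script only prints the commands, which is a cheap way to check the paths before starting a long training run.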