speech agent for 100 hours
Project description
haloop
Haloop is a speech agent toolkit. It provides the hac program for acoustic model training, hal for RNN language model training and evaluation, and hat for the attention decoder LM. The package is available on PyPI:
pip install haloop
Currently, hat is a REPL for Ukrainian GPT-2 models from the paper GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian.
To use hat, install some additional dependencies and models:
pip install bitsandbytes sentencepiece
wget https://a.wilab.org.ua/gpt/wiki.model # sentencepiece tokenizer
wget https://a.wilab.org.ua/gpt/ckpt10m.pt # model checkpoint for GPT-2 Large
Now start the REPL:
hat --spm wiki.model ckpt10m.pt
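Before launching, it can help to confirm that both downloads finished and that the hat entry point is on PATH. The following is a minimal sketch using only the Python standard library; the helper names missing_files and build_hat_command are hypothetical and not part of haloop, and the default file names are simply the ones from the wget commands above. It only prints what it finds and does not start the REPL itself.

```python
import shutil
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


def build_hat_command(spm_path="wiki.model", ckpt_path="ckpt10m.pt"):
    """Build the argument list for the REPL invocation shown above.

    Hypothetical helper: it mirrors `hat --spm wiki.model ckpt10m.pt`
    and assumes no other flags.
    """
    return ["hat", "--spm", str(spm_path), str(ckpt_path)]


if __name__ == "__main__":
    cmd = build_hat_command()
    # cmd[2:] holds the tokenizer and checkpoint paths.
    missing = missing_files(cmd[2:])
    if missing:
        print("Download these files first:", ", ".join(missing))
    elif shutil.which("hat") is None:
        print("hat not found on PATH; try: pip install haloop")
    else:
        print("Ready to run:", " ".join(cmd))
```

Run the script from the directory that holds the downloaded files; if it reports that everything is ready, the hat command above should find its inputs.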
Please cite:
@inproceedings{kyrylov-chaplynskyi-2023-gpt,
title = "{GPT}-2 Metadata Pretraining Towards Instruction Finetuning for {U}krainian",
author = "Kyrylov, Volodymyr and
Chaplynskyi, Dmytro",
booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.unlp-1.4",
pages = "32--39",
abstract = "We explore pretraining unidirectional language models on 4B tokens from the largest curated corpus of Ukrainian, UberText 2.0. We enrich document text by surrounding it with weakly structured metadata, such as title, tags, and publication year, enabling metadata-conditioned text generation and text-conditioned metadata prediction at the same time. We pretrain GPT-2 Small, Medium and Large models each on single GPU, reporting training times, BPC on BrUK and BERTScore on titles for 1000 News from the Future. Next, we venture to formatting POS and NER datasets as instructions, and train low-rank attention adapters, performing these tasks as constrained text generation. We release our models for the community at https://github.com/proger/uk4b.",
}
See also Speech Discrimination by Dynamic Programming, T. K. Vintsyuk (1968)