speech agent for 100 hours

haloop

Haloop is a speech agent toolkit. It provides the following programs:

  • hai to initialize models;
  • hac for acoustic model training;
  • hal for RNN language model training and evaluation;
  • hala for causal attention model training;
  • hat for agent testing.

The package can be installed from PyPI:

pip install haloop

Pretrained models

hat can be used with the Ukrainian GPT-2 models from our paper "GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian".

You will need to install two extra Python packages and download the tokenizer and the model checkpoint:

pip install bitsandbytes sentencepiece

wget https://a.wilab.org.ua/gpt/wiki.model  # sentencepiece tokenizer
wget https://a.wilab.org.ua/gpt/ckpt10m.pt  # model checkpoint for GPT-2 Large

Now, kick off the REPL:

hat --spm wiki.model ckpt10m.pt

Citing

Please cite:

@inproceedings{kyrylov-chaplynskyi-2023-gpt,
    title = "{GPT}-2 Metadata Pretraining Towards Instruction Finetuning for {U}krainian",
    author = "Kyrylov, Volodymyr  and
      Chaplynskyi, Dmytro",
    booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.unlp-1.4",
    pages = "32--39",
    abstract = "We explore pretraining unidirectional language models on 4B tokens from the largest curated corpus of Ukrainian, UberText 2.0. We enrich document text by surrounding it with weakly structured metadata, such as title, tags, and publication year, enabling metadata-conditioned text generation and text-conditioned metadata prediction at the same time. We pretrain GPT-2 Small, Medium and Large models each on single GPU, reporting training times, BPC on BrUK and BERTScore on titles for 1000 News from the Future. Next, we venture to formatting POS and NER datasets as instructions, and train low-rank attention adapters, performing these tasks as constrained text generation. We release our models for the community at https://github.com/proger/uk4b.",
}

Reading

Speech Discrimination by Dynamic Programming, T. K. Vintsyuk (1968)
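Vintsyuk's paper is often credited with introducing dynamic programming alignment for speech recognition, the idea later popularized as dynamic time warping (DTW). As a rough illustration of that recurrence (a minimal sketch, not code from haloop), the function below computes a DTW distance between two 1-D sequences. Real systems compare frames of acoustic feature vectors; the scalar inputs and the absolute-difference local cost are assumptions made for brevity, and the name `dtw_distance` is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Minimum cumulative |a_i - b_j| cost over monotonic alignments of a and b.

    Simplified sketch: scalar frames and absolute-difference local cost are
    assumptions; speech systems use feature vectors and other distances.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]; index 0 is the empty prefix
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest predecessor: diagonal match, repeat a frame of b, repeat a frame of a
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # the repeated 2 is absorbed by warping, so the distance is zero
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Each cell keeps only the best cost of reaching it, so the whole alignment is found in O(nm) time instead of by enumerating every possible warping path.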


Download files

Source Distribution

haloop-0.0.10.tar.gz (49.9 kB)

Built Distribution

haloop-0.0.10-py3-none-any.whl (56.9 kB)

File details

Details for the file haloop-0.0.10.tar.gz.

File metadata

  • Download URL: haloop-0.0.10.tar.gz
  • Upload date:
  • Size: 49.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for haloop-0.0.10.tar.gz
Algorithm Hash digest
SHA256 c66cc237583ea1a0d08993dd6f5991d2d3b79988bb408559ec40f41636dd5915
MD5 9d5d0733a9d39b9f897be31fe2497fbf
BLAKE2b-256 1ef8a144239ec966c1f03a7b1c5fd2be613d52e07bfcb425a210490335df01bd

File details

Details for the file haloop-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: haloop-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 56.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for haloop-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 08f09ef152b03945329d41df48557589c4be19e77b5b70a7a3bbcee48e5c3cee
MD5 60dccddd2038cc13d985fe6bd10b69a8
BLAKE2b-256 eaa2113f3275196a6735a03453a6ce5e61e2d0f0d451dbfe661a1f3707296ed0
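To check a downloaded archive against the SHA256 digests listed above, a minimal sketch using Python's standard `hashlib` module could look like the following. The digests are copied from the hash tables on this page; the helper names `sha256_hexdigest` and `verify` are hypothetical and not part of haloop.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for haloop 0.0.10.
EXPECTED_SHA256 = {
    "haloop-0.0.10.tar.gz": "c66cc237583ea1a0d08993dd6f5991d2d3b79988bb408559ec40f41636dd5915",
    "haloop-0.0.10-py3-none-any.whl": "08f09ef152b03945329d41df48557589c4be19e77b5b70a7a3bbcee48e5c3cee",
}


def sha256_hexdigest(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256[Path(path).name]  # raises KeyError for unknown file names
    return sha256_hexdigest(path) == expected


if __name__ == "__main__":
    import sys

    # usage (hypothetical script name): python verify_haloop.py haloop-0.0.10.tar.gz
    for name in sys.argv[1:]:
        print(name, "OK" if verify(name) else "MISMATCH")
```

pip can enforce the same check on its own through its hash-checking mode: list the package in a requirements file with `--hash=sha256:<digest>` and install with `pip install --require-hashes -r requirements.txt`.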