
speech agent for 100 hours


haloop


Haloop is a speech agent toolkit. It provides three programs: hac for acoustic model training, hal for RNN language model training and evaluation, and hat for an attention decoder language model. The package is available on PyPI:

pip install haloop

Currently, hat is a REPL for Ukrainian GPT-2 models from the paper GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian.

To use hat, install some additional dependencies and models:

pip install bitsandbytes sentencepiece

wget https://a.wilab.org.ua/gpt/wiki.model  # sentencepiece tokenizer
wget https://a.wilab.org.ua/gpt/ckpt10m.pt  # model checkpoint for GPT-2 Large
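
Before starting the REPL, it can help to confirm that both downloads finished and that the extra packages are importable. The following is a minimal sketch using only the Python standard library; it is not part of haloop, and the helper name `missing_prerequisites` is hypothetical. It assumes the files were saved under the names used above and that the packages import as `bitsandbytes` and `sentencepiece`.

```python
import importlib.util
from pathlib import Path

# File names from the download step; module names from the pip install step.
REQUIRED_FILES = ["wiki.model", "ckpt10m.pt"]
REQUIRED_MODULES = ["bitsandbytes", "sentencepiece"]


def missing_prerequisites(directory=".", files=REQUIRED_FILES, modules=REQUIRED_MODULES):
    """Return a list of problems found; an empty list means nothing is missing.

    Hypothetical helper for illustration only, not a haloop API.
    """
    problems = []
    for name in files:
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"missing file: {path}")
        elif path.stat().st_size == 0:
            # A zero-byte file usually means the download was interrupted.
            problems.append(f"empty file (interrupted download?): {path}")
    for module in modules:
        # find_spec looks the module up without importing it.
        if importlib.util.find_spec(module) is None:
            problems.append(f"module not installed: {module}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, the files and packages were found. It does not check that the checkpoint is valid or complete, only that it exists and is non-empty.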

Now start the REPL:

hat --spm wiki.model ckpt10m.pt

Please cite:

@inproceedings{kyrylov-chaplynskyi-2023-gpt,
    title = "{GPT}-2 Metadata Pretraining Towards Instruction Finetuning for {U}krainian",
    author = "Kyrylov, Volodymyr  and
      Chaplynskyi, Dmytro",
    booktitle = "Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.unlp-1.4",
    pages = "32--39",
    abstract = "We explore pretraining unidirectional language models on 4B tokens from the largest curated corpus of Ukrainian, UberText 2.0. We enrich document text by surrounding it with weakly structured metadata, such as title, tags, and publication year, enabling metadata-conditioned text generation and text-conditioned metadata prediction at the same time. We pretrain GPT-2 Small, Medium and Large models each on single GPU, reporting training times, BPC on BrUK and BERTScore on titles for 1000 News from the Future. Next, we venture to formatting POS and NER datasets as instructions, and train low-rank attention adapters, performing these tasks as constrained text generation. We release our models for the community at https://github.com/proger/uk4b.",
}

See also Speech Discrimination by Dynamic Programming, T. K. Vintsyuk (1968)


