Train and entropy-prune ARPA n-gram language models

kaldingram

kaldingram provides Python and CLI tools to:

  • train Kneser-Ney back-off n-gram language models in ARPA format
  • entropy-prune ARPA language models

The implementation is based on Kaldi WSJ scripts and matches SRILM-style behavior.
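
As an illustration of what a back-off model in ARPA format encodes, the sketch below scores a word with the standard back-off rule: use the listed n-gram if present, otherwise add the context's back-off weight and retry with a shorter context. This is not kaldingram's code. `TOY_LM`, `backoff_logprob`, and all the numbers are made up for the example.

```python
# Toy back-off model in the shape an ARPA file encodes: each n-gram maps to
# (log10 probability, log10 back-off weight). All values are invented.
TOY_LM = {
    ("<s>",): (-99.0, -0.30),
    ("the",): (-0.70, -0.40),
    ("cat",): (-1.00, -0.20),
    ("</s>",): (-0.90, 0.0),
    ("the", "cat"): (-0.30, 0.0),
}


def backoff_logprob(lm, context, word):
    """Return log10 P(word | context) using the standard back-off rule."""
    ngram = context + (word,)
    if ngram in lm:
        return lm[ngram][0]
    if not context:
        raise KeyError(f"unknown word: {word!r}")
    # Back-off weight of the context (0.0 when the context is not listed),
    # then retry with the context shortened by its oldest word.
    bow = lm.get(context, (0.0, 0.0))[1]
    return bow + backoff_logprob(lm, context[1:], word)


print(backoff_logprob(TOY_LM, ("the",), "cat"))  # bigram is listed directly
print(backoff_logprob(TOY_LM, ("cat",), "the"))  # backs off to the unigram
```

Kneser-Ney smoothing determines how the probabilities and back-off weights are estimated; the lookup rule itself is the same for any back-off model stored in ARPA format.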

Install

pip install kaldingram

CLI Usage

Train an n-gram LM

kaldingram train --ngram-order 4 --text corpus.txt --lm 4gram.arpa

Or stream text from stdin and write ARPA to stdout:

cat corpus.txt | kaldingram train --ngram-order 3 > 3gram.arpa
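
The same stdin/stdout form can be driven from a Python script through `subprocess`. The sketch below only uses the `train` subcommand and the `--ngram-order` flag shown above; the helper names `train_command` and `train_from_text` are hypothetical and not part of kaldingram.

```python
import shlex
import subprocess


def train_command(order):
    """Build the argv for the stdin/stdout form of `kaldingram train`."""
    return ["kaldingram", "train", "--ngram-order", str(order)]


def train_from_text(text, order=3):
    """Pipe `text` into the CLI and return the ARPA output as a string.

    Requires the `kaldingram` executable on PATH; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        train_command(order),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Show the command that would be run, without executing it.
print(shlex.join(train_command(3)))
```

Calling `train_from_text(open("corpus.txt", encoding="utf-8").read())` would then be the in-process equivalent of the shell pipeline, assuming the corpus fits in memory.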

Prune an n-gram LM

kaldingram prune --threshold 1e-8 --lm 4gram.arpa --write-lm 4gram_pruned.arpa
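
One way to see what pruning removed is to compare the n-gram counts in the `\data\` header of the two ARPA files. The sketch below reads only that header. The function name `arpa_ngram_counts` is hypothetical, and the two header snippets are invented stand-ins for `4gram.arpa` and `4gram_pruned.arpa`.

```python
import re

# Header lines in an ARPA file look like "ngram 2=12".
NGRAM_LINE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def arpa_ngram_counts(lines):
    """Return {order: count} from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = NGRAM_LINE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # first section marker such as \1-grams: ends the header
    return counts


# Invented headers standing in for the unpruned and pruned files.
before = ["\\data\\", "ngram 1=5", "ngram 2=12", "ngram 3=20", "", "\\1-grams:"]
after = ["\\data\\", "ngram 1=5", "ngram 2=9", "ngram 3=7", "", "\\1-grams:"]

full = arpa_ngram_counts(before)
pruned = arpa_ngram_counts(after)
for order in sorted(full):
    print(order, full[order], pruned.get(order, 0))
```

With real files, pass an open file object instead of a list, for example `arpa_ngram_counts(open("4gram.arpa", encoding="utf-8"))`.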

Development

Build the package locally:

python -m pip install --upgrade build
python -m build
