Train and entropy-prune ARPA n-gram language models
kaldingram
kaldingram provides Python and CLI tools to:
- train Kneser-Ney back-off n-gram language models in ARPA format
- entropy-prune ARPA language models
The implementation is based on Kaldi WSJ scripts and matches SRILM-style behavior.
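For readers new to the format, the following is a minimal sketch of how a back-off lookup over an ARPA model works in general. It does not use kaldingram's Python API; the toy model, its numbers, and the helper names `parse_arpa` and `log10_prob` are hypothetical and chosen only for illustration. Real ARPA files separate the fields with tabs; the parser here splits on any whitespace.

```python
# Toy bigram model in ARPA layout (made-up log10 values, illustration only).
TOY_ARPA = """\
\\data\\
ngram 1=4
ngram 2=2

\\1-grams:
-0.6 <s> -0.30
-0.7 </s>
-0.5 a -0.20
-0.9 b -0.10

\\2-grams:
-0.2 <s> a
-0.4 a b

\\end\\
"""


def parse_arpa(text):
    """Return {ngram_tuple: (log10_prob, log10_backoff)} from ARPA text."""
    model = {}
    order = 0
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, header counts, and the start/end markers.
        if not line or line.startswith("ngram ") or line in ("\\data\\", "\\end\\"):
            continue
        # Section headers look like "\2-grams:" and set the current order.
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The back-off weight is optional; a missing weight means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        model[words] = (logprob, backoff)
    return model


def log10_prob(model, history, word):
    """Back-off lookup: use the longest matching n-gram, else back off."""
    ngram = tuple(history) + (word,)
    if ngram in model:
        return model[ngram][0]
    if not history:
        raise KeyError(f"unknown word: {word!r}")
    # Add the history's back-off weight, then retry with a shorter history.
    backoff = model.get(tuple(history), (0.0, 0.0))[1]
    return backoff + log10_prob(model, tuple(history)[1:], word)


model = parse_arpa(TOY_ARPA)
seen = log10_prob(model, ("a",), "b")    # bigram "a b" is listed directly
unseen = log10_prob(model, ("b",), "a")  # backs off: bow(b) + log10 p(a)
```

The second lookup shows why the back-off weights matter: the bigram `b a` is absent, so its score is the back-off weight of `b` plus the unigram score of `a`.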
Install
pip install kaldingram
CLI Usage
Train an n-gram LM
kaldingram train --ngram-order 4 --text corpus.txt --lm 4gram.arpa
Or stream text from stdin and write ARPA to stdout:
cat corpus.txt | kaldingram train --ngram-order 3 > 3gram.arpa
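The same stdin/stdout invocation can be driven from a Python script with `subprocess`. This is a sketch that shells out to the CLI shown above rather than calling kaldingram's Python API, which this page does not document; the helper names `train_stdin_command` and `train_from_text` are hypothetical, and the code assumes the `kaldingram` executable is on `PATH`.

```python
import subprocess


def train_stdin_command(order):
    """Build the argv for `kaldingram train` reading stdin, writing stdout."""
    return ["kaldingram", "train", "--ngram-order", str(order)]


def train_from_text(text, order=3):
    """Pipe `text` to the CLI and return the ARPA model it prints.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        train_stdin_command(order),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example use (requires kaldingram to be installed):
# arpa = train_from_text(open("corpus.txt").read(), order=3)
```

Keeping the argument list in its own function makes the exact command easy to log or inspect before anything is executed.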
Prune an n-gram LM
kaldingram prune --threshold 1e-8 --lm 4gram.arpa --write-lm 4gram_pruned.arpa
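One way to see how much a given threshold removed is to compare the n-gram counts in the `\data\` header of the two files. The sketch below assumes the pruned file's header reflects the post-pruning counts, as is conventional for ARPA writers; `ngram_counts` and `pruning_summary` are hypothetical helper names, not part of kaldingram.

```python
import re

# Header lines look like "ngram 2=12345".
_COUNT_RE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def ngram_counts(lines):
    """Return {order: count} read from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if in_header:
            match = _COUNT_RE.match(line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line:
                break  # first non-blank, non-count line ends the header
    return counts


def pruning_summary(before, after):
    """Return {order: (count_before, count_after)} for each order in `before`."""
    return {order: (n, after.get(order, 0)) for order, n in sorted(before.items())}


# Example use with the files from the commands above:
# with open("4gram.arpa") as f_in, open("4gram_pruned.arpa") as f_out:
#     print(pruning_summary(ngram_counts(f_in), ngram_counts(f_out)))
```

Only the header is read, so the check stays fast even for large models.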
Development
Build package locally:
python -m pip install --upgrade build
python -m build
Download files
Source Distribution
kaldingram-0.1.0.tar.gz (12.8 kB)
Built Distribution
kaldingram-0.1.0-py3-none-any.whl (13.7 kB)
File details
Details for the file kaldingram-0.1.0.tar.gz.
File metadata
- Download URL: kaldingram-0.1.0.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd8c357249c4faf28ce083b69335836ec87ef6d11cbab4fc17545a4c8156651a |
| MD5 | bf4e62343233276593b793e21c639082 |
| BLAKE2b-256 | 3f300014baa4a9e01f14afe25439743d3f75b47d858cfc80e283380cfd58989e |
File details
Details for the file kaldingram-0.1.0-py3-none-any.whl.
File metadata
- Download URL: kaldingram-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d6997b41075155ec2d1b38a20a4369c44c7ce9dc4c0ab2489f65bab656bc95b6 |
| MD5 | 785b10b09a6baede52fae6b5864bae35 |
| BLAKE2b-256 | 4a3e030621ae607ea92c981ba5b749db7c78d2f81009b45ab4ed26dcea95cc78 |