arpa2fst
Python wrapper for kaldi's arpa2fst.
Installation
kaldilm can be installed using either conda or pip.
Using conda
conda install -c k2-fsa -c conda-forge kaldilm
Using pip
pip install kaldilm
If installing with pip install does not work (for example, you can't import _kaldilm), something likely failed during the compilation of the native part of this library.
The following steps produce a more verbose log that can help diagnose the issue:
# Remove the broken version first
pip uninstall kaldilm
pip install -v --no-cache-dir kaldilm
To test that kaldilm is installed successfully, run:
$ python3 -m kaldilm --help
It should display the usage information of kaldilm.
Please create an issue on GitHub if you encounter any problems while installing kaldilm.
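As a complement to the command-line check, the following minimal sketch asks Python whether the two modules mentioned above can be located at all. The helper name `is_importable` is hypothetical and not part of kaldilm. The check only locates the modules; it does not load the native extension, so a "found" result does not prove that the compiled part works.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module.

    Hypothetical helper for illustration; it does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "kaldilm" is the Python package, "_kaldilm" is the native part
    # named in the troubleshooting note above.
    for name in ("kaldilm", "_kaldilm"):
        status = "found" if is_importable(name) else "NOT found"
        print(f"{name}: {status}")
```

If `_kaldilm` is reported as not found, reinstalling with the verbose pip command above is the next step.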
Usage
First, let us see the usage information of kaldi's arpa2fst:
kaldi/src/lmbin$ ./arpa2fst
./arpa2fst
Convert an ARPA format language model into an FST
Usage: arpa2fst [opts] <input-arpa> <output-fst>
e.g.: arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang/words.txt lm/input.arpa G.fst
Note: When called without switches, the output G.fst will contain
an embedded symbol table. This is compatible with the way a previous
version of arpa2fst worked.
Options:
--bos-symbol : Beginning of sentence symbol (string, default = "<s>")
--disambig-symbol : Disambiguator. If provided (e. g. #0), used on input side of backoff links, and <s> and </s> are replaced with epsilons (string, default = "")
--eos-symbol : End of sentence symbol (string, default = "</s>")
--ilabel-sort : Ilabel-sort the output FST (bool, default = true)
--keep-symbols : Store symbol table with FST. Symbols always saved to FST if symbol tables are neither read or written
(otherwise symbols would be lost entirely) (bool, default = false)
--max-arpa-warnings : Maximum warnings to report on ARPA parsing, 0 to disable, -1 to show all (int, default = 30)
--read-symbol-table : Use existing symbol table (string, default = "")
--write-symbol-table : Write generated symbol table to a file (string, default = "")
kaldilm uses the same arguments as kaldi's arpa2fst:
$ python3 -m kaldilm --help
prints
usage: Python wrapper of kaldi's arpa2fst [-h] [--bos-symbol BOS_SYMBOL]
[--disambig-symbol DISAMBIG_SYMBOL]
[--eos-symbol EOS_SYMBOL]
[--ilabel-sort ILABEL_SORT]
[--keep-symbols KEEP_SYMBOLS]
[--max-arpa-warnings MAX_ARPA_WARNINGS]
[--read-symbol-table READ_SYMBOL_TABLE]
[--write-symbol-table WRITE_SYMBOL_TABLE]
[--max-order MAX_ORDER]
input_arpa [output_fst]
positional arguments:
input_arpa input arpa filename
output_fst Output fst filename. If empty, no output file is
created.
optional arguments:
-h, --help show this help message and exit
--bos-symbol BOS_SYMBOL
Beginning of sentence symbol (default = "<s>")
--disambig-symbol DISAMBIG_SYMBOL
Disambiguator. If provided (e.g., #0), used on input
side of backoff links, and <s> and </s> are replaced
with epsilons (default = "")
--eos-symbol EOS_SYMBOL
End of sentence symbol (default = "</s>")
--ilabel-sort ILABEL_SORT
Ilabel-sort the output FST (default = true)
--keep-symbols KEEP_SYMBOLS
Store symbol table with FST. Symbols always saved to
FST if symboltables are neither read or written
(otherwise symbols would be lost entirely) (default =
false)
--max-arpa-warnings MAX_ARPA_WARNINGS
Maximum warnings to report on ARPA parsing, 0 to
disable, -1 to show all (default = 30)
--read-symbol-table READ_SYMBOL_TABLE
Use existing symbol table (default = "")
--write-symbol-table WRITE_SYMBOL_TABLE
(Write generated symbol table to a file (default = "")
--max-order MAX_ORDER
Maximum order (inclusive) in the arpa file is used to
generate the final FST. If it is -1, all ngram data in
the file are used.If it is 1, only unigram data are
used.If it is 2, only ngram data up to bigram are
used.Default is -1.
It has one extra argument, --max-order, which is not present in kaldi's arpa2fst.
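When the conversion is driven from a Python script rather than a shell, the command line can be assembled from the options listed above. The sketch below is only an illustration: the function `build_kaldilm_command` is a hypothetical helper, it covers just three of the documented options, and it assumes `python3` is on the PATH. It builds the argument list and does not run anything.

```python
import shlex
from typing import List


def build_kaldilm_command(
    input_arpa: str,
    read_symbol_table: str = "",
    disambig_symbol: str = "",
    max_order: int = -1,
) -> List[str]:
    """Assemble an argument list for `python3 -m kaldilm`.

    Hypothetical helper: only flags shown in the help text above are used,
    and options left at their documented defaults are omitted.
    """
    cmd = ["python3", "-m", "kaldilm"]
    if read_symbol_table:
        cmd.append(f"--read-symbol-table={read_symbol_table}")
    if disambig_symbol:
        cmd.append(f"--disambig-symbol={disambig_symbol}")
    if max_order != -1:
        cmd.append(f"--max-order={max_order}")
    cmd.append(input_arpa)
    return cmd


if __name__ == "__main__":
    cmd = build_kaldilm_command(
        "./input.arpa",
        read_symbol_table="./words.txt",
        disambig_symbol="#0",
        max_order=2,
    )
    # shlex.join quotes '#0' so the printed line is safe to paste into a shell.
    print(shlex.join(cmd))
```

The list could then be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. With `output_fst` omitted, the shell redirections used in this document suggest that the FST text is written to standard output.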
Example usage
Suppose you have an arpa file input.arpa with the following content:
\data\
ngram 1=4
ngram 2=2
ngram 3=2
\1-grams:
-5.234679 a -3.3
-3.456783 b
0.0000000 <s> -2.5
-4.333333 </s>
\2-grams:
-1.45678 a b -3.23
-1.30490 <s> a -4.2
\3-grams:
-0.34958 <s> a b
-0.23940 a b </s>
\end\
and the word symbol table is words.txt:
<eps> 0
a 1
b 2
#0 3
<s> 4
</s> 5
Note: Numbers in the arpa file are log10(p), while numbers on arcs are -log(p) in OpenFst and log(p) in k2.
log(10) = 2.3026
log10(p) | p | log(p) | note |
---|---|---|---|
-5.234679 | 0.000006 | -12.053294 | log(p) = log10(p) * log(10), -12.053294 = 2.3026 * (-5.234679) |
-3.300000 | 0.000501 | -7.598531 | |
-3.456783 | 0.000349 | -7.959537 | |
0.000000 | 1.000000 | 0.000000 | |
-2.500000 | 0.003162 | -5.756463 | |
-4.333333 | 0.000046 | -9.977868 | |
-1.456780 | 0.034932 | -3.354360 | |
-3.230000 | 0.000589 | -7.437350 | |
-1.304900 | 0.049556 | -3.004643 | |
-4.200000 | 0.000063 | -9.670856 | |
-0.349580 | 0.447116 | -0.804938 | |
-0.239400 | 0.576235 | -0.551239 | |
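The conversion in the table can be reproduced with a few lines of Python. This is a minimal sketch: `arpa_to_scores` is a hypothetical helper name, and it simply applies log(p) = log10(p) * log(10) and negates the result to get the OpenFst arc weight described in the note above.

```python
import math

LN10 = math.log(10.0)  # approximately 2.3026


def arpa_to_scores(log10_p: float) -> dict:
    """Convert a log10 probability from an arpa file.

    Returns the probability p, the natural log log(p) (the convention
    stated above for k2), and -log(p) (the convention for OpenFst arcs).
    """
    log_p = log10_p * LN10
    return {"p": 10.0 ** log10_p, "log_p": log_p, "openfst_weight": -log_p}


if __name__ == "__main__":
    # A few values taken from the example arpa file above.
    for value in (-5.234679, -1.304900, -0.239400):
        s = arpa_to_scores(value)
        print(f"{value:10.6f} {s['p']:.6f} {s['log_p']:.6f} {s['openfst_weight']:.6f}")
```

The `openfst_weight` values are the numbers that appear in the last column of the text FSTs in this document, up to rounding.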
Caution: All symbols with ID >= the ID of #0 are set to <eps> when compiling HLG.
See https://github.com/k2-fsa/icefall/blob/243fb9723cb82287ec5a891155ab9e0bc304740d/egs/librispeech/ASR/local/compile_hlg.py#L103
If the IDs of <s> and </s> are less than that of #0, the resulting HLG is problematic.
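The ordering constraint in the caution above can be checked before running the conversion. The sketch below is an illustration only: both function names are hypothetical, and the symbol table is assumed to be in the two-column `symbol id` text format shown for words.txt.

```python
from typing import Dict, List


def parse_symbol_table(text: str) -> Dict[str, int]:
    """Parse a two-column 'symbol id' table into a dict (blank lines skipped)."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        symbol, index = parts
        table[symbol] = int(index)
    return table


def symbols_below_disambig(
    table: Dict[str, int],
    disambig: str = "#0",
    bos: str = "<s>",
    eos: str = "</s>",
) -> List[str]:
    """Return the sentence symbols whose ID is less than the ID of `disambig`."""
    threshold = table[disambig]
    return [s for s in (bos, eos) if s in table and table[s] < threshold]


WORDS_TXT = """\
<eps> 0
a 1
b 2
#0 3
<s> 4
</s> 5
"""

if __name__ == "__main__":
    offending = symbols_below_disambig(parse_symbol_table(WORDS_TXT))
    print("OK" if not offending else f"IDs below #0: {offending}")
```

For the words.txt above, the returned list is empty because <s> and </s> both come after #0.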
You can use the following code to convert it into an FST.
3-gram
This uses all n-gram data inside the arpa file.
python3 -m kaldilm \
--read-symbol-table="./words.txt" \
--disambig-symbol='#0' \
./input.arpa > G_fst.txt
The resulting G_fst.txt is shown below:
3 5 1 1 3.00464
3 0 3 0 5.75646
0 1 1 1 12.0533
0 2 2 2 7.95954
0 9.97787
1 4 2 2 3.35436
1 0 3 0 7.59853
2 0 3 0
4 2 3 0 7.43735
4 0.551239
5 4 2 2 0.804938
5 1 3 0 9.67086
which can be visualized in k2 using
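import k2

# Read the OpenFst text output; acceptor=False because each arc has
# both an input and an output label.
with open('G_fst.txt') as f:
    G = k2.Fsa.from_openfst(f.read(), acceptor=False)

# Attach the symbol table so the drawing shows words instead of IDs.
G.labels_sym = k2.SymbolTable.from_file('words.txt')
G.aux_labels_sym = k2.SymbolTable.from_file('words.txt')
# G.labels[G.labels >= 3] = 0  # convert symbols with ID >= ID of #0 to eps
G.draw('G.svg', title='G')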
G.svg is shown below:
1-gram
This uses only unigram data inside the arpa file, since --max-order=1 is given.
python3 -m kaldilm \
--read-symbol-table="./words.txt" \
--disambig-symbol='#0' \
--max-order=1 \
./input.arpa > G_uni_fst.txt
The generated G_uni_fst.txt is:
3 0 3 0 5.75646
0 1 1 1 12.0533
0 2 2 2 7.95954
0 9.97787
1 0 3 0 7.59853
2 0 3 0
which can be visualized in k2 using
import k2

with open('G_uni_fst.txt') as f:
    G = k2.Fsa.from_openfst(f.read(), acceptor=False)

G.labels_sym = k2.SymbolTable.from_file('words.txt')
G.aux_labels_sym = k2.SymbolTable.from_file('words.txt')
# G.labels[G.labels >= 3] = 0  # convert symbols with ID >= ID of #0 to eps
G.draw('G_uni.svg', title='G_uni')
G_uni.svg is shown below:
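The text FSTs listed in this section can also be inspected without k2. They follow OpenFst's text format for transducers: an arc line is `src dst ilabel olabel [weight]` and a final-state line is `state [weight]`, where an omitted weight means 0. The parser below is a minimal sketch under that assumption; `Arc` and `parse_openfst_text` are hypothetical names and not part of kaldilm or k2.

```python
from typing import Dict, List, NamedTuple, Tuple


class Arc(NamedTuple):
    src: int
    dst: int
    ilabel: int
    olabel: int
    weight: float


def parse_openfst_text(text: str) -> Tuple[List[Arc], Dict[int, float]]:
    """Parse transducer text into (arcs, final-state weights)."""
    arcs: List[Arc] = []
    finals: Dict[int, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) in (4, 5):
            # Arc line; a missing fifth field means weight 0.
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            src, dst, ilabel, olabel = (int(x) for x in fields[:4])
            arcs.append(Arc(src, dst, ilabel, olabel, weight))
        elif len(fields) in (1, 2):
            # Final-state line; a missing second field means weight 0.
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return arcs, finals


# Contents of G_uni_fst.txt as listed above.
G_UNI_FST = """\
3 0 3 0 5.75646
0 1 1 1 12.0533
0 2 2 2 7.95954
0 9.97787
1 0 3 0 7.59853
2 0 3 0
"""

if __name__ == "__main__":
    arcs, finals = parse_openfst_text(G_UNI_FST)
    for arc in arcs:
        print(arc)
    print("final states:", finals)
```

In this output, the arc `2 0 3 0` has no weight field because the unigram b has no backoff weight in input.arpa, so its cost is 0.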
What's more
Please refer to https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh for how kaldilm is used in icefall.