sequence and joint-sequence modelling tool for g2p

Sequitur G2P

A trainable Grapheme-to-Phoneme converter.

Introduction

Sequitur G2P is a data-driven grapheme-to-phoneme converter written at RWTH Aachen University by Maximilian Bisani.

The method used in this software is described in

   M. Bisani and H. Ney: "Joint-Sequence Models for Grapheme-to-Phoneme
   Conversion". Speech Communication, Volume 50, Issue 5, May 2008,
   Pages 434-451

   (available online at http://dx.doi.org/10.1016/j.specom.2008.01.002)

This software is made available to you under the terms of the GNU General Public License. It can be used for experimentation and as part of other free software projects. For details see the licensing terms below.

If you publish about work that involves the use of this software, please cite the above paper. (You should feel obliged to do so by rules of good scientific conduct.)

The original README also contains these lines: "You may contact the author with any questions or comments via e-mail: maximilian.bisani@rwth-aachen.de. For questions regarding current releases of Sequitur G2P contact Pavel Golik (golik@cs.rwth-aachen.de)." However, we are not sure how active these addresses are. If needed, feel free to create an issue on https://github.com/sequitur-g2p/sequitur-g2p and we will try to help.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License Version 2 (June 1991) as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, you will find it at http://www.gnu.org/licenses/gpl.html, or write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110, USA.

Should a provision of no. 9 and 10 of the GNU General Public License be invalid or become invalid, a valid provision is deemed to have been agreed upon which comes closest to what the parties intended commercially. In any case guarantee/warranty shall be limited to gross negligent actions or intended actions or fraudulent concealment.

Installing

To build and use this software you need to have the following installed: Python, NumPy, SWIG, and a C++ compiler.

To install, change to the source directory and type:

    python setup.py install --prefix /usr/local

You may substitute /usr/local with some other directory. If you do so, make sure that some-other-directory/lib/python2.7/site-packages/ is in your PYTHONPATH (adjust the version number to match your Python), e.g. by typing:

    export PYTHONPATH=some-other-directory/lib/python2.7/site-packages

You can also install via pip by pointing it at this repository. You still need SWIG and a C++ compiler.

pip install numpy
pip install git+https://github.com/sequitur-g2p/sequitur-g2p@master

Note: when installing on macOS, you might run into issues because the default C++ standard library is clang's libc++. If that is the case, try installing with:

CPPFLAGS="-stdlib=libstdc++" pip install git+https://github.com/sequitur-g2p/sequitur-g2p@master

Using

Sequitur G2P is a data-driven grapheme-to-phoneme converter. In fact, it can be applied to any monotonic sequence translation problem, provided the source and target alphabets are small (fewer than 255 symbols). Data-driven means that you need to train it with example pronunciations. It has no built-in linguistic knowledge whatsoever, which means that it should work for any alphabetic language. Training takes a pronunciation dictionary and creates a model file. The model file can then be used to transcribe words that were not in the dictionary.
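Before training, it can be useful to check the alphabet-size constraint mentioned above. The following is a minimal sketch, not part of Sequitur G2P: the function name alphabet_sizes is hypothetical, the input is assumed to follow the one-entry-per-line lexicon format described in the step-by-step guide, and each character of a word is assumed to count as one source symbol.

```python
import io

MAX_SYMBOLS = 255  # limit quoted in the text ("fewer than 255 symbols")


def alphabet_sizes(lines):
    """Return (graphemes, phonemes) as sets collected from lexicon lines.

    Assumption: each non-empty line is "<word> <phoneme> <phoneme> ...",
    and every character of the word counts as one source symbol.
    """
    graphemes, phonemes = set(), set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, pronunciation = fields[0], fields[1:]
        graphemes.update(word)           # one symbol per character
        phonemes.update(pronunciation)   # one symbol per whitespace-separated token
    return graphemes, phonemes


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    sample = io.StringIO("cat k ae t\ncab k ae b\n")
    graphemes, phonemes = alphabet_sizes(sample)
    for name, symbols in (("source", graphemes), ("target", phonemes)):
        status = "ok" if len(symbols) < MAX_SYMBOLS else "too large"
        print(f"{name} alphabet: {len(symbols)} symbols ({status})")
```

To check a real lexicon, pass an open file object instead of the sample, for example alphabet_sizes(open("train.lex", encoding="utf-8")).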

Here is a step-by-step guide to get you started:

  1. Obtain a pronunciation dictionary for training. The format is one word per line. Each line contains the orthographic form of the word followed by the corresponding phonemic transcription. The word and all phonemes need to be separated by white space. The word and phoneme symbols may thus not contain blanks. We'll assume your training lexicon is called train.lex, and that you set aside some portion for testing purposes as test.lex, which is disjoint from train.lex.

  2. Train a model. To create a first model type:

    g2p.py --train train.lex --devel 5% --write-model model-1

    This first model will be rather poor because it is only a unigram. To create higher order models you need to run g2p.py again:

    g2p.py --model model-1 --ramp-up --train train.lex --devel 5% --write-model model-2

    Repeat this a couple of times:

    g2p.py --model model-2 --ramp-up --train train.lex --devel 5% --write-model model-3
    g2p.py --model model-3 --ramp-up --train train.lex --devel 5% --write-model model-4
    ...
    
  3. Evaluate the model. To find out how accurately your model can transcribe unseen words type:

    g2p.py --model model-6 --test test.lex

  4. Transcribe new words. Prepare a list of words you want to transcribe as a simple text file words.txt with one word per line (and no phonemic transcription), then type:

    g2p.py --model model-3 --apply words.txt
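Steps 1 and 2 above can be prepared with a short helper script. This is only a sketch under stated assumptions: split_lexicon and training_commands are hypothetical names, the 10% test fraction and the hash-based assignment are illustrative choices, and the generated command lines use only the g2p.py options shown in the guide. The script prints the commands rather than running them.

```python
import hashlib


def split_lexicon(lines, test_percent=10):
    """Split lexicon lines into (train, test) lists with disjoint word sets.

    The side is chosen per word (first field), so all pronunciation
    variants of a word land on the same side. Hashing the word keeps
    the split reproducible across runs.
    """
    train, test = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without a transcription
        digest = hashlib.sha256(fields[0].encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        (test if bucket < test_percent else train).append(" ".join(fields))
    return train, test


def training_commands(last_model, lexicon="train.lex", devel="5%"):
    """Return the g2p.py command lines for model-1 up to model-<last_model>."""
    commands = [
        f"g2p.py --train {lexicon} --devel {devel} --write-model model-1"
    ]
    for n in range(2, last_model + 1):
        commands.append(
            f"g2p.py --model model-{n - 1} --ramp-up --train {lexicon} "
            f"--devel {devel} --write-model model-{n}"
        )
    return commands


if __name__ == "__main__":
    # Print the training commands instead of executing them.
    for command in training_commands(4):
        print(command)
```

Splitting by word rather than by line matters when a word has several pronunciations: otherwise one variant could end up in train.lex and another in test.lex, and the two files would no longer be disjoint.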

Random comments:

  • You cannot open models created in a Python 3 environment inside a Python 2 environment. The opposite works.
  • Whenever a file name is required, you can specify "-" to mean standard input or standard output.
  • If a file name ends in ".gz", it is assumed that the file is (or should be) compressed using gzip.
  • For the time being you need to type g2p.py --help and/or read the source to find out the other things g2p.py can do. Sorry about that.
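If you write your own scripts around g2p.py, you may want them to follow the same file name conventions. The helper below is a minimal sketch of those two conventions and is not taken from the Sequitur G2P source: the name open_text is hypothetical, and UTF-8 text encoding is an assumption.

```python
import gzip
import sys


def open_text(name, mode="r"):
    """Open a text file following the conventions listed above.

    "-" maps to standard input (reading) or standard output (writing);
    names ending in ".gz" are read or written through gzip.
    Assumption: files are UTF-8 encoded text.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if name == "-":
        return sys.stdin if mode == "r" else sys.stdout
    if name.endswith(".gz"):
        # "rt"/"wt" make gzip.open return a text-mode file object.
        return gzip.open(name, mode + "t", encoding="utf-8")
    return open(name, mode, encoding="utf-8")
```

For example, open_text("words.txt.gz", "w") returns a file object that compresses on write, while open_text("-") reads from standard input.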
