
sequence and joint-sequence modelling tool for g2p


Sequitur G2P

A trainable Grapheme-to-Phoneme converter.

Introduction

Sequitur G2P is a data-driven grapheme-to-phoneme converter written at RWTH Aachen University by Maximilian Bisani.

The method used in this software is described in

   M. Bisani and H. Ney: "Joint-Sequence Models for Grapheme-to-Phoneme
   Conversion". Speech Communication, Volume 50, Issue 5, May 2008,
   Pages 434-451

   (available online at http://dx.doi.org/10.1016/j.specom.2008.01.002)

This software is made available to you under the terms of the GNU General Public License. It can be used for experimentation and as part of other free software projects. For details see the licensing terms below.

If you publish about work that involves the use of this software, please cite the above paper. (You should feel obliged to do so by rules of good scientific conduct.)

The original README also contains these lines: "You may contact the author with any questions or comments via e-mail: maximilian.bisani@rwth-aachen.de. For questions regarding current releases of Sequitur G2P contact Pavel Golik (golik@cs.rwth-aachen.de)." We are not sure how actively these addresses are monitored. If needed, feel free to create an issue on https://github.com/sequitur-g2p/sequitur-g2p. We will try to help.

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License Version 2 (June 1991) as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, you will find it at http://www.gnu.org/licenses/gpl.html, or write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110, USA.

Should a provision of no. 9 and 10 of the GNU General Public License be invalid or become invalid, a valid provision is deemed to have been agreed upon which comes closest to what the parties intended commercially. In any case guarantee/warranty shall be limited to gross negligent actions or intended actions or fraudulent concealment.

Installing

To build and use this software you need Python, NumPy, SWIG, and a C++ compiler.

To install, change to the source directory and type:

python setup.py install --prefix /usr/local

You may substitute /usr/local with some other directory. If you do so, make sure that some-other-directory/lib/python2.7/site-packages/ is in your PYTHONPATH (the python2.7 component depends on your Python version), e.g. by typing:

export PYTHONPATH=some-other-directory/lib/python2.7/site-packages

You can also install via pip by pointing it at this repository. You still need SWIG and a C++ compiler.

pip install numpy
pip install git+https://github.com/sequitur-g2p/sequitur-g2p@master

Note: when installing on macOS, you might run into issues because the default C++ standard library is clang's libc++. If that is the case, try installing with:

CPPFLAGS="-stdlib=libstdc++" pip install git+https://github.com/sequitur-g2p/sequitur-g2p@master

Using

Sequitur G2P is a data-driven grapheme-to-phoneme converter. In fact, it can be applied to any monotone sequence translation problem, provided the source and target alphabets are small (fewer than 255 symbols). Data-driven means that you need to train it with example pronunciations. It has no built-in linguistic knowledge whatsoever, which means that it should work for any alphabetic language. Training takes a pronunciation dictionary and creates a model file. The model file can then be used to transcribe words that were not in the dictionary.
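Before training, it can be useful to check a lexicon against the alphabet-size limit mentioned above. The following is a minimal sketch, not part of Sequitur G2P: the helper names alphabet_sizes and fits_limit are hypothetical, and it assumes one word followed by its phonemes per line, with each character of the word counted as one source symbol.

```python
from typing import Iterable, Set, Tuple

MAX_SYMBOLS = 255  # limit stated in the text above


def alphabet_sizes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (distinct grapheme count, distinct phoneme count)."""
    graphemes: Set[str] = set()
    phonemes: Set[str] = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        graphemes.update(word)   # assumption: one character = one source symbol
        phonemes.update(phones)  # one whitespace-separated token = one target symbol
    return len(graphemes), len(phonemes)


def fits_limit(lines: Iterable[str]) -> bool:
    """True if both alphabets have fewer than MAX_SYMBOLS symbols."""
    n_graphemes, n_phonemes = alphabet_sizes(lines)
    return n_graphemes < MAX_SYMBOLS and n_phonemes < MAX_SYMBOLS


example = ["cat k ae t", "dog d ao g", ""]
print(alphabet_sizes(example))  # (6, 6)
print(fits_limit(example))      # True
```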

Here is a step-by-step guide to get you started:

  1. Obtain a pronunciation dictionary for training. The format is one word per line. Each line contains the orthographic form of the word followed by the corresponding phonemic transcription. The word and all phonemes need to be separated by white space. The word and phoneme symbols may thus not contain blanks. We'll assume your training lexicon is called train.lex, and that you set aside some portion for testing purposes as test.lex, which is disjoint from train.lex.

  2. Train a model. To create a first model type:

    g2p.py --train train.lex --devel 5% --write-model model-1

    This first model will be rather poor because it is only a unigram. To create higher order models you need to run g2p.py again:

    g2p.py --model model-1 --ramp-up --train train.lex --devel 5% --write-model model-2

    Repeat this a couple of times:

    g2p.py --model model-2 --ramp-up --train train.lex --devel 5% --write-model model-3
    g2p.py --model model-3 --ramp-up --train train.lex --devel 5% --write-model model-4
    ...
    
  3. Evaluate the model. To find out how accurately your model can transcribe unseen words type:

    g2p.py --model model-6 --test test.lex

  4. Transcribe new words. Prepare a list of words you want to transcribe as a simple text file words.txt with one word per line (and no phonemic transcription), then type:

    g2p.py --model model-3 --apply words.txt
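The preparation and training steps above can be scripted. The sketch below is only an illustration under stated assumptions: split_lexicon and training_commands are hypothetical helpers, the 10% test fraction is an arbitrary choice, and the generated command lines use only the g2p.py options shown in the steps above. The split is done by word, so that a word with several pronunciations never ends up in both sets.

```python
import random
from typing import Dict, Iterable, List, Tuple


def split_lexicon(lines: Iterable[str], test_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split lexicon lines into train and test lists with disjoint words."""
    by_word: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            by_word.setdefault(fields[0], []).append(" ".join(fields))
    words = sorted(by_word)
    random.Random(seed).shuffle(words)  # reproducible shuffle
    n_test = int(len(words) * test_fraction)
    test = [entry for w in words[:n_test] for entry in by_word[w]]
    train = [entry for w in words[n_test:] for entry in by_word[w]]
    return train, test


def training_commands(n_models: int, train: str = "train.lex",
                      devel: str = "5%") -> List[List[str]]:
    """Build the g2p.py argument lists for model-1 up to model-n."""
    commands = []
    for i in range(1, n_models + 1):
        cmd = ["g2p.py"]
        if i > 1:
            # Every model after the first ramps up from the previous one.
            cmd += ["--model", f"model-{i - 1}", "--ramp-up"]
        cmd += ["--train", train, "--devel", devel,
                "--write-model", f"model-{i}"]
        commands.append(cmd)
    return commands


for cmd in training_commands(3):
    print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run once g2p.py is on your PATH; the sketch only prints the commands so that it runs without the tool installed.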

Random comments:

  • You cannot open models created in a Python 3 environment inside a Python 2 environment. The opposite works: models created under Python 2 can be opened under Python 3.
  • Whenever a file name is required, you can specify "-" to mean standard input or standard output.
  • If a file name ends in ".gz", it is assumed that the file is (or should be) compressed using gzip.
  • For the time being you need to type g2p.py --help and/or read the source to find out the other things g2p.py can do. Sorry about that.
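If you prepare input files or post-process output in your own scripts, you may want to follow the same file name conventions. The sketch below illustrates the two conventions described above ("-" and ".gz"); it is not g2p.py's actual implementation, and open_by_convention is a hypothetical helper name.

```python
import gzip
import os
import sys
import tempfile


def open_by_convention(name, mode="r"):
    """Open `name` as text for reading ("r") or writing ("w").

    "-" maps to standard input or output; names ending in ".gz" are
    read or written through gzip.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if name == "-":
        # Callers should not close these shared streams.
        return sys.stdin if mode == "r" else sys.stdout
    if name.endswith(".gz"):
        return gzip.open(name, mode + "t", encoding="utf-8")
    return open(name, mode, encoding="utf-8")


# Usage: write a small compressed word list and read it back.
path = os.path.join(tempfile.mkdtemp(), "words.txt.gz")
with open_by_convention(path, "w") as f:
    f.write("hello\nworld\n")
with open_by_convention(path) as f:
    words = f.read().split()
print(words)  # ['hello', 'world']
```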


