Fast multilingual text-to-phoneme converter for South East Asian languages.

🦭 SEA-G2P

Author: Pham Nguyen Ngoc Bao

🚀 Used By

SEA-G2P is the core phonemization engine powering:

  • VieNeu-TTS: An advanced on-device Vietnamese Text-to-Speech model with instant voice cloning.

By using SEA-G2P, VieNeu-TTS achieves high-fidelity pronunciation and seamless Vietnamese-English code-switching.

Installation

pip install sea-g2p

Usage

Simple Pipeline

from sea_g2p import SEAPipeline

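# Build a Vietnamese pipeline, then normalize the text and convert it to phonemes in one call.
pipeline = SEAPipeline(lang="vi")
result = pipeline.run("Giá SP500 hôm nay là 4.200,5 điểm.")
print(result)
# Output:
# zˈaːɜ ˈɛɜt̪ pˈe nˈam tʃˈam hˈom nˈaj lˌaː2 bˈoɜn ŋˈi2n hˈaːj tʃˈam fˈəɪ4 nˈam ɗˈiɛ4m.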

Individual Modules

from sea_g2p import Normalizer, G2P

normalizer = Normalizer(lang="vi")
g2p = G2P(lang="vi")

text = "Giá SP500 hôm nay là 4.200,5 điểm"

# Step 1: expand numbers and abbreviations into plain Vietnamese words.
normalized = normalizer.normalize(text)
print(normalized)
# giá ét pê năm trăm hôm nay là bốn nghìn hai trăm phẩy năm điểm.

# Step 2: convert the normalized text to phonemes.
phonemes = g2p.convert(normalized)
print(phonemes)
# zˈaːɜ ˈɛɜt̪ pˈe nˈam tʃˈam hˈom nˈaj lˌaː2 bˈoɜn ŋˈi2n hˈaːj tʃˈam fˈəɪ4 nˈam ɗˈiɛ4m.

Features

  • Blazing Fast: Core engine rewritten in Rust with binary mmap lookup.
  • Zero Dependency: Pre-compiled wheels for Windows, Linux, and macOS.
  • Smart Normalization: Specialized for Vietnamese (numbers, dates, technical terms).
  • Bilingual Support: Handles mixed Vietnamese/English text seamlessly.
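
The number in the usage examples, 4.200,5, follows Vietnamese formatting: "." groups thousands and "," marks the decimal part, which is why it is read as "bốn nghìn hai trăm phẩy năm". The sketch below illustrates only that first parsing step. It is not SEA-G2P's normalizer, and `split_vi_number` and `VI_NUMBER` are hypothetical names introduced here.

```python
import re

# Assumption: "." separates groups of three digits and "," introduces the
# fractional part. Anything else (for example "4.2") is treated as ambiguous.
VI_NUMBER = re.compile(r"^(\d{1,3}(?:\.\d{3})*|\d+)(?:,(\d+))?$")


def split_vi_number(token):
    """Return (integer_part, fraction_digits) or None if the token does not match."""
    match = VI_NUMBER.match(token)
    if match is None:
        return None
    integer_part = int(match.group(1).replace(".", ""))
    return integer_part, match.group(2)  # fraction stays a string to keep leading zeros


print(split_vi_number("4.200,5"))  # (4200, '5')
print(split_vi_number("500"))      # (500, None)
```

A real normalizer would then spell out each part in words; that step is omitted here because the reading rules are language-specific.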

📊 Performance

The following benchmarks were conducted on a dataset of 100,000 words (26,000+ lines):

Module      Language      Implementation  Throughput        Avg Time/Line
Normalizer  Vietnamese    Python          ~39,000 words/s   0.09 ms
G2P         Multilingual  Rust Core       ~480,000 words/s  0.007 ms

Total Pipeline Throughput: ~36,000 words/s (Tested on CPython 3.12, Windows 11)
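
The total is consistent with the two per-module figures if the stages are assumed to run one after the other on the same words, so that their per-word times add up. A quick check of that arithmetic, using only the numbers reported above:

```python
# Reported throughputs in words per second (approximate values from the table).
normalizer_wps = 39_000
g2p_wps = 480_000

# Assumption: the pipeline runs the normalizer, then G2P, with no overlap,
# so the time spent per word is the sum of the two stage times.
seconds_per_word = 1 / normalizer_wps + 1 / g2p_wps
pipeline_wps = 1 / seconds_per_word

# Roughly 36,000 words/s, in line with the reported total.
print(f"{pipeline_wps:,.0f} words/s")
```

Under this assumption the Python normalizer accounts for most of the per-word time, so it is the stage that bounds end-to-end throughput.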

Technical Architecture

SEA-G2P is designed for maximum performance in production environments:

  • Memory Mapping (mmap): Instead of loading a large JSON file or SQLite database into RAM, SEA-G2P maps a custom binary format (.bin) directly into memory. Pages are read on demand, which gives near-instant startup and low memory overhead.
  • String Pooling: To minimize file size, each unique string (word or phoneme) is stored once in a global string pool and referenced by a 4-byte ID.
  • Binary Search: Words are pre-sorted during the build process, allowing O(log n) lookups directly on the memory-mapped data.
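
The three ideas above can be sketched together in a few lines of Python. This is a toy illustration, not the actual .bin specification: the layout (a 4-byte record count, fixed-width records of two 4-byte pool offsets, then a length-prefixed string pool) and the names `build`, `read_str`, and `lookup` are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile


def build(path, entries):
    """Write a toy lexicon: record count, sorted (word, phoneme) records, string pool."""
    pool, offsets = bytearray(), {}

    def intern(s):
        # String pooling: store each unique string once and reuse its offset.
        if s not in offsets:
            data = s.encode("utf-8")
            offsets[s] = len(pool)
            pool.extend(struct.pack("<H", len(data)) + data)
        return offsets[s]

    # Pre-sort by UTF-8 bytes so lookups can compare raw bytes.
    items = sorted(entries.items(), key=lambda kv: kv[0].encode("utf-8"))
    records = [(intern(word), intern(phonemes)) for word, phonemes in items]
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(records)))
        for word_id, phoneme_id in records:
            f.write(struct.pack("<II", word_id, phoneme_id))  # two 4-byte IDs
        f.write(pool)


def read_str(mm, pool_start, offset):
    """Read one length-prefixed string from the pool as raw bytes."""
    start = pool_start + offset
    (length,) = struct.unpack_from("<H", mm, start)
    return bytes(mm[start + 2 : start + 2 + length])


def lookup(mm, word):
    """Binary search over the sorted records, reading straight from the mapping."""
    (count,) = struct.unpack_from("<I", mm, 0)
    pool_start = 4 + count * 8
    key = word.encode("utf-8")
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        word_id, phoneme_id = struct.unpack_from("<II", mm, 4 + mid * 8)
        candidate = read_str(mm, pool_start, word_id)
        if candidate == key:
            return read_str(mm, pool_start, phoneme_id).decode("utf-8")
        if candidate < key:
            lo = mid + 1
        else:
            hi = mid
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.bin")
    build(path, {"hôm": "hˈom", "nay": "nˈaj", "năm": "nˈam"})
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        found = lookup(mm, "nay")
        missing = lookup(mm, "xyz")

print(found, missing)
```

Nothing is parsed at startup in this sketch: opening the mapping is the whole load step, and each lookup touches only the few records and strings visited by the search.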

For full details on the specification, see src/g2p/mod.rs.

Development

To install for development purposes:

  1. Clone the repository:

    git clone https://github.com/pnnbao97/sea-g2p
    cd sea-g2p
    
  2. Install in editable mode:

    pip install -e .
    
