🦭 SEA-G2P

Fast multilingual text-to-phoneme converter for South East Asian languages.

Author: Pham Nguyen Ngoc Bao

🚀 Used By

SEA-G2P is the core phonemization engine powering:

  • VieNeu-TTS: An advanced on-device Vietnamese Text-to-Speech model with instant voice cloning.

By using SEA-G2P, VieNeu-TTS achieves high-fidelity pronunciation and seamless Vietnamese-English code-switching.

Installation

pip install sea-g2p

Usage

Simple Pipeline

from sea_g2p import SEAPipeline

pipeline = SEAPipeline(lang="vi")
result = pipeline.run("Giá SP500 hôm nay là 4.200,5 điểm.")
print(result)
# zˈaːɜ ˈɛɜt̪ pˈe nˈam tʃˈam hˈom nˈaj lˌaː2 bˈoɜn ŋˈi2n hˈaːj tʃˈam fˈəɪ4 nˈam ɗˈiɛ4m.

Individual Modules

from sea_g2p import Normalizer, G2P

normalizer = Normalizer(lang="vi")
g2p = G2P(lang="vi")

text = "Giá SP500 hôm nay là 4.200,5 điểm."

# Step 1: expand numbers and abbreviations into plain Vietnamese words.
normalized = normalizer.normalize(text)
print(normalized)
# giá ét pê năm trăm hôm nay là bốn nghìn hai trăm phẩy năm điểm.

# Step 2: convert the normalized text to phonemes.
phonemes = g2p.convert(normalized)
print(phonemes)
# zˈaːɜ ˈɛɜt̪ pˈe nˈam tʃˈam hˈom nˈaj lˌaː2 bˈoɜn ŋˈi2n hˈaːj tʃˈam fˈəɪ4 nˈam ɗˈiɛ4m.

Features

  • Blazing Fast: Core engine rewritten in Rust with binary mmap lookup.
  • Zero Dependency: Pre-compiled wheels for Windows, Linux, and macOS.
  • Smart Normalization: Specialized for Vietnamese (numbers, dates, technical terms).
  • Bilingual Support: Handles mixed Vietnamese/English text seamlessly.
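To illustrate what number normalization has to deal with, here is a minimal sketch of splitting a Vietnamese-formatted number such as "4.200,5" (dot as thousands separator, comma as decimal mark) into its integer and fractional parts before it is read out. The function name `split_vi_number` and the regular expression are hypothetical and only for illustration; this is not the library's actual normalizer.

```python
import re

# Hypothetical pattern: digits grouped by "." in threes (or an ungrouped run),
# optionally followed by "," and the fractional digits.
_VI_NUMBER = re.compile(r"^(\d{1,3}(?:\.\d{3})*|\d+)(?:,(\d+))?$")


def split_vi_number(token: str):
    """Return (integer_part, fraction_digits), or None if the token is not a number."""
    match = _VI_NUMBER.match(token)
    if match is None:
        return None
    integer_part = int(match.group(1).replace(".", ""))
    fraction_digits = match.group(2) or ""
    return integer_part, fraction_digits


print(split_vi_number("4.200,5"))  # integer part 4200, fraction digits "5"
print(split_vi_number("SP500"))    # not a plain number, so None
```

A real normalizer would then spell out each part in words, as in the usage example where "4.200,5" becomes "bốn nghìn hai trăm phẩy năm".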

📊 Performance

The following benchmarks were conducted on a dataset of 100,000 words (26,000+ lines):

| Module | Language | Implementation | Throughput | Avg Time/Line |
|---|---|---|---|---|
| Normalizer | Vietnamese | Python | ~39,000 words/s | 0.09 ms |
| G2P | Multilingual | Rust Core | ~480,000 words/s | 0.007 ms |

Total Pipeline Throughput: ~36,000 words/s (Tested on CPython 3.12, Windows 11)
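The pipeline figure is consistent with the two stages running one after the other: per-word times add, so the combined throughput is the reciprocal of the summed reciprocals. The sketch below only redoes that arithmetic on the rounded numbers reported above; it does not re-run the benchmark, and the helper names are made up for illustration.

```python
# Rounded figures taken from the benchmark results above.
NORMALIZER_WPS = 39_000   # words per second
G2P_WPS = 480_000         # words per second
TOTAL_WORDS = 100_000
TOTAL_LINES = 26_000      # reported as "26,000+", so per-line times are upper bounds


def serial_throughput(*stage_wps: float) -> float:
    """Throughput of stages run in sequence: the per-word times add up."""
    return 1.0 / sum(1.0 / wps for wps in stage_wps)


def avg_ms_per_line(wps: float, words: int = TOTAL_WORDS, lines: int = TOTAL_LINES) -> float:
    """Average time per line in milliseconds at the given throughput."""
    return words / wps / lines * 1000.0


pipeline_wps = serial_throughput(NORMALIZER_WPS, G2P_WPS)
print(f"pipeline:   ~{pipeline_wps:,.0f} words/s")
print(f"normalizer: ~{avg_ms_per_line(NORMALIZER_WPS):.3f} ms/line")
print(f"g2p:        ~{avg_ms_per_line(G2P_WPS):.4f} ms/line")
```

The combined value comes out at roughly 36,000 words/s, which shows that the Python normalizer, not the Rust G2P core, dominates end-to-end time.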

Technical Architecture

SEA-G2P is designed for maximum performance in production environments:

  • Memory Mapping (mmap): Instead of loading a large JSON file or SQLite database into RAM, the dictionary is stored in a custom binary format (.bin) that is mapped directly into memory. This gives near-instant startup and very low memory overhead.
  • String Pooling: To minimize file size, every unique string (word or phoneme sequence) is stored once in a global string pool and referenced by a 4-byte ID.
  • Binary Search: Words are pre-sorted during the build process, allowing O(log n) lookups directly on the memory-mapped data.

For full details on the specification, see src/g2p/mod.rs.
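The three ideas above can be sketched in a few lines of Python. The layout used here is a toy one invented for this example (a header, a sorted table of ID pairs, an offset table, and a string blob); it is not the actual .bin format, and `build` and `MappedLexicon` are hypothetical names. The sample entries are taken from the usage example output.

```python
import mmap
import struct
import tempfile
from pathlib import Path

# Toy layout (an assumption, not the real format):
#   header | sorted (word_id, phoneme_id) pairs | string offsets | string blob
HEADER = struct.Struct("<II")  # entry count, string count
ENTRY = struct.Struct("<II")   # 4-byte word ID, 4-byte phoneme ID
U32 = struct.Struct("<I")


def build(path, lexicon):
    """Write a word -> phonemes dict to `path` in the toy layout."""
    # String pool: every unique string is stored once and gets an integer ID.
    pool = sorted({s.encode("utf-8") for pair in lexicon.items() for s in pair})
    ids = {s: i for i, s in enumerate(pool)}
    # Entries are pre-sorted by the word's UTF-8 bytes so lookups can bisect.
    entries = sorted((w.encode("utf-8"), p.encode("utf-8")) for w, p in lexicon.items())
    offsets, blob = [0], b""
    for s in pool:
        blob += s
        offsets.append(len(blob))
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(entries), len(pool)))
        for word, phonemes in entries:
            f.write(ENTRY.pack(ids[word], ids[phonemes]))
        f.write(struct.pack(f"<{len(offsets)}I", *offsets))
        f.write(blob)


class MappedLexicon:
    """Read-only view over the toy file; nothing is parsed up front."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._n, n_strings = HEADER.unpack_from(self._mm, 0)
        self._offsets_at = HEADER.size + self._n * ENTRY.size
        self._blob_at = self._offsets_at + (n_strings + 1) * U32.size

    def _string(self, string_id):
        # Two consecutive offsets delimit the string inside the blob.
        start, end = struct.unpack_from("<II", self._mm, self._offsets_at + string_id * U32.size)
        return self._mm[self._blob_at + start:self._blob_at + end]

    def lookup(self, word):
        """Binary search over the sorted entry table; returns None if absent."""
        key = word.encode("utf-8")
        lo, hi = 0, self._n
        while lo < hi:
            mid = (lo + hi) // 2
            word_id, phoneme_id = ENTRY.unpack_from(self._mm, HEADER.size + mid * ENTRY.size)
            candidate = self._string(word_id)
            if candidate == key:
                return self._string(phoneme_id).decode("utf-8")
            if candidate < key:
                lo = mid + 1
            else:
                hi = mid
        return None

    def close(self):
        self._mm.close()
        self._file.close()


def demo():
    sample = {"hôm": "hˈom", "nay": "nˈaj", "năm": "nˈam", "điểm": "ɗˈiɛ4m"}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "toy.bin"
        build(path, sample)
        lexicon = MappedLexicon(path)
        try:
            return lexicon.lookup("hôm"), lexicon.lookup("xyz")
        finally:
            lexicon.close()  # release the mapping before the directory is removed


print(demo())
```

Because the file is mapped rather than read, opening it costs about the same regardless of dictionary size, and the operating system only pages in the parts that lookups actually touch.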

Development

To install for development purposes:

  1. Clone the repository:

    git clone https://github.com/pnnbao97/sea-g2p
    cd sea-g2p
    
  2. Install in editable mode:

    pip install -e .
    
