Fast multilingual text-to-phoneme converter for South East Asian languages.

🦭 SEA-G2P

Author: Pham Nguyen Ngoc Bao

🚀 Used By

SEA-G2P is the core phonemization engine powering:

  • VieNeu-TTS: An advanced on-device Vietnamese Text-to-Speech model with instant voice cloning.

By using SEA-G2P, VieNeu-TTS achieves high-fidelity pronunciation and seamless Vietnamese-English code-switching.

Installation

pip install sea-g2p

Usage

Simple Pipeline

from sea_g2p import SEAPipeline

pipeline = SEAPipeline(lang="vi")
result = pipeline.run("Giá SP500 hôm nay là 4.200,5 điểm.")
print(result)
# zˈaːɜ ˈɛɜt̪ pˈe nˈam tʃˈam hˈom nˈaj lˌaː2 bˈoɜn ŋˈi2n hˈaːj tʃˈam fˈəɪ4 nˈam ɗˈiɛ4m.

Individual Modules

from sea_g2p import Normalizer, G2P

normalizer = Normalizer(lang="vi")
g2p = G2P(lang="vi")

text = "Giá SP500 hôm nay là 4.200,5 điểm"
# Step 1: expand numbers, abbreviations, and symbols into plain words
normalized = normalizer.normalize(text)
print(normalized)
# giá ét pê năm trăm hôm nay là bốn nghìn hai trăm phẩy năm điểm.

# Step 2: convert the normalized text to phonemes
phonemes = g2p.convert(normalized)
print(phonemes)
# zˈaːɜ ˈɛɜt̪ pˈe nˈam tʃˈam hˈom nˈaj lˌaː2 bˈoɜn ŋˈi2n hˈaːj tʃˈam fˈəɪ4 nˈam ɗˈiɛ4m.

Features

  • Blazing Fast: Core engine rewritten in Rust with binary mmap lookup.
  • Zero Dependency: Pre-compiled wheels for Windows, Linux, and macOS.
  • Smart Normalization: Specialized for Vietnamese (numbers, dates, technical terms).
  • Bilingual Support: Handles mixed Vietnamese/English text seamlessly.

📊 Performance

The following benchmarks were conducted on a dataset of 100,000 words (26,000+ lines):

Module      Language      Implementation  Throughput        Avg Time/Line
Normalizer  Vietnamese    Python          ~39,000 words/s   0.09 ms
G2P         Multilingual  Rust Core       ~480,000 words/s  0.007 ms

Total Pipeline Throughput: ~36,000 words/s (Tested on CPython 3.12, Windows 11)
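
As a sanity check on how the per-module figures relate to the total, the short calculation below combines the two reported throughputs. It is only a sketch and assumes the normalizer and G2P stages run one after the other, so that their per-word times add up; the variable names are illustrative and not part of SEA-G2P.

```python
# Throughputs quoted in the benchmark figures above (words per second).
normalizer_wps = 39_000
g2p_wps = 480_000

# Assumption: the stages run sequentially, so per-word times add up.
seconds_per_word = 1 / normalizer_wps + 1 / g2p_wps
pipeline_wps = 1 / seconds_per_word

# Fraction of the total pipeline time spent in the normalizer.
normalizer_share = (1 / normalizer_wps) / seconds_per_word

print(f"estimated pipeline throughput: {pipeline_wps:,.0f} words/s")
print(f"time spent in normalizer: {normalizer_share:.0%}")
```

Under that assumption the estimate lands close to the reported ~36,000 words/s, and the normalizer accounts for roughly nine tenths of the pipeline time, which suggests it is the stage to optimize first.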

Technical Architecture

SEA-G2P is designed for maximum performance in production environments:

  • Memory Mapping (mmap): Instead of loading a large JSON file or SQLite database into RAM, SEA-G2P uses a custom binary format (.bin) that is mapped directly into memory. This allows near-instant startup and very low memory overhead.
  • String Pooling: To minimize file size, each unique string (word or phoneme) is stored once in a global string pool and referenced by a 4-byte ID.
  • Binary Search: Words are pre-sorted during the build process, which allows O(log n) lookups directly on the memory-mapped data.

For full details on the specification, see src/g2p/mod.rs.
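
To make the three ideas above concrete, here is a minimal Python sketch of a memory-mapped lexicon with a string pool and binary search. The file layout, the build function, and the Lexicon class are all hypothetical and invented for illustration; they are not the actual SEA-G2P .bin format, which lives in the Rust source. The sample entries are taken from the usage output earlier in this README.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy layout (NOT the real SEA-G2P .bin format):
#   header : u32 n_entries, u32 n_strings
#   entries: n_entries x (u32 word_id, u32 phoneme_id), sorted by word bytes
#   offsets: (n_strings + 1) x u32 byte offsets into the pool
#   pool   : concatenated UTF-8 strings, each unique string stored once


def build(path, lexicon):
    """Write a word -> phoneme dict to the toy binary format."""
    strings = sorted({s for pair in lexicon.items() for s in pair})
    ids = {s: i for i, s in enumerate(strings)}
    blobs = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    # Pre-sort entries by UTF-8 bytes so lookups can use binary search.
    entries = sorted(lexicon.items(), key=lambda kv: kv[0].encode("utf-8"))
    with open(path, "wb") as f:
        f.write(struct.pack("<II", len(entries), len(strings)))
        for word, phoneme in entries:
            f.write(struct.pack("<II", ids[word], ids[phoneme]))
        f.write(struct.pack(f"<{len(offsets)}I", *offsets))
        f.write(b"".join(blobs))


class Lexicon:
    """Read-only view over the toy file; nothing is parsed up front."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._n, n_strings = struct.unpack_from("<II", self._mm, 0)
        self._entries = 8
        self._offsets = self._entries + 8 * self._n
        self._pool = self._offsets + 4 * (n_strings + 1)

    def _string(self, string_id):
        # Two adjacent offsets delimit the string inside the pool.
        start, end = struct.unpack_from(
            "<II", self._mm, self._offsets + 4 * string_id
        )
        return self._mm[self._pool + start:self._pool + end]

    def lookup(self, word):
        """Binary search over the sorted entries: O(log n) comparisons."""
        key = word.encode("utf-8")
        lo, hi = 0, self._n
        while lo < hi:
            mid = (lo + hi) // 2
            word_id, phoneme_id = struct.unpack_from(
                "<II", self._mm, self._entries + 8 * mid
            )
            candidate = self._string(word_id)
            if candidate == key:
                return self._string(phoneme_id).decode("utf-8")
            if candidate < key:
                lo = mid + 1
            else:
                hi = mid
        return None

    def close(self):
        self._mm.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.bin")
    build(path, {"hôm": "hˈom", "nay": "nˈaj", "năm": "nˈam"})
    lex = Lexicon(path)
    found = lex.lookup("nay")
    missing = lex.lookup("xyz")
    lex.close()

print(found, missing)
```

Opening the file only maps it; pages are read from disk when the search touches them, which is why startup cost stays small regardless of lexicon size. Fixed-width 4-byte IDs keep every entry the same size, so the record at index i can be located by arithmetic alone.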

Development

To install for development purposes:

  1. Clone the repository:

    git clone https://github.com/pnnbao97/sea-g2p
    cd sea-g2p
    
  2. Install in editable mode:

    pip install -e .
    
