Dialect-aware Portuguese (Lusophone) text-to-IPA phonemizer

These details have not been verified by PyPI

Project description

tugaphone — dialect-aware Portuguese phonemizer

tugaphone converts Portuguese text to IPA for five Lusophone dialect groups: European (pt-PT), Brazilian (pt-BR), Angolan (pt-AO), Mozambican (pt-MZ), and Timorese (pt-TL). It combines a curated phonetic lexicon, part-of-speech tagging for homograph disambiguation, meaning-based heterophone resolution via bifonia, and a regional-accent layer grounded in published phonology.

O gato dorme.
pt-PT → ˈu gˈa·tu ˈdoɾ·mɨ ˈ···
pt-BR → ˈu gˈa·tʊ ˈdoɾ·mɪ ˈ···
pt-AO → ˈu gˈa·tʊ ˈdoɾ·me ˈ···
pt-MZ → ˈu gˈa·tu ˈdoɾ·me ˈ···
pt-TL → ˈu gˈa·tʊ ˈdoɾ·me ˈ···

Install

pip install tugaphone

30-second quick start

from tugaphone import TugaPhonemizer

ph = TugaPhonemizer()
print(ph.phonemize_sentence("O gato dorme.", "pt-PT"))
# ˈu gˈa·tu ˈdoɾ·mɨ ˈ···

TugaPhonemizer() loads the lexicon and POS tagger once; then call phonemize_sentence(text, lang) as many times as you like. Output is a space-separated phoneme string — one token per word — with ˈ marking primary stress and · marking syllable boundaries.
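Because the output is a plain string, it can be post-processed with ordinary string operations. The following is a minimal sketch and not part of tugaphone: the helper names `split_tokens` and `stressed_index` are hypothetical, and the code assumes that a token made up only of ˈ and · marks (such as the trailing ˈ··· in the sample) stands for punctuation and can be skipped.

```python
from typing import List, Optional

STRESS = "ˈ"     # primary-stress mark
BOUNDARY = "·"   # syllable boundary

# Sample output copied from the quick start (pt-PT).
SAMPLE = "ˈu gˈa·tu ˈdoɾ·mɨ ˈ···"


def split_tokens(ipa: str) -> List[List[str]]:
    """Split a phonemized sentence into words, each a list of syllables."""
    words = []
    for token in ipa.split():
        # Assumption: a token with no phonetic symbols is punctuation.
        if not token.strip(STRESS + BOUNDARY):
            continue
        words.append(token.split(BOUNDARY))
    return words


def stressed_index(syllables: List[str]) -> Optional[int]:
    """Return the index of the syllable carrying the stress mark, if any."""
    for i, syllable in enumerate(syllables):
        if STRESS in syllable:
            return i
    return None


words = split_tokens(SAMPLE)
print(words)
print([stressed_index(word) for word in words])
```

Keeping the syllables as a list per word makes it straightforward to count syllables or to feed stress positions into a downstream tool.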

Features

Five dialect inventories

Code	Region and main features
`pt-PT`	European Portuguese — heavy vowel reduction, post-alveolar fricatives, uvular /ʁ/
`pt-BR`	Brazilian Portuguese — fuller vowels, /t d/ palatalisation, l-vocalisation
`pt-AO`	Angolan Portuguese — moderate reduction, alveolar trill, Bantu substrate
`pt-MZ`	Mozambican Portuguese — similar to European with regional variation
`pt-TL`	Timorese Portuguese — conservative pronunciation, Tetum substrate

for code in ["pt-PT", "pt-BR", "pt-AO", "pt-MZ", "pt-TL"]:
    print(code, "→", ph.phonemize_sentence("Choveu muito ontem.", code))
# pt-PT → ʃu·ˈvew mˈũj·tu ˈõ·tẽ ˈ···
# pt-BR → ʃo·ˈvew mwˈĩ·tʊ ˈõ·tẽ ˈ···
# pt-AO → ʃo·ˈvew mˈũjn·tʊ ˈõ·tẽ ˈ···
# pt-MZ → ʃu·ˈvew mˈũj·tu ˈõ·tẽ ˈ···
# pt-TL → ʃo·ˈvew mˈuj·tʊ ˈõ·tẽ ˈ···
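To see where the dialects diverge, the five outputs can be compared token by token. This is a small sketch that works on the strings listed above rather than calling the library; the function names `varying_positions` and `group_by_form` are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List

# Outputs copied from the loop above for "Choveu muito ontem."
OUTPUTS: Dict[str, str] = {
    "pt-PT": "ʃu·ˈvew mˈũj·tu ˈõ·tẽ ˈ···",
    "pt-BR": "ʃo·ˈvew mwˈĩ·tʊ ˈõ·tẽ ˈ···",
    "pt-AO": "ʃo·ˈvew mˈũjn·tʊ ˈõ·tẽ ˈ···",
    "pt-MZ": "ʃu·ˈvew mˈũj·tu ˈõ·tẽ ˈ···",
    "pt-TL": "ʃo·ˈvew mˈuj·tʊ ˈõ·tẽ ˈ···",
}


def _tokens(ipa: str) -> List[str]:
    # Normalise so that composed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", ipa).split()


def varying_positions(outputs: Dict[str, str]) -> List[int]:
    """Return the token positions at which at least two dialects differ."""
    rows = [_tokens(ipa) for ipa in outputs.values()]
    return [i for i, column in enumerate(zip(*rows)) if len(set(column)) > 1]


def group_by_form(outputs: Dict[str, str], position: int) -> Dict[str, List[str]]:
    """Group dialect codes by the form they produce at one token position."""
    groups = defaultdict(list)
    for code, ipa in outputs.items():
        groups[_tokens(ipa)[position]].append(code)
    return dict(groups)


print(varying_positions(OUTPUTS))   # [0, 1]
print(group_by_form(OUTPUTS, 0))
```

For this sentence only the first two words vary; "ontem" is transcribed identically in all five inventories.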

Homograph disambiguation

Heterophonic homographs are resolved at two levels:

Meaning-based (via bifonia): sede ("thirst" vs "headquarters"), forma ("mould" vs "shape").
POS-based: gosto (noun /ˈgoʃtu/ vs verb /ˈgɔʃtu/), para (preposition vs verb).

print(ph.phonemize_sentence("Eu gosto de música.", "pt-PT"))   # verb → ˈgɔʃ·tu
print(ph.phonemize_sentence("Tenho bom gosto.", "pt-PT"))      # noun → ˈgoʃ·tu
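The idea behind POS-based resolution can be sketched as a lookup keyed on (word, tag). This is an illustration of the technique only, not tugaphone's implementation: the `HOMOGRAPHS` table and `resolve` function are hypothetical, the tag names are assumed, and the tags are written by hand where a real tagger would supply them. The two pronunciations are the ones given above.

```python
from typing import Dict, List, Tuple

# Pronunciations taken from the examples above; the tag names are an assumption.
HOMOGRAPHS: Dict[Tuple[str, str], str] = {
    ("gosto", "NOUN"): "ˈgoʃ·tu",
    ("gosto", "VERB"): "ˈgɔʃ·tu",
}


def resolve(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the pronunciation of each known homograph in a tagged sentence."""
    found = []
    for word, tag in tagged:
        key = (word.lower(), tag)
        if key in HOMOGRAPHS:
            found.append(HOMOGRAPHS[key])
    return found


# Hand-written tags standing in for a POS tagger's output.
verb_sentence = [("Eu", "PRON"), ("gosto", "VERB"), ("de", "ADP"), ("música", "NOUN")]
noun_sentence = [("Tenho", "VERB"), ("bom", "ADJ"), ("gosto", "NOUN")]

print(resolve(verb_sentence))
print(resolve(noun_sentence))
```

Meaning-based pairs such as sede cannot be separated this way, since both senses are nouns; that is the case the bifonia level handles.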

Sub-regional accents

RegionalTransforms presets layer phonological rules on top of any dialect. Rules are grounded in published phonology (Cintra 1971; ALEPG):

from tugaphone.regional import PortoDialect, AzoresDialect

# Porto: stressed /o/ → [uo] (rising diphthong); /v/ → [b], as in the output below
print(ph.phonemize_sentence("O vinho é muito bom.", "pt-PT", regional_dialect=PortoDialect))
# ˈu bˈi·ɲu ˈɛ mˈũj·tu bˈuõ ˈ···

# Açores: stressed /u/ → [y], l-palatalisation
print(ph.phonemize_sentence("O vinho é muito bom.", "pt-PT", regional_dialect=AzoresDialect))
# ˈy vˈi·ɲu ˈɛ mˈỹj·tu bˈõ ˈ···

Available presets: NorthernDialect, PortoDialect, MinhoDialect, BragaDialect, FamalicaoDialect, FafeDialect, TrasMontanoDialect, CoimbraDialect, AlentejoDialect, AlgarveDialect, MadeiraDialect, AzoresDialect.
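One way to picture a rule layer of this kind is as an ordered list of rewrite rules applied to an already phonemized string. The sketch below is a toy model under that assumption and does not reflect how RegionalTransforms is actually structured: `PORTO_LIKE_RULES` and `apply_rules` are hypothetical, and the two rules are modelled only on the Porto output shown above. It relies on the stress mark sitting directly before the vowel, as it does in these two word forms.

```python
import re
import unicodedata
from typing import List, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement)

# Hypothetical rule set modelled on the Porto example output.
PORTO_LIKE_RULES: List[Rule] = [
    ("v", "b"),                    # /v/ → [b], as in vinho → bˈi·ɲu
    ("ˈ([o\u00f5])", "ˈu\\1"),     # stressed /o/ or /õ/ gains a [u] onglide
]


def apply_rules(ipa: str, rules: List[Rule]) -> str:
    """Apply each rewrite rule in order to a phonemized string."""
    # Normalise first so that nasal vowels are single code points.
    result = unicodedata.normalize("NFC", ipa)
    for pattern, replacement in rules:
        result = re.sub(pattern, replacement, result)
    return result


print(apply_rules("vˈi·ɲu", PORTO_LIKE_RULES))   # bˈi·ɲu
print(apply_rules("bˈõ", PORTO_LIKE_RULES))      # bˈuõ
```

Rule order matters in such a design: a later rule sees the output of the earlier ones, so presets that share rules need a consistent ordering.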

Number normalization

Digits are spelled out with gender agreement, using the long or short scale appropriate to each dialect:

from tugaphone.number_utils import normalize_numbers

normalize_numbers("vou comprar 1 casa")   # 'vou comprar uma casa'
normalize_numbers("vou adotar 1 cão")    # 'vou adotar um cão'
normalize_numbers("comprei 2 casas")     # 'comprei duas casas'
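Gender agreement means the spelled-out form depends on the noun that follows the digit. A toy sketch of that dependency is shown here; it is not how normalize_numbers is implemented. The `CARDINALS` and `NOUN_GENDER` tables and the `spell_small_numbers` function are hypothetical and cover only the three examples above, defaulting to masculine when the noun is unknown.

```python
import unicodedata
from typing import Dict

# Hypothetical lookup tables covering only the examples above.
CARDINALS: Dict[str, Dict[str, str]] = {
    "1": {"m": "um", "f": "uma"},
    "2": {"m": "dois", "f": "duas"},
}
NOUN_GENDER: Dict[str, str] = {"casa": "f", "casas": "f", "c\u00e3o": "m"}


def spell_small_numbers(text: str) -> str:
    """Replace the digits 1 and 2 with a form agreeing with the next word."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        if word in CARDINALS:
            following = words[i + 1] if i + 1 < len(words) else ""
            following = unicodedata.normalize("NFC", following.lower())
            gender = NOUN_GENDER.get(following, "m")  # assumed default: masculine
            out.append(CARDINALS[word][gender])
        else:
            out.append(word)
    return " ".join(out)


print(spell_small_numbers("vou comprar 1 casa"))   # vou comprar uma casa
print(spell_small_numbers("comprei 2 casas"))      # comprei duas casas
```

A real normalizer also has to handle larger numbers, ordinals, and nouns that are not adjacent to the digit, none of which this sketch attempts.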

Syllabification and stress

Syllabification is handled by silabificador, registered as an orthography2ipa syllabifier plugin. Stress detection delegates to orthography2ipa's declarative StressRules.

Rules-only mode

Pass a dialect inventory whose IRREGULAR_WORDS has been emptied to bypass the lexicon and transcribe with grapheme rules only. This is useful for testing rule coverage or synthesising unknown words.
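The effect of emptying the lexicon can be illustrated with a lexicon-first, rules-as-fallback lookup. This is a deliberately simplistic sketch, not tugaphone's code: the `transcribe` function, the single lexicon entry, and the two per-letter rules are all hypothetical toy data chosen to make the two paths visibly differ.

```python
from typing import Dict


def transcribe(word: str, irregular_words: Dict[str, str], rules: Dict[str, str]) -> str:
    """Use the lexicon entry if present; otherwise map letter by letter."""
    if word in irregular_words:
        return irregular_words[word]
    return "".join(rules.get(letter, letter) for letter in word)


# Toy data for illustration only, not taken from tugaphone.
IRREGULAR_WORDS = {"fixo": "ˈfi·ksu"}
RULES = {"x": "ʃ", "o": "u"}

print(transcribe("fixo", IRREGULAR_WORDS, RULES))   # lexicon entry is used
print(transcribe("fixo", {}, RULES))                # empty lexicon: rules only
```

Comparing the two results word by word over a test list is one way to find where the grapheme rules and the lexicon disagree.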

orthography2ipa plugin interface

TugaphoneG2PPlugin implements orthography2ipa's G2PPlugin interface; SilabificadorSyllabifier implements its SyllabifierPlugin interface and is registered at the orthography2ipa.syllabify entry point.

from tugaphone.plugin import TugaphoneG2PPlugin

p = TugaphoneG2PPlugin(lang="pt-BR")
print(p.transcribe("o gato dorme"))   # ˈu gˈa·tʊ ˈdoɾ·mɪ
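To check which syllabifier plugins are visible in the current environment, the standard library's importlib.metadata can list what is registered under the entry-point group named above. This is a sketch using only the standard library; the helper name `list_syllabifier_plugins` is hypothetical, and the result is simply empty when no package registers that group.

```python
from importlib.metadata import entry_points
from typing import List


def list_syllabifier_plugins(group: str = "orthography2ipa.syllabify") -> List[str]:
    """Return the names of entry points registered under the given group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and later expose a selectable collection.
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain mapping of group name to entries.
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


print(list_syllabifier_plugins())
```

An empty list after installation would suggest that the package metadata was not picked up, for example because the environment running the check differs from the one where the package was installed.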

Sibling libraries

tugaphone is part of the TigreGotico Portuguese NLP stack:

Library	Role
tugalex	Phonetic lexicon
tugatagger	POS tagger
silabificador	Syllabifier
bifonia	Heterophone sense disambiguation
orthography2ipa	G2P plugin base + stress rules

Documentation

docs/quickstart.md — install, first call, dialect overview
docs/dialects.md — five inventories and sub-regional accent presets
docs/homographs.md — meaning-based and POS-based disambiguation
docs/numbers.md — number normalization and gender agreement
docs/api.md — full class and function reference
docs/tokenizer.md — the Sentence → Word → Grapheme → Character model
docs/advanced.md — accents, serialization, integration

License

Apache License 2.0. See LICENSE.

Release history

0.6.1a1 pre-release

Jun 20, 2026

0.6.0a1 pre-release

Jun 13, 2026

0.5.1a3 pre-release

Jun 12, 2026

0.5.1a2 pre-release

Jun 12, 2026

This version

0.5.1a1 pre-release

Jun 12, 2026

0.5.0a2 pre-release

Jun 12, 2026

0.5.0a1 pre-release

Jun 12, 2026

0.4.0a2 pre-release

Jun 12, 2026

0.4.0a1 pre-release

Jun 12, 2026

0.3.1a1 pre-release

Jun 12, 2026

0.3.0a1 pre-release

Jun 11, 2026

0.2.2a2 pre-release

May 29, 2026

0.2.2a1 pre-release

Feb 25, 2026

0.2.1

Feb 6, 2026

0.2.0

Feb 6, 2026

0.2.0a2 pre-release

Feb 6, 2026

0.2.0a1 pre-release

Feb 6, 2026

0.1.0a1 pre-release

Feb 6, 2026

0.0.2

Oct 12, 2025

0.0.2a1 pre-release

Oct 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tugaphone-0.5.1a1.tar.gz (80.2 kB)

Uploaded Jun 12, 2026 Source

Built Distribution


tugaphone-0.5.1a1-py3-none-any.whl (69.3 kB)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file tugaphone-0.5.1a1.tar.gz.

File metadata

Download URL: tugaphone-0.5.1a1.tar.gz
Upload date: Jun 12, 2026
Size: 80.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tugaphone-0.5.1a1.tar.gz
Algorithm	Hash digest
SHA256	`5502de4b365db285c49545b9792a0a542ad1fa0121c38ba891b0707e0ba9d88d`
MD5	`44279a8e73a01855dfe875077d7a5b59`
BLAKE2b-256	`42aa60573e0d1c77b2df6ff7ef75083c87e50675f49232baccb5d474dddaca74`
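A downloaded archive can be checked against the SHA256 digest in the table with the standard library's hashlib. The sketch below is one way to do it; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib
import sys

# SHA256 digest published above for tugaphone-0.5.1a1.tar.gz.
EXPECTED_SHA256 = "5502de4b365db285c49545b9792a0a542ad1fa0121c38ba891b0707e0ba9d88d"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Usage: python check_hash.py path/to/tugaphone-0.5.1a1.tar.gz
    for name in sys.argv[1:]:
        print(name, matches_published_hash(name))
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for an 80 kB archive but makes the helper reusable for larger downloads.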


File details

Details for the file tugaphone-0.5.1a1-py3-none-any.whl.

File metadata

Download URL: tugaphone-0.5.1a1-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 69.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tugaphone-0.5.1a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1533b5f2aa21e27ccd4301dfe201190747437685c6ef1fc3d137357eb18ba53e`
MD5	`8747d867acd39c32f08ce64a6aeef059`
BLAKE2b-256	`bcfad3b0537a23b38ee4e86ab3cc3da99a3a4e995fd31038fff0aac8986f46d6`

