Pure-Python Cotovia G2P phonemizer for Galician and Spanish
pycotovia
Pure-Python G2P (grapheme-to-phoneme) phonemizer for Galician and Spanish, based on the Cotovia TTS system.
Features
- Two languages: Galician (gl) and Spanish (es), each with language-specific exception lists and rewrite rules.
- Zero dependencies: pure Python, no C extensions, no heavy ML models.
- Fast enough: single-word latency is well under 1 ms on modern hardware.
- Parity-tested: verified against the original Cotovia C binary for Galician (see docs/parity.md).
- IPA output: optional mapping from Cotovia phoneme symbols to IPA.
Installation
pip install pycotovia
Requires Python >= 3.11.
Quick start
import pycotovia
# Galician (default)
print(pycotovia.phonemize("Ola, como estás?")) # → "ola komo estajs"
print(pycotovia.phonemize("guerra", lang="gl")) # → "gerra"
# Spanish
print(pycotovia.phonemize("México", lang="es")) # → "meksiko"
print(pycotovia.phonemize("México", lang="gl")) # → "meSiko"
# IPA mapping
print(pycotovia.cotovia_to_ipa("gerra")) # → "ɣɛra"
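The two calls above can be chained into a single text-to-IPA step. The following is a minimal sketch that assumes only the phonemize(text, lang=...) and cotovia_to_ipa(text) call shapes shown in the quick start. make_ipa_phonemizer is a hypothetical helper, not part of pycotovia; it takes both functions as arguments so the sketch does not depend on the package at import time.

```python
from functools import lru_cache
from typing import Callable


def make_ipa_phonemizer(
    phonemize: Callable[..., str],
    to_ipa: Callable[[str], str],
    lang: str = "gl",
    cache_size: int = 4096,
) -> Callable[[str], str]:
    """Return a cached text -> IPA function built from two callables.

    `phonemize` is called as phonemize(text, lang=lang) and `to_ipa` as
    to_ipa(cotovia_string), mirroring the quick-start calls.
    """

    @lru_cache(maxsize=cache_size)
    def text_to_ipa(text: str) -> str:
        # Step 1: graphemes -> Cotovia phoneme symbols.
        cotovia = phonemize(text, lang=lang)
        # Step 2: Cotovia phoneme symbols -> IPA.
        return to_ipa(cotovia)

    return text_to_ipa
```

With the package installed, make_ipa_phonemizer(pycotovia.phonemize, pycotovia.cotovia_to_ipa, lang="es") would give one callable per language. The cache only pays off when the same strings recur, for example in word-level lookups over a corpus.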
CLI
# Galician
echo "Ola mundo" | pycotovia
# Spanish
echo "Hola mundo" | pycotovia -l es
# From file
cat words.txt | pycotovia -l gl > phonemes.txt
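The same line-by-line filtering can be done from Python when the CLI is not convenient. This is a sketch, not the CLI's actual implementation: iter_phonemized is a hypothetical helper, the phonemizer is passed in as an argument, and skipping blank lines is an assumption of this example that may differ from what the pycotovia command does.

```python
from typing import Callable, Iterable, Iterator


def iter_phonemized(
    lines: Iterable[str],
    phonemize: Callable[..., str],
    lang: str = "gl",
) -> Iterator[str]:
    """Yield one transcription per non-blank input line.

    `phonemize` is called as phonemize(text, lang=lang), the call shape
    used in the quick start.
    """
    for line in lines:
        text = line.strip()  # drop the trailing newline and stray spaces
        if text:  # assumption: blank lines are skipped, not echoed
            yield phonemize(text, lang=lang)
```

With the package installed, iterating over iter_phonemized(sys.stdin, pycotovia.phonemize, lang="es") and printing each item would behave like a simple stdin-to-stdout filter. Because it is a generator, large files are processed without loading them fully into memory.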
Differences from the Cotovia binary
| Aspect | pycotovia | Cotovia C binary |
|---|---|---|
| Timbre (open/closed e/o) | Not applied in transcription mode | Same — only used for voice-building |
| Stress in bui, fui, cuido | Correctly shifts to u (buj, fuj, kujDo) | Bug: keeps stress on i (bwi, fwi, kwiDo) due to a precedence error in aguda() / grave() |
See docs/parity.md for the full parity test results and the deliberate divergences.
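A parity check with an allowlist for deliberate divergences might look like the sketch below. It is an illustration only and not the project's actual test harness: find_divergences is a hypothetical helper, the reference transcriptions are assumed to have been collected from the binary beforehand, and the phonemizer is passed in as an argument.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple


def find_divergences(
    reference: Mapping[str, str],
    phonemize: Callable[..., str],
    lang: str = "gl",
    expected: Optional[Mapping[str, str]] = None,
) -> Dict[str, Tuple[str, str]]:
    """Return {word: (reference_output, our_output)} for unexpected mismatches.

    `reference` maps each word to the binary's transcription. `expected`
    maps words to the transcription we deliberately produce instead; a
    mismatch that equals the expected value is not reported.
    """
    expected = expected or {}
    unexpected: Dict[str, Tuple[str, str]] = {}
    for word, ref_out in reference.items():
        ours = phonemize(word, lang=lang)
        if ours == ref_out:
            continue  # exact parity
        if expected.get(word) == ours:
            continue  # documented, deliberate divergence
        unexpected[word] = (ref_out, ours)
    return unexpected
```

Using the table above as data, reference would contain {"bui": "bwi", "fui": "fwi", "cuido": "kwiDo"} and expected would contain {"bui": "buj", "fui": "fuj", "cuido": "kujDo"}. An empty result then means every mismatch is one of the documented ones.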
Documentation
- docs/architecture.md — pipeline overview and module map
- docs/parity.md — verification against the Cotovia binary
- docs/api.md — public API reference
Examples
See the examples/ directory for:
- basic_usage.py: single words, phrases, and IPA
- spanish_usage.py: Spanish-specific examples
- phrase_processing.py: batch processing from a file
License
Apache-2.0 — this is a clean-room reimplementation in Python, not a derivative of the C++ source.
Acknowledgements
This is a clean-room port of the Cotovia G2P subsystem (transcription rules, syllabification, stress assignment, and exception lists) from C++ to Python. Cotovia was developed by the Multimedia Technologies Group at the University of Vigo and the Centro Ramón Piñeiro for Research in Humanities.