Pure-Python Cotovia G2P phonemizer for Galician and Spanish

pycotovia

Pure-Python G2P (grapheme-to-phoneme) phonemizer for Galician and Spanish, based on the Cotovia TTS system.

Features

  • Two languages — Galician (gl) and Spanish (es) with language-specific exception lists and rewrite rules.
  • Zero dependencies — pure Python, no C extensions, no heavy ML models.
  • Fast enough — single-word latency is well under 1 ms on modern hardware.
  • Parity-tested — verified against the original Cotovia C binary for Galician (see docs/parity.md).
  • IPA output — optional mapping from Cotovia phoneme symbols to IPA.
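The latency claim above can be checked on your own hardware with a small timing harness. The sketch below is an illustration, not part of pycotovia: `median_latency_ms` and `fake_phonemize` are hypothetical names, and the stand-in function exists only so the block runs without the package installed. Pass `pycotovia.phonemize` instead to measure the real phonemizer.

```python
import time
from statistics import median
from typing import Callable, Iterable


def median_latency_ms(fn: Callable[[str], str],
                      words: Iterable[str],
                      repeats: int = 200) -> float:
    """Return the median per-call latency of fn, in milliseconds."""
    word_list = list(words)
    samples = []
    for _ in range(repeats):
        for word in word_list:
            start = time.perf_counter()
            fn(word)
            samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


# Hypothetical stand-in, NOT the real phonemizer: it only lowercases.
def fake_phonemize(word: str) -> str:
    return word.lower()


latency = median_latency_ms(fake_phonemize, ["ola", "guerra", "México"])
print(f"median latency: {latency:.4f} ms")
```

The median is used rather than the mean so that a few slow calls (for example, a first call that loads exception lists) do not dominate the figure.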

Installation

pip install pycotovia

Requires Python >= 3.11.

Quick start

import pycotovia

# Galician (default)
print(pycotovia.phonemize("Ola, como estás?"))      # → "ola komo estajs"
print(pycotovia.phonemize("guerra", lang="gl"))      # → "gerra"

# Spanish
print(pycotovia.phonemize("México", lang="es"))      # → "meksiko"
print(pycotovia.phonemize("México", lang="gl"))      # → "meSiko"

# IPA mapping
print(pycotovia.cotovia_to_ipa("gerra"))              # → "ɣɛra"
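The two calls shown above can be chained to go from text straight to IPA. The helper below is a minimal sketch: `text_to_ipa` is a hypothetical name, and it assumes that the IPA mapper is applied one space-separated token at a time, which the examples above only confirm for a single word. The two stub functions are placeholders so the block runs without the package; they do not reproduce any real Cotovia rule.

```python
from typing import Callable


def text_to_ipa(text: str,
                phonemize: Callable[..., str],
                to_ipa: Callable[[str], str],
                lang: str = "gl") -> str:
    """Phonemize text, then map each space-separated token to IPA."""
    cotovia = phonemize(text, lang=lang)
    return " ".join(to_ipa(token) for token in cotovia.split())


# Placeholder stubs (assumptions for illustration only).
def stub_phonemize(text: str, lang: str = "gl") -> str:
    return text.lower()


def stub_to_ipa(token: str) -> str:
    return token.replace("g", "ɣ")


print(text_to_ipa("Gato grande", stub_phonemize, stub_to_ipa))
```

With the package installed, the intended call would be `text_to_ipa("Ola mundo", pycotovia.phonemize, pycotovia.cotovia_to_ipa)`.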

CLI

# Galician
echo "Ola mundo" | pycotovia

# Spanish
echo "Hola mundo" | pycotovia -l es

# From file
pycotovia -l gl < words.txt > phonemes.txt
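The same line-by-line filtering can be done from Python when the CLI is not convenient. This is a sketch under assumptions: `phonemize_lines` is a hypothetical helper, and skipping blank lines is a choice made here, not documented CLI behaviour. The phonemizer is passed in as an argument so the block runs without the package installed.

```python
from typing import Callable, Iterable, Iterator


def phonemize_lines(lines: Iterable[str],
                    phonemize: Callable[..., str],
                    lang: str = "gl") -> Iterator[str]:
    """Yield one phonemized string per non-empty input line."""
    for line in lines:
        text = line.strip()
        if text:  # this sketch skips blank lines
            yield phonemize(text, lang=lang)


# Placeholder phonemizer for demonstration; it only tags and lowercases.
def demo(text: str, lang: str) -> str:
    return f"{lang}:{text.lower()}"


for out in phonemize_lines(["Ola mundo\n", "\n", "guerra\n"], demo):
    print(out)
```

To process standard input with the real library, pass `sys.stdin` as `lines` and `pycotovia.phonemize` as `phonemize`.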

Differences from the Cotovia binary

| Aspect | pycotovia | Cotovia C binary |
| --- | --- | --- |
| Timbre (open/closed e/o) | Not applied in transcription mode | Same; only used for voice-building |
| Stress in bui, fui, cuido | Correctly shifts to u (buj, fuj, kujDo) | Bug: keeps stress on i (bwi, fwi, kwiDo) due to a precedence error in aguda() / grave() |

See docs/parity.md for the full parity test results and the deliberate divergences.
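When comparing outputs against the binary, the deliberate divergences need to be told apart from real regressions. The following sketch shows one way to do that. It is not the project's actual parity harness, whose format is not described here. `KNOWN_DIVERGENCES` and `classify` are hypothetical names, and the transcriptions are the ones listed in the comparison above.

```python
from typing import Dict, Tuple

# word -> (pycotovia output, Cotovia binary output), as listed above.
KNOWN_DIVERGENCES: Dict[str, Tuple[str, str]] = {
    "bui": ("buj", "bwi"),
    "fui": ("fuj", "fwi"),
    "cuido": ("kujDo", "kwiDo"),
}


def classify(word: str, py_out: str, binary_out: str) -> str:
    """Label a comparison as a match, a known divergence, or a mismatch."""
    if py_out == binary_out:
        return "match"
    if KNOWN_DIVERGENCES.get(word) == (py_out, binary_out):
        return "known divergence"
    return "mismatch"


print(classify("guerra", "gerra", "gerra"))
print(classify("bui", "buj", "bwi"))
print(classify("guerra", "gerra", "gera"))
```

Keying the allow-list on the exact pair of outputs, rather than on the word alone, means an unexpected change in either implementation still shows up as a mismatch.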

Documentation

Examples

See the examples/ directory for:

  • basic_usage.py — single words, phrases, and IPA
  • spanish_usage.py — Spanish-specific examples
  • phrase_processing.py — batch processing from a file

License

Apache-2.0 — this is a clean-room reimplementation in Python, not a derivative of the C++ source.

Acknowledgements

This is a clean-room port of the Cotovia G2P subsystem (transcription rules, syllabification, stress assignment, and exception lists) from C++ to Python. Cotovia was developed by the Multimedia Technologies Group at the University of Vigo and the Centro Ramón Piñeiro for Research in Humanities.
