Friulian text normalization, tokenization, G2P conversion and phonology with a CLI and pipeline service.

FurlanG2P

Utilities for converting Friulian (Furlan) text to phonemes. The package bundles a tiny gold lexicon with variant transcriptions, a dialect-aware letter-to-sound rule engine, a configurable normalization routine, a sentence/word tokenizer, a syllabifier with basic phonotactics, a stress assigner aware of long vowels and accent marks, and an IPA canonicalizer. Together these back the furlang2p command-line tool. The normalizer spells out numbers up to 999 999 999 999 and can expand units, abbreviations and acronyms using rules loaded from JSON or YAML files. The tokenizer can be configured with abbreviations after which it does not split sentences. The CLI offers subcommands to normalize text, output phoneme sequences and batch-phonemize metadata CSV files.
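
As a rough illustration of the abbreviation-aware sentence splitting mentioned above, here is a minimal standalone sketch. It is not the package's tokenizer: the function name split_sentences, its signature, and the sample sentence and abbreviation are placeholders chosen for this example.

```python
import re


def split_sentences(text, abbreviations=frozenset()):
    """Split after ., ! or ? followed by whitespace, except after listed abbreviations.

    Hypothetical helper for illustration; not part of the furlang2p API.
    """
    abbrevs = {a.lower().rstrip(".") for a in abbreviations}
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        # Word immediately before the punctuation mark.
        words = text[start:match.start()].split()
        prev = words[-1] if words else ""
        if text[match.start()] == "." and prev.lower() in abbrevs:
            continue  # period belongs to an abbreviation, so do not split here
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Al è il sig. Rossi. Mandi!", {"sig."}))
# -> ['Al è il sig. Rossi.', 'Mandi!']
```

Without the abbreviation set, the same input would be split after "sig." as well.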

Installation

pip install furlang2p

CLI usage

Phonemize short phrases with the ipa subcommand:

furlang2p ipa "ìsule glace"
# -> ˈizule ˈglatʃe

Wrap tokens in slashes or force rule-only conversion:

furlang2p ipa --with-slashes "glaç"
# -> /ˈglatʃ/

furlang2p ipa --rules-only "glaç"
# -> glatʃ

Use underscores as pause markers and customize the token separator:

furlang2p ipa --sep '|' _ "ìsule" __
# -> _|ˈizule|__
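
The behavior shown in this example can be approximated in a few lines of Python. This is only a sketch of the idea inferred from the output above: the function join_with_pauses and the one-entry toy lookup are hypothetical and are not taken from the package.

```python
def join_with_pauses(tokens, phonemize, sep=" "):
    """Phonemize each token, passing underscore-only tokens through as pause markers.

    Hypothetical helper; `phonemize` is any callable mapping a word to an IPA string.
    """
    out = []
    for tok in tokens:
        if tok and set(tok) == {"_"}:
            out.append(tok)  # "_" or "__": keep the pause marker unchanged
        else:
            out.append(phonemize(tok))
    return sep.join(out)


# Toy lookup standing in for the real G2P step (value copied from the example above).
toy = {"ìsule": "ˈizule"}
print(join_with_pauses(["_", "ìsule", "__"], toy.__getitem__, sep="|"))
# -> _|ˈizule|__
```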

Other available subcommands:

  • Normalize and expand numbers/abbreviations:

    furlang2p normalize "CJASE 1964 kg"
    # -> cjase mil nûfcent e sessantecuatri chilogram
    
  • Convert a phrase to phonemes:

    furlang2p g2p "Cjase"
    # -> ˈc a z e
    
  • Phonemize a metadata CSV:

    furlang2p phonemize-csv --in metadata.csv --out out.csv
    

The repository also ships a convenience script providing the same batch conversion:

python scripts/generate_phonemes.py --in metadata.csv --out out.csv
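
For readers who want to see what such a batch conversion involves, the following sketch appends a phoneme column to each row of a metadata file. The layout is an assumption (pipe-delimited rows with the text in the second column), and phonemize_rows is a hypothetical helper, not the code behind phonemize-csv or the script above.

```python
import csv
import io


def phonemize_rows(src, dst, phonemize, delimiter="|", text_column=1):
    """Copy rows from src to dst, appending phonemize(text) as a new last column.

    Assumed layout: delimiter-separated rows with the text at index `text_column`.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for line_no, row in enumerate(reader, start=1):
        if len(row) <= text_column:
            raise ValueError(
                f"line {line_no}: expected at least {text_column + 1} columns"
            )
        writer.writerow(row + [phonemize(row[text_column])])


# Stub standing in for the real pipeline, so the sketch runs without the package.
def stub(text):
    return "ˈc a z e"


src = io.StringIO("0001|Cjase\n")
dst = io.StringIO()
phonemize_rows(src, dst, stub)
print(dst.getvalue(), end="")
# -> 0001|Cjase|ˈc a z e
```

In real use, the stub would be replaced by a callable built on the package's pipeline, and the in-memory buffers by opened files.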

All subcommands validate inputs and emit clear error messages for missing files or conflicting arguments.

Python usage

Invoke the full pipeline programmatically:

from furlan_g2p.services import PipelineService

pipe = PipelineService()
norm, phonemes = pipe.process_text("Cjase")
print(norm)                   # cjase
print(" ".join(phonemes))     # ˈc a z e

Normalization rules can be customized via external JSON or YAML files:

from furlan_g2p.config import load_normalizer_config
from furlan_g2p.normalization import Normalizer

cfg = load_normalizer_config("norm_rules.yml")
print(Normalizer(cfg).normalize("1964 kg"))
# -> mil nûfcent e sessantecuatri chilogram
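
The rule file format is not documented here, so the following is only a sketch of how rule-driven unit expansion could work with a JSON document. The schema (a top-level "units" mapping) and the function expand_units are assumptions made for illustration; the sketch lowercases text and expands units but leaves digits untouched.

```python
import json

# Hypothetical rule file content; the real schema may differ.
RULES_JSON = '{"units": {"kg": "chilogram"}}'


def expand_units(text, rules):
    """Lowercase the text and replace whole-word unit symbols using rules["units"]."""
    units = rules.get("units", {})
    words = text.lower().split()
    return " ".join(units.get(word, word) for word in words)


rules = json.loads(RULES_JSON)
print(expand_units("CJASE kg", rules))
# -> cjase chilogram
```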

Lower-level components such as the lexicon and rule engine remain available for fine-grained control.
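
To make the division of labor between lexicon and rule engine concrete, here is a toy lexicon-first lookup with a rule fallback. It mirrors the two outputs shown for "glaç" in the CLI section, but the data structures, the single letter rule, and the function names are hypothetical and do not reflect the package's internals.

```python
# Toy data: one lexicon entry and one letter-to-sound rule, both taken from
# the "glaç" examples in the CLI section.
LEXICON = {"glaç": "ˈglatʃ"}
TOY_RULES = {"ç": "tʃ"}


def rules_only(word):
    """Apply the toy letter-to-sound rules character by character."""
    return "".join(TOY_RULES.get(ch, ch) for ch in word.lower())


def to_ipa(word, use_lexicon=True):
    """Prefer the lexicon entry; fall back to the rules when absent or disabled."""
    key = word.lower()
    if use_lexicon and key in LEXICON:
        return LEXICON[key]
    return rules_only(key)


print(to_ipa("glaç"))                     # -> ˈglatʃ
print(to_ipa("glaç", use_lexicon=False))  # -> glatʃ
```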

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
