Friulian text normalization, tokenization, G2P conversion and phonology with a CLI and pipeline service.
Project description
FurlanG2P
Utilities for converting Friulian (Furlan) text to phonemes. The package includes:
- a tiny gold lexicon with variant transcriptions;
- a dialect-aware letter-to-sound rule engine;
- a configurable normalization routine;
- a sentence/word tokenizer;
- a syllabifier with basic phonotactics;
- a stress assigner aware of long vowels and accent marks;
- an IPA canonicalizer.
Together these components provide the furlang2p command-line tool. The normalizer spells out numbers up to 999 999 999 999 and can expand units, abbreviations and acronyms, with rules loaded from JSON or YAML files. The tokenizer can skip sentence splits after configurable abbreviations. The CLI also offers subcommands to normalize text, output phoneme sequences and batch-phonemize metadata CSV files.
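The abbreviation-aware sentence splitting mentioned above can be illustrated with a minimal standalone sketch. It is not the package's tokenizer: the function `split_sentences`, its regular expression and the `DEFAULT_ABBREVIATIONS` set are hypothetical, chosen only to show how a configurable abbreviation list can suppress a split.

```python
import re

# Hypothetical placeholder abbreviations; the package's real list is not shown here.
DEFAULT_ABBREVIATIONS = {"dr.", "ecc."}


def split_sentences(text: str, abbreviations: set[str] = DEFAULT_ABBREVIATIONS) -> list[str]:
    """Split after ., ! or ? followed by whitespace or end of text,
    unless the word ending at that boundary is a configured abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        # Last whitespace-delimited word before the boundary, e.g. "dr."
        last_word = text[start:end].split()[-1].lower()
        if last_word in abbreviations:
            continue  # do not split after a known abbreviation
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing text without final punctuation
    return sentences
```

Passing a different set as `abbreviations` is how this sketch models the "configurable" part; the real tokenizer may load its list from its own configuration instead.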
Installation
pip install furlang2p
CLI usage
Phonemize short phrases using the ipa subcommand:
furlang2p ipa "ìsule glace"
# -> ˈizule ˈglatʃe
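The stress marks (ˈ) in the output above are placed by the stress assigner. A toy version of that step is sketched below, working on already-syllabified spelling. The function `mark_stress` is hypothetical and its defaults are simplifying assumptions, not the package's rules: an explicitly accented vowel (grave or circumflex) wins; otherwise vowel-final words are stressed on the penultimate syllable and consonant-final words on the final one.

```python
# Vowel letters carrying a written accent (grave or circumflex).
ACCENTED = set("àèìòùâêîôû")
VOWELS = set("aeiou") | ACCENTED


def mark_stress(syllables: list[str]) -> str:
    """Prefix the stressed syllable with the IPA stress mark (toy heuristic)."""
    if not syllables:
        return ""
    stressed = None
    for i, syl in enumerate(syllables):
        if any(ch in ACCENTED for ch in syl):
            stressed = i  # an explicit accent mark always wins
            break
    if stressed is None:
        if len(syllables) == 1:
            stressed = 0
        elif syllables[-1][-1].lower() in VOWELS:
            stressed = len(syllables) - 2  # assumed: vowel-final -> penultimate
        else:
            stressed = len(syllables) - 1  # assumed: consonant-final -> final
    return "".join("ˈ" + s if i == stressed else s for i, s in enumerate(syllables))
```

For example, `mark_stress(["ì", "su", "le"])` returns `"ˈìsule"` and `mark_stress(["fur", "lan"])` returns `"furˈlan"`.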
Wrap tokens in slashes or force rule-only conversion:
furlang2p ipa --with-slashes "glaç"
# -> /ˈglatʃ/
furlang2p ipa --rules-only "glaç"
# -> glatʃ
Use underscores as pause markers and customize the token separator:
furlang2p ipa --sep '|' _ "ìsule" __
# -> _|ˈizule|__
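The formatting behaviour shown by `--sep` and `--with-slashes` can be sketched as follows: per-token transcriptions are joined with the chosen separator, and tokens made only of underscores pass through as pause markers. `format_tokens` is a hypothetical helper, and leaving pause markers without slashes is an assumption.

```python
def format_tokens(tokens: list[str], sep: str = " ", with_slashes: bool = False) -> str:
    """Join token transcriptions with sep, optionally wrapping each in slashes.

    Tokens consisting only of underscores are treated as pause markers and
    left unwrapped (an assumption of this sketch).
    """
    out = []
    for tok in tokens:
        is_pause = tok != "" and set(tok) == {"_"}
        if with_slashes and not is_pause:
            tok = f"/{tok}/"
        out.append(tok)
    return sep.join(out)
```

With this helper, `format_tokens(["_", "ˈizule", "__"], sep="|")` yields `_|ˈizule|__`, matching the CLI example above.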
Other available subcommands:
- Normalize and expand numbers/abbreviations:
furlang2p normalize "CJASE 1964 kg" # -> cjase mil nûfcent e sessantecuatri chilogram
- Convert a phrase to phonemes:
furlang2p g2p "Cjase" # -> ˈc a z e
- Phonemize a metadata CSV:
furlang2p phonemize-csv --in metadata.csv --out out.csv
The repository also ships a convenience script providing the same batch conversion:
python scripts/generate_phonemes.py --in metadata.csv --out out.csv
All subcommands validate inputs and emit clear error messages for missing files or conflicting arguments.
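Batch phonemization of a metadata file can be sketched with the standard `csv` module. The layout assumed here (pipe-delimited rows with the text in the second column, as in LJSpeech-style metadata) and the extra output column are assumptions, not the documented format of `phonemize-csv`. `phonemize_csv` is hypothetical and accepts any text-to-phonemes callable, so it runs without the package installed.

```python
import csv
from typing import Callable, TextIO


def phonemize_csv(src: TextIO, dst: TextIO, phonemize: Callable[[str], str],
                  delimiter: str = "|", text_column: int = 1) -> int:
    """Copy each row of src to dst, appending a column with the phonemes.

    Returns the number of rows written; blank lines are skipped.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    count = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow(row + [phonemize(row[text_column])])
        count += 1
    return count
```

With the package installed, the `phonemize` argument could wrap the pipeline shown in the next section, e.g. `lambda t: " ".join(pipe.process_text(t)[1])`.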
Python usage
Invoke the full pipeline programmatically:
from furlan_g2p.services import PipelineService
pipe = PipelineService()
norm, phonemes = pipe.process_text("Cjase")
print(norm) # cjase
print(" ".join(phonemes)) # ˈc a z e
Normalization rules can be customized via external JSON or YAML files:
from furlan_g2p.config import load_normalizer_config
from furlan_g2p.normalization import Normalizer
cfg = load_normalizer_config("norm_rules.yml")
print(Normalizer(cfg).normalize("1964 kg"))
# -> mil nûfcent e sessantecuatri chilogram
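The rule-file schema is not documented here, so the following sketch uses a made-up JSON layout (`units` and `abbreviations` tables mapping lowercase tokens to expansions) to show the general idea of config-driven expansion. Only the `kg` → `chilogram` pair comes from the example above; `normalize`, `RULES_JSON` and the schema are hypothetical, and number spelling is left out.

```python
import json

# Hypothetical rule file contents; the package's real schema may differ.
RULES_JSON = """
{
  "units": {"kg": "chilogram"},
  "abbreviations": {}
}
"""
RULES = json.loads(RULES_JSON)


def normalize(text: str, rules: dict) -> str:
    """Lowercase whitespace-separated tokens and expand those listed in the rules.

    Tokens not found in any table (including numbers) are only lowercased.
    """
    table = {**rules.get("abbreviations", {}), **rules.get("units", {})}
    return " ".join(table.get(tok.lower(), tok.lower()) for tok in text.split())
```

Here `normalize("CJASE 1964 kg", RULES)` returns `"cjase 1964 chilogram"`; the real normalizer additionally spells out the number.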
Lower-level components such as the lexicon and rule engine remain available for fine-grained control.
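As an illustration of how a gold lexicon and a letter-to-sound fallback can fit together, here is a minimal sketch. The CLI examples above return ˈglatʃ for glaç by default and glatʃ with `--rules-only`; the sketch reproduces only that behaviour. `LEXICON`, `LTS_RULES`, `rules_g2p` and `transcribe` are hypothetical, and the one-letter rule table is a stand-in for the package's dialect-aware rule engine.

```python
# Hypothetical gold lexicon: word -> list of variant transcriptions.
LEXICON = {"glaç": ["ˈglatʃ"]}
# Hypothetical toy letter-to-sound table; unlisted letters map to themselves.
LTS_RULES = {"ç": "tʃ"}


def rules_g2p(word: str) -> str:
    """Apply the toy rules letter by letter."""
    return "".join(LTS_RULES.get(ch, ch) for ch in word.lower())


def transcribe(word: str, rules_only: bool = False, variant: int = 0) -> str:
    """Prefer a lexicon entry (picking a variant), otherwise fall back to rules."""
    entries = LEXICON.get(word.lower())
    if entries and not rules_only:
        # Clamp the requested variant to the available transcriptions.
        return entries[min(variant, len(entries) - 1)]
    return rules_g2p(word)
```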
Project links
- Source code and issue tracker: https://github.com/daurmax/FurlanG2P
- Bibliography and references: https://github.com/daurmax/FurlanG2P/blob/main/docs/references.md
License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Download files
Source Distribution
furlang2p-0.1.0.tar.gz (40.8 kB)
Built Distribution
furlang2p-0.1.0-py3-none-any.whl (38.2 kB)
File details
Details for the file furlang2p-0.1.0.tar.gz.
File metadata
- Download URL: furlang2p-0.1.0.tar.gz
- Upload date:
- Size: 40.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61680942b045d15c9f53355c3af24a9429e1b3601394df1917a375c353db76ee |
| MD5 | 52dcee8b2a38d706dbeec84e1de4c650 |
| BLAKE2b-256 | b6c1ac7716c848be3a9483f3b01d39c66b47115efd30f0f5fe77a359fd4e020c |
Provenance
The following attestation bundles were made for furlang2p-0.1.0.tar.gz:
Publisher: release.yml on daurmax/FurlanG2P
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: furlang2p-0.1.0.tar.gz
- Subject digest: 61680942b045d15c9f53355c3af24a9429e1b3601394df1917a375c353db76ee
- Sigstore transparency entry: 528695048
- Sigstore integration time:
- Permalink: daurmax/FurlanG2P@fd200371f39e8812332a815461ae239f15f55845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/daurmax
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fd200371f39e8812332a815461ae239f15f55845
- Trigger Event: workflow_dispatch
File details
Details for the file furlang2p-0.1.0-py3-none-any.whl.
File metadata
- Download URL: furlang2p-0.1.0-py3-none-any.whl
- Upload date:
- Size: 38.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e2b87991482d4152029798b2ac8b92d50ba512c3e481352b40ba914d94ffa72 |
| MD5 | 49672bc96725abac758fc4976099ea29 |
| BLAKE2b-256 | bb9d9c1bbb538dfb0b1c0b7d263917084b3161cdbf707811ee7c13639b2ce559 |
Provenance
The following attestation bundles were made for furlang2p-0.1.0-py3-none-any.whl:
Publisher: release.yml on daurmax/FurlanG2P
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: furlang2p-0.1.0-py3-none-any.whl
- Subject digest: 1e2b87991482d4152029798b2ac8b92d50ba512c3e481352b40ba914d94ffa72
- Sigstore transparency entry: 528695062
- Sigstore integration time:
- Permalink: daurmax/FurlanG2P@fd200371f39e8812332a815461ae239f15f55845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/daurmax
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fd200371f39e8812332a815461ae239f15f55845
- Trigger Event: workflow_dispatch