French grapheme-to-phoneme conversion: G2P + POS + Morpho + Liaison (multi-task BiLSTM)

Lectura G2P

Unified G2P + POS + morphology + liaison model for French

A single character-level, multi-head BiLSTM model (1.75M parameters) that simultaneously predicts:

  • G2P: IPA phonemic transcription (98.5% word accuracy)
  • POS: part-of-speech tagging, 19 tags (98.2% accuracy)
  • Morphology: gender, number, tense, mood, person, verb form (95-99%)
  • Liaison: prediction of obligatory and optional liaisons (F1 90.6%)

Four inference backends: API (zero config), ONNX Runtime, NumPy, or pure Python (zero dependencies).

Quick start

pip install lectura-g2p

from lectura_nlp import creer_engine

engine = creer_engine()    # API mode by default (zero config)

result = engine.analyser(["Les", "enfants", "sont", "arrives", "a", "la", "maison"])

print(result["g2p"])      # ['le', 'ɑ̃fɑ̃', 'sɔ̃', 'aʁive', 'a', 'la', 'mɛzɔ̃']
print(result["pos"])      # ['ART:def', 'NOM', 'AUX', 'VER', 'PRE', 'ART:def', 'NOM']
print(result["liaison"])  # ['Lz', 'none', 'Lt', 'none', 'none', 'none', 'none']
print(result["morpho"])   # {'Number': ['Plur', ...], 'Gender': [...], ...}
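
The outputs are parallel lists with one entry per input token, so they can be zipped into per-token records. A minimal sketch, using the sample values printed above in place of a live `engine.analyser` call; the helper name `per_token_rows` is hypothetical and not part of the package, and the `morpho` entry is left out because the sample elides its values.

```python
# Sample output copied from the quick-start example; in real use it would
# come from engine.analyser(tokens).
tokens = ["Les", "enfants", "sont", "arrives", "a", "la", "maison"]
result = {
    "g2p": ["le", "ɑ̃fɑ̃", "sɔ̃", "aʁive", "a", "la", "mɛzɔ̃"],
    "pos": ["ART:def", "NOM", "AUX", "VER", "PRE", "ART:def", "NOM"],
    "liaison": ["Lz", "none", "Lt", "none", "none", "none", "none"],
}

def per_token_rows(tokens, result, keys=("g2p", "pos", "liaison")):
    """Hypothetical helper: turn parallel lists into one dict per token."""
    for key in keys:
        if len(result[key]) != len(tokens):
            raise ValueError(
                f"{key!r} has {len(result[key])} entries, expected {len(tokens)}"
            )
    return [
        {"token": tok, **{key: result[key][i] for key in keys}}
        for i, tok in enumerate(tokens)
    ]

rows = per_token_rows(tokens, result)
for row in rows:
    print(f"{row['token']:<8} {row['g2p']:<8} {row['pos']:<8} {row['liaison']}")
```

The length check guards against silently misaligned columns if a token list and a result from a different call are mixed up.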

Inference backends

| Backend      | Dependencies | Speed             | Use case                      |
|--------------|--------------|-------------------|-------------------------------|
| API          | none         | ~100 ms (network) | Default, zero config          |
| ONNX Runtime | onnxruntime  | ~2 ms/sentence    | Local production              |
| NumPy        | numpy        | ~50 ms/sentence   | Lightweight                   |
| Pure Python  | none         | ~200 ms/sentence  | Embedded, maximum portability |

# Force a specific backend
engine = creer_engine(mode="onnx")    # local ONNX
engine = creer_engine(mode="api")     # server API
engine = creer_engine(mode="auto")    # local if models are present, otherwise API

The local backends (ONNX, NumPy, pure Python) require the model files, which are available on request.
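
Which local backend is usable depends on which optional dependencies are installed. A minimal sketch of picking one of the mode strings above by probing for `onnxruntime` and `numpy` with the standard library; `choose_local_mode` is a hypothetical helper, not part of the package, and `mode="auto"` may already make a similar choice internally. It does not check whether the model files are present.

```python
from importlib.util import find_spec

def choose_local_mode(is_installed=None):
    """Hypothetical helper: pick the fastest local mode whose dependency is importable.

    The order follows the speeds quoted for each backend: onnx, numpy, pure.
    """
    if is_installed is None:
        is_installed = lambda name: find_spec(name) is not None
    if is_installed("onnxruntime"):
        return "onnx"
    if is_installed("numpy"):
        return "numpy"
    return "pure"  # needs no third-party dependency

mode = choose_local_mode()
print(mode)  # one of "onnx", "numpy", "pure", depending on the environment
# The result could then be passed on, e.g. creer_engine(mode=mode).
```

The `is_installed` parameter exists only so the selection logic can be exercised without changing the environment.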

Benchmarks (test set)

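| Task              | Metric                 | Score |
|-------------------|------------------------|-------|
| G2P               | Word accuracy          | 98.5% |
| G2P               | PER (phone error rate) | 0.5%  |
| POS               | Accuracy               | 98.2% |
| Liaison           | Macro F1               | 90.6% |
| Morpho (Number)   | Accuracy               | 97.0% |
| Morpho (Gender)   | Accuracy               | 95.1% |
| Morpho (VerbForm) | Accuracy               | 98.8% |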
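
For readers unfamiliar with the two G2P metrics, here is a minimal sketch of how word accuracy and phone error rate are commonly computed. It assumes PER is the total Levenshtein distance between predicted and reference phone sequences divided by the total number of reference phones. The package's actual evaluation script and phone segmentation are not described here, so the function names and the toy data are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]

def word_accuracy(refs, hyps):
    """Fraction of words whose whole phone sequence is exactly right."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def phone_error_rate(refs, hyps):
    """Total edit distance divided by the total number of reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)

# Toy data: each word is a list of phones (this segmentation is an assumption).
refs = [["l", "e"], ["m", "ɛ", "z", "ɔ̃"], ["s", "ɔ̃"]]
hyps = [["l", "e"], ["m", "e", "z", "ɔ̃"], ["s", "ɔ̃"]]

print(word_accuracy(refs, hyps))     # 2 of 3 words are fully correct
print(phone_error_rate(refs, hyps))  # 1 substitution over 8 reference phones
```

A single wrong phone costs a whole word under word accuracy but only one unit under PER, which is why the two scores in the table differ so much in magnitude.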

API

creer_engine(mode="auto") -> engine

Factory that creates an inference engine. Modes: "auto", "api", "local", "onnx", "numpy", "pure".

engine.analyser(tokens) -> dict

Analyzes a list of tokens and returns a dict with these keys:

  • g2p: IPA transcription per token
  • pos: POS tag per token
  • liaison: liaison label per token (none, Lz, Lt, Ln, Lr, Lp)
  • morpho: dict of lists, one per feature (Number, Gender, VerbForm, Mood, Tense, Person)
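
One way to consume the liaison labels is to build a phrase-level transcription. The sketch below assumes that a label such as `Lz` means the consonant /z/ is pronounced between that token and the next one, which is an interpretation of the label names rather than documented behavior. The `LIAISON_CONSONANT` mapping and the `apply_liaisons` helper are hypothetical, and every predicted liaison is treated as realized, without distinguishing obligatory from optional ones.

```python
# Assumed mapping from liaison label to the linking consonant (IPA).
LIAISON_CONSONANT = {"Lz": "z", "Lt": "t", "Ln": "n", "Lr": "ʁ", "Lp": "p"}

def apply_liaisons(g2p, liaison, link="‿"):
    """Hypothetical helper: join per-token IPA strings, inserting liaison consonants."""
    parts = []
    last = len(g2p) - 1
    for i, (phones, label) in enumerate(zip(g2p, liaison)):
        parts.append(phones)
        if i == last:
            break  # nothing follows the final token
        consonant = LIAISON_CONSONANT.get(label)
        # Attach the consonant to the following word, else use a plain space.
        parts.append(f" {consonant}{link}" if consonant else " ")
    return "".join(parts)

g2p = ["le", "ɑ̃fɑ̃", "sɔ̃", "aʁive", "a", "la", "mɛzɔ̃"]
liaison = ["Lz", "none", "Lt", "none", "none", "none", "none"]
print(apply_liaisons(g2p, liaison))  # le z‿ɑ̃fɑ̃ sɔ̃ t‿aʁive a la mɛzɔ̃
```

Unknown labels fall back to a plain space, so the helper degrades to simple joining if the label set differs from the one assumed here.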

License

Dual license:
