
French phoneme-to-grapheme conversion (IPA → orthography): P2G + POS + Morpho (multi-task BiLSTM)


Lectura P2G

Unified P2G + POS + morphology model for French (IPA → orthography)

A single multi-head, character-level BiLSTM with word feedback (2.56M parameters) that simultaneously predicts:

  • P2G: IPA-to-orthography transcription (93.1% word accuracy, 2.2% CER)
  • POS: part-of-speech tagging, 19 tags (97.0% accuracy)
  • Morphology: gender, number, tense, mood, person, verb form (92-97%)

Four inference backends: API (zero config), ONNX Runtime, NumPy, or pure Python (zero dependencies).

Quick start

pip install lectura-p2g

from lectura_p2g import creer_engine

engine = creer_engine()    # API mode by default (zero config)

result = engine.analyser(["le", "ɑ̃fɑ̃", "sɔ̃", "aʁive", "a", "la", "mɛzɔ̃"])

print(result["ortho"])   # ['les', 'enfants', 'sont', 'arrives', 'a', 'la', 'maison']
print(result["pos"])     # ['ART:def', 'NOM', 'AUX', 'VER', 'PRE', 'ART:def', 'NOM']
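
The lists returned by `analyser` are aligned with the input, so they can be zipped token by token. The following is a minimal sketch that assumes only the `ortho` and `pos` keys shown above. `tag_tokens` is a hypothetical helper, not part of lectura-p2g, and the `result` dict is hand-copied from the example output rather than produced by a live engine call.

```python
def tag_tokens(ipa_words, result):
    """Pair each IPA input word with its predicted spelling and POS tag."""
    ortho, pos = result["ortho"], result["pos"]
    if not (len(ipa_words) == len(ortho) == len(pos)):
        raise ValueError("input and output lists must have the same length")
    return list(zip(ipa_words, ortho, pos))


# Hand-copied from the quick-start output (no engine call is made here).
ipa = ["le", "ɑ̃fɑ̃", "sɔ̃", "aʁive", "a", "la", "mɛzɔ̃"]
result = {
    "ortho": ["les", "enfants", "sont", "arrives", "a", "la", "maison"],
    "pos": ["ART:def", "NOM", "AUX", "VER", "PRE", "ART:def", "NOM"],
}

tagged = tag_tokens(ipa, result)
sentence = " ".join(word for _, word, _ in tagged)

# One tab-separated line per token: IPA, spelling, POS tag.
for phon, word, tag in tagged:
    print(f"{phon}\t{word}\t{tag}")
```

With a real engine, `result` would simply be the return value of `engine.analyser(ipa)`.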

Inference backends

| Backend | Dependencies | Speed | Use case |
|---|---|---|---|
| API | none | ~100 ms (network) | Default, zero config |
| ONNX Runtime | onnxruntime | ~2 ms/sentence | Local production |
| NumPy | numpy | ~50 ms/sentence | Lightweight |
| Pure Python | none | ~200 ms/sentence | Embedded, maximum portability |

engine = creer_engine(mode="onnx")    # local ONNX
engine = creer_engine(mode="api")     # server API
engine = creer_engine(mode="auto")    # local if the models are present, otherwise API

The local backends (ONNX, NumPy, pure Python) require the model files, which are available on request.
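
The per-sentence speeds quoted for each backend depend on hardware and sentence length, so it can be worth measuring them locally. The sketch below times any callable that takes a list of IPA words. `mean_latency_ms` and `fake_analyse` are hypothetical names introduced here for illustration; the stand-in function only mimics the shape of the documented return value and does not call lectura-p2g.

```python
import time


def mean_latency_ms(analyse, sentences, repeats=5):
    """Return the mean wall-clock time per sentence, in milliseconds."""
    if not sentences or repeats < 1:
        raise ValueError("need at least one sentence and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for words in sentences:
            analyse(words)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(sentences))


def fake_analyse(words):
    # Placeholder standing in for engine.analyser (assumption, not the real model).
    return {"ortho": list(words), "pos": ["?"] * len(words)}


sentences = [["le", "ʃa"], ["la", "mɛzɔ̃"]]
latency = mean_latency_ms(fake_analyse, sentences, repeats=3)
print(f"{latency:.3f} ms/sentence")
```

To compare real backends, pass `engine.analyser` from engines created with different `mode` values in place of `fake_analyse`.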

Benchmarks (test set)

| Task | Metric | Score |
|---|---|---|
| P2G | Word accuracy | 93.1% |
| P2G | CER (character error rate) | 2.2% |
| POS | Accuracy | 97.0% |
| Morpho (Number) | Accuracy | 92.8% |
| Morpho (Gender) | Accuracy | 92.0% |
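
For readers unfamiliar with the two P2G metrics, here is a small sketch of how word accuracy and CER are commonly computed. It assumes exact string match for word accuracy and a corpus-level CER (total Levenshtein edits divided by total reference characters); the evaluation protocol actually used for the figures above is not described in this README, so this is an illustration rather than the project's scoring script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_accuracy(refs, hyps):
    """Fraction of predicted words that match the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def cer(refs, hyps):
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return edits / chars


# Toy data: one word out of five is wrong by a single character.
refs = ["les", "enfants", "a", "la", "maison"]
hyps = ["les", "enfant", "a", "la", "maison"]
print(word_accuracy(refs, hyps))  # 4 of 5 words correct
print(cer(refs, hyps))            # 1 edit over 19 reference characters
```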

API

creer_engine(mode="auto") -> engine

Factory that creates an inference engine. Modes: "auto", "api", "local", "onnx", "numpy", "pure".

engine.analyser(ipa_words) -> dict

Analyzes a list of IPA words and returns:

  • ortho: reconstructed spelling for each word
  • pos: POS tag for each word
  • morpho: dict of lists, one list per feature (Number, Gender, VerbForm, Mood, Tense, Person)
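
Because the returned dict is column-wise (one list per key or feature), it is often convenient to pivot it into one record per word. The sketch below assumes the structure described above; `per_word_records` is a hypothetical helper, and the feature values used in the sample (`"Plur"`, `"Masc"`, `None`) are assumptions, since the label inventory is not listed here.

```python
def per_word_records(result):
    """Turn the column-wise result dict into one record (dict) per word."""
    records = []
    for i in range(len(result["ortho"])):
        record = {"ortho": result["ortho"][i], "pos": result["pos"][i]}
        # Each morpho feature maps to a list aligned with the words.
        for trait, values in result.get("morpho", {}).items():
            record[trait] = values[i]
        records.append(record)
    return records


# Illustrative result; the feature values are assumed, not real model output.
result = {
    "ortho": ["les", "enfants"],
    "pos": ["ART:def", "NOM"],
    "morpho": {"Number": ["Plur", "Plur"], "Gender": [None, "Masc"]},
}
records = per_word_records(result)
print(records[1])
```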

License

Dual license:


