Transcribe Norwegian dialects phonemically from bokmål text.
Grapheme to Phoneme models for Norwegian Bokmål
This repo contains code to run G2P models for Norwegian Bokmål[^1]. The models produce phonemic transcriptions of close-to-spoken pronunciations (spoken, as in spontaneous conversation) and close-to-written pronunciations (written, as when reading text aloud) for 5 different dialect areas:
- East Norwegian (e)
- South West Norwegian (sw)
- West Norwegian (w)
- Central Norwegian (Trøndersk) (t)
- North Norwegian (n)
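The dialect codes and the two pronunciation styles can be kept in a small lookup table when organising transcriptions by variant. The sketch below is only an illustration: the names `DIALECTS`, `STYLES` and `describe` are hypothetical and are not part of `nb_g2p`, and the code does not call the library.

```python
# Dialect codes and pronunciation styles as listed above.
# Hypothetical helper names; this is a plain lookup table, not the nb_g2p API.
DIALECTS = {
    "e": "East Norwegian",
    "sw": "South West Norwegian",
    "w": "West Norwegian",
    "t": "Central Norwegian (Trøndersk)",
    "n": "North Norwegian",
}
STYLES = ("spoken", "written")


def describe(dialect: str, style: str) -> str:
    """Return a human-readable label for a dialect/style pair."""
    if dialect not in DIALECTS:
        raise ValueError(f"unknown dialect code: {dialect!r}")
    if style not in STYLES:
        raise ValueError(f"unknown pronunciation style: {style!r}")
    return f"{DIALECTS[dialect]}, close-to-{style} pronunciation"


# Every dialect/style combination: 5 dialect areas x 2 styles.
all_variants = [(d, s) for d in DIALECTS for s in STYLES]
```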
[^1]: Bokmål is the most widely used written standard for Norwegian. The other written standard is Nynorsk. Read more on Wikipedia.
Setup
pip install nb_g2p
Usage
>>> import nb_g2p
>>> list(nb_g2p.transcribe("hei på deg!"))
[('hei', 'H AEJ1'), ('på', 'P OAH0'), ('deg', 'D AEJ1')]
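The result is a sequence of (word, transcription) pairs in which the phonemes are separated by spaces. A minimal sketch of how such pairs could be post-processed follows; it uses the literal output shown above rather than calling `nb_g2p`, and `to_lexicon` is a hypothetical helper name.

```python
# Literal result copied from the usage example, so this runs without nb_g2p.
pairs = [("hei", "H AEJ1"), ("på", "P OAH0"), ("deg", "D AEJ1")]


def to_lexicon(pairs):
    """Map each word to its list of phoneme tokens (split on whitespace)."""
    return {word: transcription.split() for word, transcription in pairs}


lexicon = to_lexicon(pairs)

# Join the per-word transcriptions into one utterance-level phoneme string.
phoneme_string = " ".join(transcription for _, transcription in pairs)
```

Note that a dictionary keeps only one transcription per word, so a list of pairs is the safer structure when the same word can occur with different pronunciations.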
Transcription standard
The G2P models have been trained on the NoFAbet transcription standard, which is easier for humans to read than X-SAMPA. NoFAbet is based in part on 2-letter ARPAbet and was created by Nate Young for the National Library of Norway in connection with the development of NoFA, a forced aligner for Norwegian. The equivalence table below lists the X-SAMPA, IPA and NoFAbet notations.
X-SAMPA-IPA-NoFAbet equivalence table
| X-SAMPA | IPA | NoFAbet | Example |
|---|---|---|---|
| A: | ɑː | AA0 | bad |
| {: | æː | AE0 | vær |
| { | æ | AEH0 | vært |
| {*I | æɪ | AEJ0 | sei |
| E*u0 | æʉ | AEW0 | sau |
| A | ɑ | AH0 | hatt |
| A*I | ɑɪ | AJ0 | kai |
| @ | ə | AX0 | behage |
| b | b | B | bil |
| d | d | D | dag |
| e: | eː | EE0 | lek |
| E | ɛ | EH0 | penn |
| f | f | F | fin |
| g | g | G | gul |
| h | h | H | hes |
| I | ɪ | IH0 | sitt |
| i: | iː | II0 | vin |
| j | j | J | ja |
| k | k | K | kost |
| C | ç | KJ | kino |
| l | l | L | land |
| l= | l̩ | LX0 | |
| m | m | M | man |
| n | n | N | nord |
| N | ŋ | NG | eng |
| n= | n̩ | NX0 | |
| o: | oː | OA0 | rå |
| O | ɔ | OAH0 | gått |
| 2: | øː | OE0 | løk |
| 9 | œ | OEH0 | høst |
| 9*Y | œy | OEJ0 | køye |
| U | u | OH0 | fort |
| O*Y | ɔy | OJ0 | konvoy |
| u: | uː | OO0 | bod |
| @U | oʉ | OU0 | show |
| p | p | P | pil |
| r | r | R | rose |
| d` | ɖ | RD | rekord |
| l` | ɭ | RL | perle |
| l`= | ɭ̩ | RLX0 | |
| n` | ɳ | RN | barn |
| n`= | ɳ̩ | RNX0 | |
| s` | ʂ | RS | pers |
| t` | ʈ | RT | stort |
| s | s | S | sil |
| S | ʃ | SJ | sju |
| t | t | T | tid |
| u0 | ʉ | UH0 | russ |
| }: | ʉː | UU0 | hus |
| v | ʋ | V | vase |
| w | w | W | Washington |
| Y | y | YH0 | nytt |
| y: | yː | YY0 | ny |
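The table can be used to convert a NoFAbet transcription into IPA. The following is a rough sketch that covers only a subset of the rows; `NOFABET_TO_IPA` and `nofabet_to_ipa` are hypothetical names, not part of `nb_g2p`. The trailing digit on vowels and syllabic consonants is a stress or tone marker and is simply dropped here.

```python
# Subset of the equivalence table (NoFAbet -> IPA); extend with further rows as needed.
NOFABET_TO_IPA = {
    "AA": "ɑː", "AH": "ɑ", "AEJ": "æɪ", "AX": "ə",
    "EE": "eː", "EH": "ɛ", "II": "iː", "IH": "ɪ",
    "OA": "oː", "OAH": "ɔ", "UU": "ʉː", "UH": "ʉ",
    "B": "b", "D": "d", "F": "f", "G": "g", "H": "h",
    "J": "j", "K": "k", "KJ": "ç", "L": "l", "M": "m",
    "N": "n", "NG": "ŋ", "P": "p", "R": "r", "S": "s",
    "SJ": "ʃ", "T": "t", "V": "ʋ",
}


def nofabet_to_ipa(transcription: str) -> str:
    """Convert a space-separated NoFAbet string to IPA, ignoring stress digits."""
    ipa = []
    for token in transcription.split():
        base = token.rstrip("0123")  # drop the trailing stress/tone digit, if any
        if base not in NOFABET_TO_IPA:
            raise KeyError(f"no IPA mapping for NoFAbet symbol {token!r}")
        ipa.append(NOFABET_TO_IPA[base])
    return "".join(ipa)
```

For example, `nofabet_to_ipa("H AEJ1")` gives `hæɪ`.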
Unstressed syllables are marked with a 0 after the vowel or syllabic consonant. In stressed syllables, the nucleus is marked with a 1 for tone 1 or a 2 for tone 2. Secondary stress is marked with a 3.
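The markers can be separated from the phoneme symbols with a small parser. This is a sketch under the assumption that every token consists of uppercase letters optionally followed by one digit from 0 to 3, as in the table; `parse_token` and `parse` are hypothetical helper names.

```python
import re

# Marker meanings as described above.
MARKERS = {
    "0": "unstressed",
    "1": "tone 1",
    "2": "tone 2",
    "3": "secondary stress",
}

# Uppercase symbol, optionally followed by a single marker digit.
TOKEN = re.compile(r"([A-Z]+)([0-3])?")


def parse_token(token: str):
    """Split one NoFAbet token into (symbol, marker meaning or None)."""
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"not a NoFAbet token: {token!r}")
    symbol, digit = match.groups()
    # Consonants carry no digit, so their marker is None.
    return symbol, MARKERS.get(digit)


def parse(transcription: str):
    """Parse a space-separated NoFAbet transcription into (symbol, marker) pairs."""
    return [parse_token(token) for token in transcription.split()]
```

For example, `parse("H AEJ1")` returns `[("H", None), ("AEJ", "tone 1")]`.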
License
These models are shared under a Creative Commons Zero (CC0) license, as are the lexica they were trained on. The models can be used for any purpose, as long as the use complies with the Phonetisaurus license.