
Transcribe Norwegian dialects phonemically from bokmål text.

Grapheme to Phoneme models for Norwegian Bokmål

This repo contains code to run G2P models for Norwegian bokmål[^1]. The models produce phonemic transcriptions of close-to-spoken pronunciations (as in spontaneous conversation: spoken) and close-to-written pronunciations (as when reading text aloud: written) for five dialect areas:

  1. East Norwegian (e)
  2. South West Norwegian (sw)
  3. West Norwegian (w)
  4. Central Norwegian (Trøndersk) (t)
  5. North Norwegian (n)

[^1]: Bokmål is the most widely used written standard for Norwegian. The other written standard is Nynorsk. Read more on Wikipedia.

Setup

pip install nb_g2p

Usage

>>> import nb_g2p
>>> list(nb_g2p.transcribe("hei på deg!"))
[('hei', 'H AEJ1'), ('på', 'P OAH0'), ('deg', 'D AEJ1')]
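
In the example above, each item is a (word, transcription) pair, the phonemes within a transcription are separated by spaces, and the exclamation mark does not appear in the output. A minimal sketch of post-processing such pairs follows. It uses the pairs shown above as literal sample data, so it runs without the models installed, and `split_phonemes` is a hypothetical helper, not part of nb_g2p:

```python
# Sample data copied from the usage example above. With the package installed,
# the same pairs would come from list(nb_g2p.transcribe("hei på deg!")).
pairs = [("hei", "H AEJ1"), ("på", "P OAH0"), ("deg", "D AEJ1")]


def split_phonemes(transcription):
    """Split a space-separated NoFAbet string into a list of phoneme symbols.

    Hypothetical helper for illustration; not provided by nb_g2p.
    """
    return transcription.split()


# Build a small pronunciation lexicon: word -> list of phoneme symbols.
lexicon = {word: split_phonemes(phonemes) for word, phonemes in pairs}

# Print one tab-separated line per word.
for word, phonemes in lexicon.items():
    print(f"{word}\t{' '.join(phonemes)}")
```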

Transcription standard

The G2P models were trained on the NoFAbet transcription standard, which is easier for humans to read than X-SAMPA. NoFAbet is based in part on 2-letter ARPAbet and was created by Nate Young for the National Library of Norway in connection with the development of NoFA, a forced aligner for Norwegian. The equivalence table below lists X-SAMPA, IPA and NoFAbet notations.

X-SAMPA-IPA-NoFAbet equivalence table

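| X-SAMPA | IPA | NoFAbet | Example |
|---|---|---|---|
| A: | ɑː | AA0 | bad |
| {: | æː | AE0 | vær |
| { | æ | AEH0 | vært |
| {*I | æɪ | AEJ0 | sei |
| E*u0 | æʉ | AEW0 | sau |
| A | ɑ | AH0 | hatt |
| A*I | ɑɪ | AJ0 | kai |
| @ | ə | AX0 | behage |
| b | b | B | bil |
| d | d | D | dag |
| e: | eː | EE0 | lek |
| E | ɛ | EH0 | penn |
| f | f | F | fin |
| g | g | G | gul |
| h | h | H | hes |
| I | ɪ | IH0 | sitt |
| i: | iː | II0 | vin |
| j | j | J | ja |
| k | k | K | kost |
| C | ç | KJ | kino |
| l | l | L | land |
| l= | l̩ | LX0 | |
| m | m | M | man |
| n | n | N | nord |
| N | ŋ | NG | eng |
| n= | n̩ | NX0 | |
| o: | oː | OA0 | rå |
| O | ɔ | OAH0 | gått |
| 2: | øː | OE0 | løk |
| 9 | œ | OEH0 | høst |
| 9*Y | œy | OEJ0 | køye |
| U | u | OH0 | fort |
| O*Y | ɔy | OJ0 | konvoy |
| u: | uː | OO0 | bod |
| @U | əʊ | OU0 | show |
| p | p | P | pil |
| r | r | R | rose |
| d` | ɖ | RD | rekord |
| l` | ɭ | RL | perle |
| l`= | ɭ̩ | RLX0 | |
| n` | ɳ | RN | barn |
| n`= | ɳ̩ | RNX0 | |
| s` | ʂ | RS | pers |
| t` | ʈ | RT | stort |
| s | s | S | sil |
| S | ʃ | SJ | sju |
| t | t | T | tid |
| u0 | ʉ | UH0 | russ |
| }: | ʉː | UU0 | hus |
| v | ʋ | V | vase |
| w | w | W | Washington |
| Y | y | YH0 | nytt |
| y: | yː | YY0 | ny |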
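
The table can be applied mechanically to turn a NoFAbet transcription into IPA. The sketch below assumes that the trailing digit on vowels and syllabic consonants can simply be stripped before lookup, which discards stress and tone information. `NOFABET_TO_IPA` and `nofabet_to_ipa` are hypothetical names, not part of nb_g2p, and the dictionary covers only a subset of the rows above:

```python
# Subset of the equivalence table above: NoFAbet symbol (without the trailing
# stress/tone digit) -> IPA. Extend with the remaining rows as needed.
NOFABET_TO_IPA = {
    "AA": "ɑː", "AE": "æː", "AEH": "æ", "AEJ": "æɪ", "AH": "ɑ",
    "EH": "ɛ", "OAH": "ɔ", "UH": "ʉ", "UU": "ʉː",
    "D": "d", "H": "h", "KJ": "ç", "NG": "ŋ", "P": "p",
    "RS": "ʂ", "S": "s", "SJ": "ʃ", "T": "t",
}


def nofabet_to_ipa(transcription):
    """Convert a space-separated NoFAbet string to an IPA string.

    Hypothetical helper: stress and tone digits are dropped, and a symbol
    missing from NOFABET_TO_IPA raises KeyError.
    """
    ipa = []
    for symbol in transcription.split():
        base = symbol.rstrip("0123")  # remove the stress/tone digit, if any
        ipa.append(NOFABET_TO_IPA[base])
    return "".join(ipa)


print(nofabet_to_ipa("H AEJ1"))  # hæɪ
print(nofabet_to_ipa("P OAH0"))  # pɔ
```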

Unstressed syllables are marked with a 0 after the vowel or syllabic consonant. The nucleus is marked with a 1 for tone 1 and a 2 for tone 2. Secondary stress is marked with 3.
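
The digit convention above can be read off a transcription programmatically. The following is a minimal sketch, assuming the marker is always the last character of the symbol and that consonants such as `H` carry no digit. `parse_symbol` and `tone_of` are hypothetical helpers, not part of nb_g2p:

```python
# Meaning of the trailing digit, as described above.
MARKERS = {
    "0": "unstressed",
    "1": "tone 1",
    "2": "tone 2",
    "3": "secondary stress",
}


def parse_symbol(symbol):
    """Split a NoFAbet symbol into (base, marker description).

    The description is None for symbols without a digit (plain consonants).
    """
    if symbol[-1] in MARKERS:
        return symbol[:-1], MARKERS[symbol[-1]]
    return symbol, None


def tone_of(transcription):
    """Return 1 or 2 for the first tone-marked nucleus, or None if absent."""
    for symbol in transcription.split():
        if symbol[-1] in ("1", "2"):
            return int(symbol[-1])
    return None


print(parse_symbol("AEJ1"))  # ('AEJ', 'tone 1')
print(parse_symbol("H"))     # ('H', None)
print(tone_of("H AEJ1"))     # 1
```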

License

These models are shared under a Creative Commons Zero (CC-ZERO) license, as are the lexica they were trained on. The models can be used for any purpose, as long as the use complies with Phonetisaurus' license.

