The amazing Murre will normalize non-standard Finnish and Swedish, and dialectalize standard Finnish!

🐶 Murre 🐕

The amazing Murre (genitive Murren 🐕) will normalize non-standard Finnish (puhekieli) to standard Finnish (kirjakieli). This repository is maintained by Mika Hämäläinen.

Installation

This library targets Python 3 and may not work on Python 2.

pip3 install murre
python3 -m murre.download
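
Before running the examples below, it can help to confirm that the package is importable in the current environment. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper and not part of Murre, and it does not check whether the models have been downloaded.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("murre"):
        print("murre is importable")
    else:
        print("murre not found; run: pip3 install murre")
```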

Normalize

To normalize Finnish, all you need to do is to run:

from murre import normalize_sentence

normalize_sentence("mä syön paljo karkkii")
>> minä syön paljon karkkia

To normalize several sentences at once, run:

from murre import normalize_sentences

sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
normalize_sentences(sents)
>> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
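
For long lists, it may be convenient to feed the sentences through in smaller slices. The sketch below is one possible way to do that; `normalize_in_batches` and `fake_normalize` are hypothetical names introduced for illustration, and the sketch assumes only that the normalizer takes a list of sentences and returns a list in the same order, as in the example above. The stand-in lets the code run without the models; in real use you would pass `normalize_sentences` instead.

```python
def normalize_in_batches(sentences, normalize_fn, batch_size=2):
    """Apply normalize_fn to successive slices of sentences and merge the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        results.extend(normalize_fn(batch))
    return results


# Stand-in normalizer (an assumption for this sketch, not Murre's behavior):
# it only upper-cases the text so the example runs without any models.
def fake_normalize(batch):
    return [sentence.upper() for sentence in batch]


sents = ["kissa syö karkkii", "jok laulaa tuol puole", "kyl on hölömöö"]
print(normalize_in_batches(sents, fake_normalize))
```

Passing the normalizer as an argument keeps the batching logic independent of which model or language is used.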

Historical Finnish

To normalize (and lemmatize) historical Finnish, run:

from murre import normalize_sentence

normalize_sentence("paluellen herra caiken", language="fin_hist")
>> palvella herra kaikki

Swedish

You can use the Swedish model by passing language="swe":

from murre import normalize_sentence

normalize_sentence("int vet ja", language="swe")
>> inte vet jag

Generate

Murre can also generate text in different dialects. To do so, run:

from murre import dialectalize_sentence
dialectalize_sentence("kodin takana on koira", "Inkerinsuomalaismurteet")
>> 'kojin takan on koira'

Or for multiple sentences:

from murre import dialectalize_sentences
sents = ["kissa syö karkkia", "kädellä on perhonen", "kettu juoksee sutta karkuun"]
dialectalize_sentences(sents,'Kainuu')
>> ['kissa syöpi karkkia', 'käellä om perhonej', 'kettu juoksee sutta karkuu']
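
To compare how the same sentences come out in several dialects, one option is to loop over dialect names and collect the results in a dictionary. This is a sketch under stated assumptions: `dialectalize_into_many` and `fake_dialectalize` are hypothetical names, and the stand-in only tags each sentence so the code runs without the models. In real use you would pass `dialectalize_sentences`, which is called with a list of sentences and a dialect name as shown above.

```python
def dialectalize_into_many(sentences, dialects, dialectalize_fn):
    """Map each dialect name to dialectalize_fn(sentences, dialect)."""
    return {dialect: dialectalize_fn(sentences, dialect) for dialect in dialects}


# Stand-in (an assumption for this sketch): prefixes each sentence with the
# dialect name instead of generating real dialectal text.
def fake_dialectalize(sentences, dialect):
    return [f"[{dialect}] {sentence}" for sentence in sentences]


sents = ["kissa syö karkkia", "kädellä on perhonen"]
by_dialect = dialectalize_into_many(sents, ["Kainuu", "Etelä-Savo"], fake_dialectalize)
for dialect, output in by_dialect.items():
    print(dialect, output)
```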

To list the available dialects, run:

from murre import supported_dialects
supported_dialects()
>> ['Pohjois-Satakunta', 'Keski-Karjala', 'Kainuu', 'Etelä-Pohjanmaa', 'Etelä-Satakunta', 'Pohjois-Savo', 'Pohjois-Karjala', 'Keski-Pohjanmaa', 'Kaakkois-Häme', 'PohjoinenKeski-Suomi', 'Pohjois-Pohjanmaa', 'PohjoinenVarsinais-Suomi', 'Etelä-Karjala', 'Länsi-Uusimaa', 'Inkerinsuomalaismurteet', 'LäntinenKeski-Suomi', 'Länsi-Satakunta', 'Etelä-Savo', 'Länsipohja', 'Pohjois-Häme', 'EteläinenKeski-Suomi', 'Etelä-Häme', 'Peräpohjola']
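
Dialect names have to match the entries in this list, including the unusual spellings without a space such as 'PohjoinenKeski-Suomi'. A small check before calling the generator can catch typos early. The sketch below is illustrative only: `resolve_dialect` is a hypothetical helper, not part of Murre, and the short list is copied from the output above so the code runs on its own. In real use you would pass the result of `supported_dialects()`.

```python
import difflib


def resolve_dialect(name, available):
    """Return name if it is in available, else raise with close suggestions."""
    if name in available:
        return name
    suggestions = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unsupported dialect {name!r}.{hint}")


# A few names copied from the list above; pass supported_dialects() in real use.
dialects = ["Kainuu", "Etelä-Savo", "Pohjois-Savo", "Peräpohjola"]
print(resolve_dialect("Kainuu", dialects))
```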

Cite

Normalization (Finnish)

Niko Partanen, Mika Hämäläinen, and Khalid Alnajjar. (2019). Dialect Text Normalization to Normative Standard Finnish. In the Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT).

Normalization (Swedish)

Mika Hämäläinen, Niko Partanen and Khalid Alnajjar. (2020). Normalization of Different Swedish Dialects Spoken in Finland. In the Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities.

Dialect generation

Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter and Thierry Poibeau. (2020). Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity. In the Proceedings of the 11th International Conference on Computational Creativity, pp. 204-211.

Historical Finnish

Mika Hämäläinen, Niko Partanen and Khalid Alnajjar. (2021). Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography. In Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN).

Data

The data used in the dialect generation paper has been published on Zenodo.
