Language detection module based on the GiellaLT models, specifically aimed at minority and indigenous languages
Project description
Makes the language classification script from GiellaLT's corpus tools available as a Python module (GiellaLT's website, original repo).
The source code as well as the language model files are released under the GPL-3.0 license.
Installation
```shell
pip install gielladetect
```
Usage
```python
import gielladetect

# English: "Are you wondering what goes on within the walls of the National Library at Solli plass in Oslo?"
text = "Lurer du på hva som rører seg innenfor veggene til Nasjonalbiblioteket på Solli plass i Oslo?"
gielladetect.detect(text)
# Result: 'nob'

# To restrict detection to a subset of languages:
gielladetect.detect(text, ['nob', 'nno', 'eng'])
# Result: 'nob'
```
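Classifying many short texts (the paragraphs of a corpus, for instance) comes down to a loop over `detect`. The sketch below wraps that loop in two hypothetical helpers, `detect_all` and `language_counts`, which are not part of gielladetect. It relies only on the two call forms shown above; the optional `detect` argument is an illustrative addition so the helpers can be exercised with a stub instead of loading the real models.

```python
from collections import Counter


def detect_all(texts, languages=None, detect=None):
    """Return the detected ISO 639-3 code for each text, in input order.

    Hypothetical helper. `detect` defaults to gielladetect.detect; any callable
    with the same calling convention (e.g. a stub in tests) can be passed instead.
    """
    if detect is None:
        import gielladetect  # imported lazily, only when no replacement is given

        detect = gielladetect.detect
    if languages is None:
        # Unrestricted form: detect(text)
        return [detect(text) for text in texts]
    # Restricted form: detect(text, [codes]), passing the subset as a list
    subset = list(languages)
    return [detect(text, subset) for text in texts]


def language_counts(texts, languages=None, detect=None):
    """Tally how many texts were classified as each language (hypothetical helper)."""
    return Counter(detect_all(texts, languages, detect))
```

For example, `language_counts(paragraphs, ["nob", "nno", "eng"])` would tally a list of paragraphs with detection restricted to Bokmål, Nynorsk and English.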
Supported languages
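Languages are identified by their ISO 639-3 codes, the same codes that `gielladetect.detect` returns and accepts in its list of candidate languages.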
Code | Name |
---|---|
ara | Arabic |
bxr | Russia Buriat |
ckb | Central Kurdish |
dan | Danish |
deu | German |
eng | English |
est | Estonian |
fao | Faroese |
fas | Persian |
fin | Finnish |
fit | Tornedalen Finnish |
fkv | Kven Finnish |
fra | French |
hbs | Serbo-Croatian |
isl | Icelandic |
ita | Italian |
kal | Kalaallisut |
kmr | Northern Kurdish |
koi | Komi-Permyak |
kpv | Komi-Zyrian |
krl | Karelian |
mdf | Moksha |
mhr | Eastern Mari |
mns | Mansi |
mrj | Western Mari |
myv | Erzya |
nno | Norwegian Nynorsk |
nob | Norwegian Bokmål |
olo | Livvi |
pol | Polish |
rmf | Kalo Finnish Romani |
rmn | Balkan Romani |
rmu | Tavringer Romani |
rmy | Vlax Romani |
ron | Romanian |
rus | Russian |
sma | Southern Sami |
sme | Northern Sami |
smj | Lule Sami |
smn | Inari Sami |
sms | Skolt Sami |
som | Somali |
spa | Spanish |
swe | Swedish |
tur | Turkish |
udm | Udmurt |
urd | Urdu |
vep | Veps |
vie | Vietnamese |
yid | Yiddish |
yrk | Nenets |
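If you want to check a candidate subset before calling `detect`, or show a readable name for a returned code, the table can be transcribed into a Python dict. This is a convenience sketch, not part of gielladetect: `SUPPORTED_LANGUAGES`, `check_languages` and `language_name` are hypothetical names, and whether `detect` itself rejects unknown codes is not documented here, so the check only compares against the list above.

```python
# Supported languages transcribed from the table above (ISO 639-3 code -> name).
SUPPORTED_LANGUAGES = {
    "ara": "Arabic", "bxr": "Russia Buriat", "ckb": "Central Kurdish",
    "dan": "Danish", "deu": "German", "eng": "English",
    "est": "Estonian", "fao": "Faroese", "fas": "Persian",
    "fin": "Finnish", "fit": "Tornedalen Finnish", "fkv": "Kven Finnish",
    "fra": "French", "hbs": "Serbo-Croatian", "isl": "Icelandic",
    "ita": "Italian", "kal": "Kalaallisut", "kmr": "Northern Kurdish",
    "koi": "Komi-Permyak", "kpv": "Komi-Zyrian", "krl": "Karelian",
    "mdf": "Moksha", "mhr": "Eastern Mari", "mns": "Mansi",
    "mrj": "Western Mari", "myv": "Erzya", "nno": "Norwegian Nynorsk",
    "nob": "Norwegian Bokmål", "olo": "Livvi", "pol": "Polish",
    "rmf": "Kalo Finnish Romani", "rmn": "Balkan Romani", "rmu": "Tavringer Romani",
    "rmy": "Vlax Romani", "ron": "Romanian", "rus": "Russian",
    "sma": "Southern Sami", "sme": "Northern Sami", "smj": "Lule Sami",
    "smn": "Inari Sami", "sms": "Skolt Sami", "som": "Somali",
    "spa": "Spanish", "swe": "Swedish", "tur": "Turkish",
    "udm": "Udmurt", "urd": "Urdu", "vep": "Veps",
    "vie": "Vietnamese", "yid": "Yiddish", "yrk": "Nenets",
}


def check_languages(codes):
    """Return `codes` as a list; raise ValueError if any code is missing from the table."""
    codes = list(codes)
    unknown = sorted(set(codes).difference(SUPPORTED_LANGUAGES))
    if unknown:
        raise ValueError("unsupported language code(s): " + ", ".join(unknown))
    return codes


def language_name(code):
    """Map a code such as 'sme' to its name, falling back to the code itself."""
    return SUPPORTED_LANGUAGES.get(code, code)
```

Used together with the module, this would look like `gielladetect.detect(text, check_languages(["nob", "nno", "eng"]))`, and `language_name("sme")` returns `"Northern Sami"`.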
Download files
Source distribution: gielladetect-1.0.3.tar.gz (3.9 MB)
Built distribution: gielladetect-1.0.3-py3-none-any.whl
Hashes for gielladetect-1.0.3-py3-none-any.whl:
Algorithm | Hash digest |
---|---|
SHA256 | 773550191ff473cb2aa613e275bada91092b12318b6b397b32dad910f73b0c0e |
MD5 | bfe5058f749f39e64b21150d39378c20 |
BLAKE2b-256 | 0a667521a51e291b10d75bd81c1b471735c8830f985a8c5b3710176e3612c3dc |