Language detection module based on the GiellaLT models, specifically aimed at minority and indigenous languages
Project description
Makes the language classification script from GiellaLT's corpus tools available as a Python module (GiellaLT's website, original repo).
The source code as well as the language model files are released under the GPL-3.0 license.
Installation
pip install gielladetect
Usage
import gielladetect
text = "Lurer du på hva som rører seg innenfor veggene til Nasjonalbiblioteket på Solli plass i Oslo?"
gielladetect.detect(text)
# Result: 'nob'
# To restrict detection to a subset of languages:
gielladetect.detect(text, ['nob', 'nno', 'eng'])
# Result: 'nob'
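The two call forms above can be wrapped for batch use. The following is a minimal sketch, not part of gielladetect: `detect_many` and its `detector` parameter are hypothetical names introduced here, and the only library calls assumed are `gielladetect.detect(text)` and `gielladetect.detect(text, langs)` as shown in the usage example.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def detect_many(
    texts: Iterable[str],
    langs: Optional[Iterable[str]] = None,
    detector: Optional[Callable[..., str]] = None,
) -> List[Tuple[str, str]]:
    """Run a detector over several texts and pair each text with its result.

    `detector` is a hypothetical hook: it defaults to gielladetect.detect,
    but any callable with the same call forms can be passed in instead.
    """
    if detector is None:
        # Deferred import so the helper can be exercised with a stand-in detector.
        import gielladetect

        detector = gielladetect.detect

    # Materialise the candidate codes once so a generator is not exhausted
    # after the first text.
    candidates = None if langs is None else list(langs)

    results = []
    for text in texts:
        if candidates is None:
            code = detector(text)
        else:
            code = detector(text, candidates)
        results.append((text, code))
    return results
```

With the package installed, `detect_many(["first text", "second text"], langs=["nob", "nno", "eng"])` would return one `(text, code)` pair per input, restricted to the three candidate languages.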
Supported languages
Languages are identified by their ISO 639-3 codes.
Code | Name |
---|---|
ara | Arabic |
bxr | Russia Buriat |
ckb | Central Kurdish |
dan | Danish |
deu | German |
eng | English |
est | Estonian |
fao | Faroese |
fas | Persian |
fin | Finnish |
fit | Tornedalen Finnish |
fkv | Kven Finnish |
fra | French |
hbs | Serbo-Croatian |
isl | Icelandic |
ita | Italian |
kal | Kalaallisut |
kmr | Northern Kurdish |
koi | Komi-Permyak |
kpv | Komi-Zyrian |
krl | Karelian |
mdf | Moksha |
mhr | Eastern Mari |
mns | Mansi |
mrj | Western Mari |
myv | Erzya |
nno | Norwegian Nynorsk |
nob | Norwegian Bokmål |
olo | Livvi |
pol | Polish |
rmf | Kalo Finnish Romani |
rmn | Balkan Romani |
rmu | Tavringer Romani |
rmy | Vlax Romani |
ron | Romanian |
rus | Russian |
sma | Southern Sami |
sme | Northern Sami |
smj | Lule Sami |
smn | Inari Sami |
sms | Skolt Sami |
som | Somali |
spa | Spanish |
swe | Swedish |
tur | Turkish |
udm | Udmurt |
urd | Urdu |
vep | Veps |
vie | Vietnamese |
yid | Yiddish |
yrk | Nenets |
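When restricting detection to a subset of languages, it can help to check the candidate codes against the table above before passing them on. The sketch below copies the codes from that table into a set; `SUPPORTED_CODES` and `unsupported_codes` are hypothetical names, and how gielladetect itself reacts to an unknown code is not documented here, so this check is only a precaution.

```python
from typing import Iterable, List

# ISO 639-3 codes copied from the "Supported languages" table (51 entries).
SUPPORTED_CODES = frozenset(
    """
    ara bxr ckb dan deu eng est fao fas fin fit fkv fra hbs isl ita kal
    kmr koi kpv krl mdf mhr mns mrj myv nno nob olo pol rmf rmn rmu rmy
    ron rus sma sme smj smn sms som spa swe tur udm urd vep vie yid yrk
    """.split()
)


def unsupported_codes(codes: Iterable[str]) -> List[str]:
    """Return the codes that do not appear in the table, in input order."""
    return [code for code in codes if code not in SUPPORTED_CODES]


if __name__ == "__main__":
    wanted = ["nob", "nno", "nor"]  # "nor" is not listed in the table
    bad = unsupported_codes(wanted)
    if bad:
        print("Not in the supported list:", ", ".join(bad))
```

Keeping the check separate from the detection call means the list can be validated once, for example when reading configuration, rather than on every call.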
Download files
Source Distribution
gielladetect-1.0.1.tar.gz (3.9 MB)
Built Distribution
gielladetect-1.0.1-py3-none-any.whl
Hashes for gielladetect-1.0.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | caec50e3354b20bd5314638443d975052463282121620379c6db45f4f879bbe6 |
MD5 | bc19d608a161d3538c81071fdaa29956 |
BLAKE2b-256 | ca82496f585ca5b2ad219239a1bd6886d5f204b2758110f8857128441d67373c |