Geographically-informed language identification
geoLid
This Python package performs language identification using geographic priors, which improves performance on low-resource and under-represented languages.
A description and evaluation of this approach can be found here: https://jdunn.name/2024/03/13/geographically-informed-language-identification/
The language_names directory contains a complete list of the language codes and names covered by each regional model.
Downloading models
geoLid contains a baseline non-geographic model as well as models for 16 specific regions, as shown below:
baseline (916 languages)
africa_north (44 languages)
africa_southern (58 languages)
africa_sub (166 languages)
america_brazil (88 languages)
america_central (188 languages)
america_north (68 languages)
america_south (129 languages)
asia_central (54 languages)
asia_east (46 languages)
asia_south (60 languages)
asia_southeast (325 languages)
europe_east (65 languages)
europe_russia (65 languages)
europe_west (108 languages)
middle_east (53 languages)
oceania (49 languages)
To download a model, call download_model with the model's name:
from geoLid import download_model
download_model("baseline")
The model name "all" will download all region-specific models.
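A small wrapper can catch a mistyped model name before any download is attempted. The sketch below is only an illustration: REGION_LANGUAGE_COUNTS, check_model_name, and fetch_model are hypothetical helpers that are not part of geoLid, and the names and counts are copied from the list above. It assumes nothing about where download_model stores its files.

```python
# Region-specific model names and language counts, copied from the list above.
# This table and the helpers below are illustrative and not part of geoLid.
REGION_LANGUAGE_COUNTS = {
    "africa_north": 44,
    "africa_southern": 58,
    "africa_sub": 166,
    "america_brazil": 88,
    "america_central": 188,
    "america_north": 68,
    "america_south": 129,
    "asia_central": 54,
    "asia_east": 46,
    "asia_south": 60,
    "asia_southeast": 325,
    "europe_east": 65,
    "europe_russia": 65,
    "europe_west": 108,
    "middle_east": 53,
    "oceania": 49,
}

# "baseline" is the non-geographic model; "all" requests every regional model.
VALID_MODEL_NAMES = set(REGION_LANGUAGE_COUNTS) | {"baseline", "all"}


def check_model_name(name):
    """Return name unchanged if it is a documented model name, else raise ValueError."""
    if name not in VALID_MODEL_NAMES:
        raise ValueError(
            f"Unknown model {name!r}; expected one of {sorted(VALID_MODEL_NAMES)}"
        )
    return name


def fetch_model(name):
    """Validate the name, then hand it to geoLid's download_model."""
    # Imported here so the name check above works even without geoLid installed.
    from geoLid import download_model

    download_model(check_model_name(name))
```

With geoLid installed, fetch_model("oceania") would then behave like download_model("oceania"), while a misspelled name fails immediately with a list of the accepted names.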
Usage
Once a model has been downloaded, run language identification as shown below:
from geoLid import geoLid
lid = geoLid(model_location = "models")
labels = lid.predict(data = data, region = "baseline")
The model_location argument, set at initialization, points to the directory containing the LID models.
The data argument is a list of one or more strings, each a text to be classified.
The region argument selects which region-specific model to use. The default is the non-geographic baseline model.
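Because data must be a list of strings, it can help to normalize the input before calling predict. The following is a minimal sketch under stated assumptions: as_text_list and identify are hypothetical helpers, not part of geoLid, and the pairing step assumes that predict returns one label per input text in the same order, which this description does not confirm.

```python
def as_text_list(data):
    """Normalize input into the non-empty list of strings that predict expects."""
    if isinstance(data, str):
        # A bare string is wrapped so it is treated as one text, not as characters.
        return [data]
    texts = list(data)
    if not texts:
        raise ValueError("data must contain at least one string")
    if not all(isinstance(text, str) for text in texts):
        raise TypeError("every item in data must be a string")
    return texts


def identify(data, region="baseline", model_location="models"):
    """Run geoLid on the given texts and pair each text with its predicted label."""
    texts = as_text_list(data)

    # Imported here so the input check above works even without geoLid installed.
    from geoLid import geoLid

    lid = geoLid(model_location=model_location)
    labels = lid.predict(data=texts, region=region)

    # Assumption: labels are returned one per text, in input order.
    return list(zip(texts, labels))
```

For example, identify(["Kia ora koutou"], region="oceania") would use the oceania model, provided that model has already been downloaded into the models directory.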