A language detector for short strings and chat messages.
Project description
LanguageDetection
Language detector used by Interaction Bot.
Method
Detection combines a dictionary lookup with an n-gram based method using fastText. We also apply a language priority, built by counting how many times Interaction Bot detected each language over 24 hours. On Discord, you can also use the user's interface language, sent by the API, to decide whether an input can be trusted when the detector reports it as unreliable (when reliability is False).
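The steps above can be sketched roughly as follows. This is only an illustration, not the package's actual code: the word lists in `DICTIONARY`, the counts in `PRIORITY`, and the `model_detect` and `ui_language` parameters are all hypothetical names invented for the example, and treating the Discord interface language as a fallback is one possible reading of the method.

```python
from collections import Counter

# Hypothetical word lists; the project's real dictionary is not shown here.
DICTIONARY = {
    "en": {"hi", "hello", "thanks", "yes"},
    "fr": {"salut", "bonjour", "merci", "oui"},
    "es": {"hola", "gracias", "si"},
}

# Hypothetical priority: detections per language over the last 24 hours.
PRIORITY = Counter({"en": 900, "fr": 80, "es": 20})


def detect_by_dictionary(text):
    """Return (language, score) from word-list hits, or (None, 0.0)."""
    words = text.lower().split()
    if not words:
        return None, 0.0
    hits = {
        lang: sum(1 for word in words if word in vocab)
        for lang, vocab in DICTIONARY.items()
    }
    best = max(hits.values())
    if best == 0:
        return None, 0.0
    # Break ties between equally matching languages with the priority counts.
    tied = [lang for lang, count in hits.items() if count == best]
    lang = max(tied, key=lambda candidate: PRIORITY[candidate])
    return lang, best / len(words)


def detect(text, model_detect=None, ui_language=None, threshold=0.4):
    """Dictionary first, then an optional model, then the UI language."""
    lang, score = detect_by_dictionary(text)
    if lang is None and model_detect is not None:
        # model_detect stands in for an n-gram model such as fastText.
        lang, score = model_detect(text)
    reliable = lang is not None and score > threshold
    if not reliable and ui_language is not None:
        # Fall back to the interface language, still flagged as unreliable.
        return ui_language, False
    return lang, reliable
```

Calling `detect("zzz", ui_language="fr")` in this sketch returns the interface language with the reliability flag left at `False`, so the caller can still tell that the text itself was not recognized.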
Usage
Run the main.py file. You must first download the fastText model with this command:
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin # put the file in the model directory
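import detector

def detect(text):
    # pred[0] is the language code, pred[1] is the score
    pred = detector.detect(text)
    # The result counts as reliable when the score is above 0.4
    return pred[0], bool(pred[1] > 0.4)

print(detect('Hi'))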
Output
(language code, reliability), where reliability is True when the score is above 0.4.
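For reference, the downloaded `lid.176.bin` model can also be queried directly through the `fasttext` Python package. The sketch below is not taken from this project's `detector` module, whose internals are not shown here. The function names `parse_prediction` and `fasttext_detect` are hypothetical, and the path `model/lid.176.bin` is an assumption based on the download note above.

```python
def parse_prediction(labels, probs):
    """Turn fastText output such as (('__label__en',), [0.98]) into ('en', 0.98)."""
    return labels[0].replace("__label__", ""), float(probs[0])


def fasttext_detect(text, model_path="model/lid.176.bin"):
    """Query the language identification model for a single string."""
    # Imported lazily so the parsing helper works without fastText installed.
    import fasttext

    # Assumed location of the model file; adjust to where you saved it.
    model = fasttext.load_model(model_path)
    # predict() handles one line at a time, so newlines are flattened first.
    labels, probs = model.predict(text.replace("\n", " "), k=1)
    return parse_prediction(labels, probs)
```

Loading the model on every call is slow; a real bot would probably load it once at startup and reuse it.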
Future
We will add a command to report bad detections so that we can update the dictionary accordingly. We will also clean up the code, because it is currently bad 😂.
Hashes for ShortLanguageDetection-0.0.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 4d9a4c7b0026a41124ed61b76b7ff469a11b1cdb6c0ff73c5371220efd1f3fbe
MD5 | 79ead8ef330e3e374dde85bb1d7d2bda
BLAKE2b-256 | 3c73e9e5fe2aefef44700e927cef5154b2c76fb08c822c90477db745e8835023
Hashes for ShortLanguageDetection-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9de83ca72e416915177909f9f3ad527557f9d18de45c228199d245a6a5c5c505
MD5 | 0d3a634cf2c00a89b086d7da25824f1f
BLAKE2b-256 | f261fbc22e1a6d5305fc58d75d8e17851dc61badd870811bfc4e2adcf89e2f27