Quickly detect text language and segment language
fast-langdetect
Supports Python 3.8 through 3.11 only.
80x faster and 95% accurate language identification with Fasttext
This library is a wrapper around the language identification model that Facebook trained with fastText. For more information, see: https://fasttext.cc/docs/en/language-identification.html
This repository is patched from zafercavdar/fasttext-langdetect, adding multi-language segmentation and better packaging support.
The multi-language segmentation is intended to support more accurate text-to-speech (TTS) implementations.
Installation
pip install fast-langdetect
Usage
For more accurate language detection, call detect(text, low_memory=False) to load the large model.
The model is downloaded to the /tmp/fasttext-langdetect directory the first time you use it.
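One way to balance memory use against accuracy is to try the default model first and load the large model only when the score is low. The sketch below is a hypothetical helper, not part of fast-langdetect: detect_adaptive and min_score are names invented for illustration, the detector is passed in as an argument, and it assumes that low_memory=True selects the smaller model and that the detector returns a dict with 'lang' and 'score' keys.

```python
from typing import Callable, Dict


def detect_adaptive(
    text: str,
    detect_fn: Callable[..., Dict],
    min_score: float = 0.5,
) -> Dict:
    """Hypothetical helper: retry with the large model when the score is low.

    detect_fn is expected to accept (text, low_memory=...) and return a dict
    such as {'lang': 'en', 'score': 0.15}.
    """
    # Assumption: low_memory=True selects the smaller, less accurate model.
    result = detect_fn(text, low_memory=True)
    if result["score"] >= min_score:
        return result
    # Low confidence: fall back to the large model.
    return detect_fn(text, low_memory=False)


# Intended usage (requires fast-langdetect to be installed):
#   from fast_langdetect import detect
#   print(detect_adaptive("Hello, world!", detect))
```

Passing the detector as a parameter keeps the helper easy to test with a stub; the 0.5 threshold is an arbitrary starting point and should be tuned on your own data.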
from fast_langdetect import detect_langs
print(detect_langs("Hello, world!"))
# [en:0.9999961853027344]
print(detect_langs("Привет, мир!"))
# [ru:0.9999961853027344]
print(detect_langs("你好,世界!"))
# [zh:0.9999961853027344]
Advanced usage
from fast_langdetect import detect, parse_sentence, detect_multilingual
print(detect("Hello, world!"))
# {'lang': 'en', 'score': 0.1520957201719284}
print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# [{'lang': 'ru', 'score': 0.39008623361587524}, {'lang': 'zh', 'score': 0.18235979974269867}, {'lang': 'ja', 'score': 0.08473210036754608}, {'lang': 'sr', 'score': 0.057975586503744125}, {'lang': 'en', 'score': 0.05422825738787651}]
print(parse_sentence("你好世界!Hello, world!Привет, мир!"))
# [{'text': '你好世界!Hello, world!', 'lang': 'ZH', 'length': 18}, {'text': 'Привет, мир!', 'lang': 'UK', 'length': 12}, {'text': '', 'lang': 'EN', 'length': 0}]
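The results above are plain lists of dicts, so they can be post-processed with ordinary Python. The following sketch uses two hypothetical helpers (top_languages and clean_segments are not part of fast-langdetect): one keeps only the languages above a score threshold, the other drops empty segments and lowercases the language code. The sample data is copied from the outputs shown above, and the 0.1 threshold is an arbitrary choice.

```python
from typing import Dict, List


def top_languages(results: List[Dict], min_score: float = 0.1) -> List[str]:
    """Return language codes scoring at least min_score, best first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["lang"] for r in ranked if r["score"] >= min_score]


def clean_segments(segments: List[Dict]) -> List[Dict]:
    """Drop segments with empty text and lowercase the language code."""
    return [
        {**seg, "lang": seg["lang"].lower()}
        for seg in segments
        if seg["text"]
    ]


# Sample data in the shape printed by detect_multilingual and parse_sentence.
multilingual = [
    {"lang": "ru", "score": 0.39008623361587524},
    {"lang": "zh", "score": 0.18235979974269867},
    {"lang": "ja", "score": 0.08473210036754608},
    {"lang": "sr", "score": 0.057975586503744125},
    {"lang": "en", "score": 0.05422825738787651},
]
segments = [
    {"text": "你好世界!Hello, world!", "lang": "ZH", "length": 18},
    {"text": "Привет, мир!", "lang": "UK", "length": 12},
    {"text": "", "lang": "EN", "length": 0},
]

print(top_languages(multilingual))  # ['ru', 'zh']
print([seg["lang"] for seg in clean_segments(segments)])  # ['zh', 'uk']
```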
Accuracy
References to the benchmark
Hashes for fast_langdetect-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b886e1f1b15ca1b69027df652f118fa8a26f30424f3d7cffa823ff1a37155643
MD5 | 1287d2e8e8f70dae80bad05d42788549
BLAKE2b-256 | 537d0a878988a4cba88268022bcaa6ffc39da4567d75be3977d2b8fa7faaead5