Quickly detect the language of a text and split mixed-language text into per-language segments
fast-langdetect 🚀
Supports Python 3.9 to 3.12 only. 🐍
80x faster and 95% accurate language identification with fastText 🏎️
This library wraps the language identification model that Facebook trained with fastText. For more information, see https://fasttext.cc/docs/en/language-identification.html 📘
This repository is based on zafercavdar/fasttext-langdetect and adds multi-language segmentation and better packaging support. 🌐
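fast-langdetect exposes its own segmentation through parse_sentence (see Advanced usage). To illustrate the general idea only, here is a minimal, self-contained sketch built on a different and much cruder technique: splitting text into runs of the same Unicode script, read from each character's name with the standard unicodedata module. This is not how the library segments text, and char_script and split_by_script are hypothetical names.

```python
import unicodedata
from typing import List, Optional, Tuple

# Script prefixes recognised in Unicode character names (hypothetical, non-exhaustive list).
SCRIPTS = ("CJK", "CYRILLIC", "LATIN", "HIRAGANA", "KATAKANA", "HANGUL")


def char_script(ch: str) -> Optional[str]:
    """Rough script label for a letter; None for digits, spaces and punctuation."""
    if not ch.isalpha():
        return None  # neutral characters stay attached to the current run
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, run) pairs wherever the script of the letters changes."""
    segments: List[Tuple[str, str]] = []
    buf = ""
    current: Optional[str] = None
    for ch in text:
        script = char_script(ch)
        if script is not None and current is not None and script != current:
            segments.append((current, buf))  # close the previous run
            buf = ""
        if script is not None:
            current = script
        buf += ch
    if buf:
        segments.append((current or "NONE", buf))
    return segments


print(split_by_script("你好世界!Hello, world!Привет, мир!"))
```

Script runs cannot separate languages that share a script (English and French are both LATIN), and Japanese text that mixes kanji and kana is cut into several runs, which is why a statistical detector is still needed for each segment.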
Helps make TTS (text-to-speech) output more accurate. 🗣️
Using low_memory mode requires at least 200 MB of memory. 💾
Installation 💻
pip install fast-langdetect
Usage 🖥️
For more accurate language detection, call detect(text, low_memory=False) to load the larger model.
The model is downloaded to the /tmp/fasttext-langdetect directory the first time you use it.
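Downloading the model on first use and reusing it afterwards is a common caching pattern. The sketch below is hypothetical and is not fast-langdetect's actual download code: ensure_model is an illustrative name, and fetch stands in for whatever really downloads the file. It calls fetch only when the file is missing, and it writes to a temporary .part file first so that an interrupted download is never mistaken for a complete model.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_model(cache_dir: str, filename: str, fetch: Callable[[Path], None]) -> Path:
    """Return the path of a cached model file, fetching it only on first use.

    Hypothetical helper: `fetch` is any callable that writes the model to the given path.
    """
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)  # e.g. /tmp/fasttext-langdetect
    target = directory / filename
    if not target.exists():
        partial = target.with_name(target.name + ".part")
        fetch(partial)               # download into a temporary file first
        os.replace(partial, target)  # move into place only after fetch returns
    return target
```

A second call with the same directory and filename returns the existing path without calling fetch again.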
from fast_langdetect import detect_langs
print(detect_langs("Hello, world!"))
# EN
print(detect_langs("Привет, мир!"))
# RU
print(detect_langs("你好,世界!"))
# ZH
Advanced usage 🚀
from fast_langdetect import detect, parse_sentence, detect_multilingual
print(detect("Hello, world!"))
# {'lang': 'en', 'score': 0.1520957201719284}
print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# [{'lang': 'ru', 'score': 0.39008623361587524}, {'lang': 'zh', 'score': 0.18235979974269867}, {'lang': 'ja', 'score': 0.08473210036754608}, {'lang': 'sr', 'score': 0.057975586503744125}, {'lang': 'en', 'score': 0.05422825738787651}]
print(parse_sentence("你好世界!Hello, world!Привет, мир!"))
# [{'text': '你好世界!Hello, world!', 'lang': 'ZH', 'length': 18}, {'text': 'Привет, мир!', 'lang': 'UK', 'length': 12}, {'text': '', 'lang': 'EN', 'length': 0}]
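Because parse_sentence returns a list of dicts with text, lang and length keys (as in the output above), its result can drive the TTS use case mentioned at the top. This is a minimal sketch under stated assumptions: the VOICES table, the voice names and plan_tts are hypothetical, and no TTS engine is called. It skips empty segments (the sample output ends with one), falls back to a default voice for codes it does not map (such as UK above), and merges adjacent segments that resolve to the same voice.

```python
from typing import Dict, List, Tuple

# Hypothetical voice names; substitute whatever your TTS engine provides.
VOICES: Dict[str, str] = {"EN": "en-voice", "ZH": "zh-voice", "RU": "ru-voice"}
DEFAULT_VOICE = "multilingual-voice"


def plan_tts(segments: List[dict]) -> List[Tuple[str, str]]:
    """Turn parse_sentence-style segments into (voice, text) chunks for a TTS engine."""
    plan: List[Tuple[str, str]] = []
    for seg in segments:
        text = seg["text"]
        if not text:
            continue  # nothing to speak
        voice = VOICES.get(seg["lang"].upper(), DEFAULT_VOICE)
        if plan and plan[-1][0] == voice:
            plan[-1] = (voice, plan[-1][1] + text)  # merge with the previous chunk
        else:
            plan.append((voice, text))
    return plan


segments = [
    {"text": "你好世界!Hello, world!", "lang": "ZH", "length": 18},
    {"text": "Привет, мир!", "lang": "UK", "length": 12},
    {"text": "", "lang": "EN", "length": 0},
]
print(plan_tts(segments))
```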
Accuracy 🎯
For accuracy figures, refer to the benchmark.
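The benchmark itself is not reproduced here. As a minimal sketch of how an accuracy figure like this is usually computed (the share of labelled samples whose predicted language code matches the gold label), the hypothetical accuracy helper below accepts any detector that returns a code string. The toy_detect function is a stand-in for illustration only, not part of the library.

```python
from typing import Callable, Iterable, Tuple


def accuracy(detect_fn: Callable[[str], str], samples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (text, gold_code) samples where detect_fn(text) matches gold_code."""
    total = 0
    correct = 0
    for text, gold in samples:
        total += 1
        # Compare case-insensitively: detect_langs returns codes like "EN", detect uses "en".
        if detect_fn(text).lower() == gold.lower():
            correct += 1
    return correct / total if total else 0.0


# Toy stand-in detector: calls all-ASCII text English and everything else Chinese.
def toy_detect(text: str) -> str:
    return "EN" if text.isascii() else "ZH"


samples = [("Hello, world!", "en"), ("你好,世界!", "zh"), ("Привет, мир!", "ru")]
print(accuracy(toy_detect, samples))
```

detect_langs, which returns a single code such as EN in the examples above, has the shape this helper expects, so it could be passed in place of toy_detect.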
Hashes for fast_langdetect-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a845f03efe1a6e50264d1ae53152b8e63949ea1a6b85b71ea6840e5b008d2a22
MD5 | 53fecd80b6e19035aba26538fc435956
BLAKE2b-256 | dcbd2b5c9a5fd662482767eb9805993147f25d60ae5b5eb537c83ff400a22df3