
Quickly detect text language and segment language


fast-langdetect


Supports Python 3.8 to 3.11 only.

80x faster and 95% accurate language identification with fastText

This library wraps the language identification model that Facebook trained with fastText. For more information, see: https://fasttext.cc/docs/en/language-identification.html

This repository is a patched version of zafercavdar/fasttext-langdetect that adds multi-language segmentation and better packaging support.

It is intended to facilitate more accurate text-to-speech (TTS) implementations.

Installation

pip install fast-langdetect

Usage

For more accurate language detection, use detect(text, low_memory=False) to load the large model.

The model is downloaded to the /tmp/fasttext-langdetect directory on first use.
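
As a minimal sketch of the accurate mode, the helper below calls detect with low_memory=False. The name detect_accurate and its detector parameter are hypothetical, introduced only so the call can be tried with a stand-in function before the model is downloaded. Collapsing whitespace is also an assumption: fastText's predict handles one line at a time and rejects text containing newline characters, and this sketch does not rely on the wrapper doing that cleanup.

```python
def detect_accurate(text, detector=None):
    """Detect the language of `text` with the large model (hypothetical helper)."""
    if detector is None:
        # Imported lazily so the sketch can also run with a stand-in detector.
        from fast_langdetect import detect
        detector = detect
    # Assumption: collapse newlines and repeated whitespace into single spaces,
    # because fastText predicts on one line of text at a time.
    single_line = " ".join(text.split())
    return detector(single_line, low_memory=False)
```

Calling detect_accurate("Hello,\nworld!") should then return a dictionary with 'lang' and 'score' keys, in the same shape as the detect output shown under Advanced usage.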

from fast_langdetect import detect_langs

print(detect_langs("Hello, world!"))
# [en:0.9999961853027344]

print(detect_langs("Привет, мир!"))
# [ru:0.9999961853027344]


print(detect_langs("你好,世界!"))
# [zh:0.9999961853027344]

Advanced usage

from fast_langdetect import detect, parse_sentence, detect_multilingual

print(detect("Hello, world!"))
# {'lang': 'en', 'score': 0.1520957201719284}

print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# [{'lang': 'ru', 'score': 0.39008623361587524}, {'lang': 'zh', 'score': 0.18235979974269867}, {'lang': 'ja', 'score': 0.08473210036754608}, {'lang': 'sr', 'score': 0.057975586503744125}, {'lang': 'en', 'score': 0.05422825738787651}]

print(parse_sentence("你好世界!Hello, world!Привет, мир!"))
# [{'text': '你好世界!Hello, world!', 'lang': 'ZH', 'length': 18}, {'text': 'Привет, мир!', 'lang': 'UK', 'length': 12}, {'text': '', 'lang': 'EN', 'length': 0}]
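
The results above are plain lists of dictionaries, so they can be post-processed without further library calls. The sketch below is one possible way to do that, not part of fast-langdetect: top_languages, non_empty_segments, and the 0.1 threshold are hypothetical names and values chosen for illustration, and the sample data is copied from the outputs shown above.

```python
def top_languages(candidates, min_score=0.1):
    """Keep candidates scoring at least `min_score`, highest score first."""
    kept = [c for c in candidates if c["score"] >= min_score]
    return sorted(kept, key=lambda c: c["score"], reverse=True)


def non_empty_segments(segments):
    """Drop segments that carry no text (length 0)."""
    return [s for s in segments if s["length"] > 0]


# Sample data copied from the detect_multilingual and parse_sentence outputs above.
candidates = [
    {"lang": "ru", "score": 0.39008623361587524},
    {"lang": "zh", "score": 0.18235979974269867},
    {"lang": "ja", "score": 0.08473210036754608},
    {"lang": "sr", "score": 0.057975586503744125},
    {"lang": "en", "score": 0.05422825738787651},
]
segments = [
    {"text": "你好世界!Hello, world!", "lang": "ZH", "length": 18},
    {"text": "Привет, мир!", "lang": "UK", "length": 12},
    {"text": "", "lang": "EN", "length": 0},
]

print([c["lang"] for c in top_languages(candidates)])
# ['ru', 'zh']

print(len(non_empty_segments(segments)))
# 2
```

A score threshold like this trades recall for precision: in the sample above it discards the low-confidence 'ja' and 'sr' guesses, but it also discards 'en', which is actually present in the text, so the cutoff should be tuned to the input.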

Accuracy

References to the benchmark
