Quickly detect the language of a text and segment multilingual text by language

fast-langdetect 🚀

Supports Python 3.9 to 3.12 only. 🐍

Language identification with fastText: 80x faster, with 95% accuracy 🏎️

This library is a wrapper for the language identification model that Facebook trained with fastText. For more information, see https://fasttext.cc/docs/en/language-identification.html 📘

This repository is a patched version of zafercavdar/fasttext-langdetect that adds multilingual segmentation and better packaging support. 🌐

It facilitates more accurate text-to-speech (TTS) implementations. 🗣️

Requires 200 MB+ of memory in low_memory mode. 💾

Installation 💻

pip install fast-langdetect

Usage 🖥️

For more accurate language detection, use detect(text, low_memory=False) to load the large model.

The model is downloaded to the /tmp/fasttext-langdetect directory on first use.
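
The accuracy option described above can be wrapped in a small helper. This is a minimal sketch, not part of the library: `detect_accurate` and its `detector` parameter are hypothetical names introduced here for illustration, and the only library call assumed is `detect(text, low_memory=False)` as mentioned above. The detector is injectable so the helper can be exercised without downloading the model.

```python
from typing import Callable, Optional


def detect_accurate(text: str, detector: Optional[Callable[..., dict]] = None) -> dict:
    """Detect the language of `text` with the large model.

    `detector` is a hypothetical injection point (useful for tests).
    When it is omitted, the library's own `detect` is imported lazily,
    so nothing is loaded until the helper is actually called.
    """
    if detector is None:
        # Lazy import: importing this module alone never touches the model.
        from fast_langdetect import detect as detector
    # low_memory=False selects the large model, as described above.
    return detector(text, low_memory=False)
```

Calling `detect_accurate("Hello, world!")` without a `detector` argument uses the installed library, so the first call may trigger the model download.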

from fast_langdetect import detect_langs

print(detect_langs("Hello, world!"))
# EN

print(detect_langs("Привет, мир!"))
# RU


print(detect_langs("你好,世界!"))
# ZH

Advanced usage 🚀

from fast_langdetect import detect, parse_sentence, detect_multilingual

print(detect("Hello, world!"))
# {'lang': 'en', 'score': 0.1520957201719284}

print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
# [{'lang': 'ru', 'score': 0.39008623361587524}, {'lang': 'zh', 'score': 0.18235979974269867}, {'lang': 'ja', 'score': 0.08473210036754608}, {'lang': 'sr', 'score': 0.057975586503744125}, {'lang': 'en', 'score': 0.05422825738787651}]

print(parse_sentence("你好世界!Hello, world!Привет, мир!"))
# [{'text': '你好世界!Hello, world!', 'lang': 'ZH', 'length': 18}, {'text': 'Привет, мир!', 'lang': 'UK', 'length': 12}, {'text': '', 'lang': 'EN', 'length': 0}]
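
The return values shown above are plain lists of dictionaries, so they are straightforward to post-process. The sketch below is illustrative only: `filter_by_score` and `group_by_lang` are hypothetical helper names, the 0.1 threshold is an arbitrary assumption, and the sample data is copied from the example outputs above rather than produced by running the library.

```python
from collections import defaultdict


def filter_by_score(results, min_score=0.1):
    """Keep detect_multilingual-style entries whose score is at least min_score."""
    return [r for r in results if r["score"] >= min_score]


def group_by_lang(segments):
    """Group the text of non-empty parse_sentence-style segments by language code."""
    grouped = defaultdict(list)
    for seg in segments:
        if seg["length"] > 0:  # skip empty segments such as the trailing 'EN' one
            grouped[seg["lang"]].append(seg["text"])
    return dict(grouped)


# Sample data copied from the example outputs above (not regenerated here).
multilingual = [
    {"lang": "ru", "score": 0.39008623361587524},
    {"lang": "zh", "score": 0.18235979974269867},
    {"lang": "ja", "score": 0.08473210036754608},
    {"lang": "sr", "score": 0.057975586503744125},
    {"lang": "en", "score": 0.05422825738787651},
]
segments = [
    {"text": "你好世界!Hello, world!", "lang": "ZH", "length": 18},
    {"text": "Привет, мир!", "lang": "UK", "length": 12},
    {"text": "", "lang": "EN", "length": 0},
]

print([r["lang"] for r in filter_by_score(multilingual)])
# ['ru', 'zh']
print(sorted(group_by_lang(segments)))
# ['UK', 'ZH']
```

A TTS pipeline could, for instance, pass each group to a voice for that language; how the language codes map to voices is left to the application.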

Accuracy 🎯

References to the benchmark
