Detect language of text
language-detector
language-detector detects the language of text
Installation
pip install language-detector
Python Version
Works with both Python 2 and 3
Use
from language_detector import detect_language
text = "I arrived in that city on January 4, 1937"
language = detect_language(text)
print(language)  # prints English
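For more than one string, the same call can be applied in a loop. The sketch below groups texts by detected language. `group_by_language` is a hypothetical helper, not part of language-detector, and it takes the detector as an argument so it runs with any callable that maps a string to a language name. `stand_in_detect` is a placeholder for illustration only.

```python
from collections import defaultdict


def group_by_language(texts, detect):
    """Group texts by the label that `detect` returns for each one."""
    groups = defaultdict(list)
    for text in texts:
        groups[detect(text)].append(text)
    return dict(groups)


def stand_in_detect(text):
    # Placeholder for illustration only; this is NOT how language-detector works.
    is_ascii = all(ord(char) < 128 for char in text)
    return "English" if is_ascii else "Other"


groups = group_by_language(["hello", "привет", "world"], stand_in_detect)
```

With the package installed, passing `detect_language` in place of `stand_in_detect` should give one list of texts per detected language.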
Features
| Languages Supported |
|---|
| Arabic |
| English |
| Farsi |
| French |
| German |
| Kurmanci (Kurdish) |
| Mandarin |
| Russian |
| Sorani (Kurdish) |
| Spanish |
| Turkish |
Testing
To test the package, run:
python -m unittest language_detector.tests.test
Comparison
The table below compares how well language-detector and langid identify languages in the data sources.
| Metric | language-detector | langid |
|---|---|---|
| Test duration (seconds) | 0.10 | 3.83 |
| Accuracy | 96.77% | 67.74% |
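The README does not describe how these figures were measured. As a rough sketch of how such a comparison could be run, the hypothetical `evaluate` helper below times a detector over labelled samples and reports the share of correct answers. It assumes Python 3 and a detector that returns a language name as a string. The sample data is made up for illustration.

```python
import time


def evaluate(detect, samples):
    """Return (accuracy, seconds) for `detect` over (text, expected) pairs."""
    if not samples:
        raise ValueError("samples must not be empty")
    start = time.perf_counter()
    correct = sum(1 for text, expected in samples if detect(text) == expected)
    duration = time.perf_counter() - start
    return correct / len(samples), duration


# Hypothetical samples and a trivial detector, for illustration only.
samples = [("good morning", "English"), ("bonjour", "French")]
accuracy, seconds = evaluate(lambda text: "English", samples)
```

Running the same samples through each package's detection function would give numbers comparable in kind to the table, though not necessarily the same values.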
Excluding Languages
If you don't want language-detector to look for certain languages, you can monkey-patch the code. For example, in order to exclude English:
import language_detector
language_detector.char_language = [cl for cl in language_detector.char_language if cl[1] != "English"]
# proceed as normal
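To exclude several languages at once, the same filter can be wrapped in a small function. This is a sketch under one assumption taken from the snippet above: each entry of `char_language` holds the language name at index 1. `exclude_languages` is a hypothetical helper, and the sample entries are stand-ins, not the package's real data.

```python
def exclude_languages(entries, excluded):
    """Keep entries whose language name (index 1) is not in `excluded`."""
    excluded = set(excluded)
    return [entry for entry in entries if entry[1] not in excluded]


# Hypothetical stand-in data; the real contents of char_language may differ.
sample_entries = [("a", "English"), ("é", "French"), ("ñ", "Spanish")]
remaining = exclude_languages(sample_entries, ["English", "French"])
```

Applied to the package, this would read `language_detector.char_language = exclude_languages(language_detector.char_language, ["English", "French"])`.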
Datasets
The following is a list of datasets used for each language:
| Language | Datasets |
|---|---|
| Arabic | UN Corpora |
| English | UN Corpora |
| Farsi | BBC News Persian |
| French | UN Corpora |
| German | Deutsche Welle |
| Kurmanci (Kurdish) | Rudaw |
| Mandarin | UN Corpora |
| Russian | UN Corpora |
| Sorani (Kurdish) | Rudaw |
| Spanish | UN Corpora |
| Turkish | BBC News Türkçe |
Contributing
If you'd like to contribute a new language, please consult CONTRIBUTING.md.
Support
Contact the package author, Daniel J. Dufour, at daniel.j.dufour@gmail.com