Detect language of text
language-detector
language-detector detects the language of text
Installation
pip install language-detector
Python Version
Works with both Python 2 and Python 3.
Use
from language_detector import detect_language
text = "I arrived in that city on January 4, 1937"
language = detect_language(text)
print(language)
# prints English
Features
Languages Supported |
---|
Arabic |
English |
Farsi |
French |
German |
Kurmanci (Kurdish) |
Mandarin |
Russian |
Sorani (Kurdish) |
Spanish |
Turkish |
Testing
To test the package, run:
python -m unittest language_detector.tests.test
Comparison
The test compares how well language-detector and langid identify languages in the data sources, and how long each package takes to run.
Metric | language-detector | langid |
---|---|---|
Test duration (seconds) | 0.10 | 3.83 |
Accuracy | 96.77% | 67.74% |
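A comparison of this kind can be sketched as follows. This is a minimal Python 3 illustration, not the test harness used to produce the numbers above: the `benchmark` helper, the `toy_detector` stand-in, and the sample texts are all hypothetical, chosen so the snippet runs without either package installed.

```python
import time


def benchmark(detect, samples):
    """Return (accuracy in percent, elapsed seconds) for one detector.

    detect  -- any callable that maps a text to a language name
    samples -- iterable of (text, expected_language) pairs
    """
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    start = time.perf_counter()
    correct = sum(1 for text, expected in samples if detect(text) == expected)
    elapsed = time.perf_counter() - start
    return 100.0 * correct / len(samples), elapsed


# Hypothetical stand-in detector (an assumption for illustration only).
def toy_detector(text):
    return "Spanish" if "ñ" in text else "English"


# Hypothetical labeled samples, not the datasets used by the package.
samples = [
    ("I arrived in that city on January 4, 1937", "English"),
    ("El niño llegó mañana", "Spanish"),
    ("Bonjour tout le monde", "French"),
]

accuracy, seconds = benchmark(toy_detector, samples)
print("accuracy: {:.2f}%  duration: {:.4f}s".format(accuracy, seconds))
```

To compare real detectors, `detect_language` could be passed in directly. A detector that reports language codes rather than names would presumably need a small wrapper that maps its output to the same labels before the comparison is fair.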
Excluding Languages
If you don't want language-detector to look for certain languages, you can monkey-patch the code. For example, in order to exclude English:
import language_detector
# char_language lives on the module, so it must be referenced through it
language_detector.char_language = [cl for cl in language_detector.char_language if cl[1] != "English"]
# proceed as normal
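To exclude several languages at once, the same filter can be wrapped in a small helper. This is a sketch under one assumption taken from the example above: each entry of `char_language` holds the language name at index 1. The `exclude_languages` function and the sample entries are hypothetical and are not part of the package.

```python
def exclude_languages(entries, excluded):
    """Return the entries whose language name (index 1) is not excluded."""
    excluded = set(excluded)
    return [entry for entry in entries if entry[1] not in excluded]


# Hypothetical data standing in for language_detector.char_language.
sample_entries = [("a", "English"), ("ñ", "Spanish"), ("ü", "German")]

filtered = exclude_languages(sample_entries, ["English", "German"])
print(filtered)

# With the package installed, the helper would presumably be applied as:
# import language_detector
# language_detector.char_language = exclude_languages(
#     language_detector.char_language, ["English", "German"])
```

The helper returns a new list and leaves its input untouched, so the original `char_language` value can be saved beforehand and restored later if the excluded languages are needed again.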
Datasets
The following datasets were used for each language:
Language | Datasets |
---|---|
Arabic | UN Corpora |
English | UN Corpora |
Farsi | BBC News Persian |
French | UN Corpora |
German | Deutsche Welle |
Kurmanci (Kurdish) | Rudaw |
Mandarin | UN Corpora |
Russian | UN Corpora |
Sorani (Kurdish) | Rudaw |
Spanish | UN Corpora |
Turkish | BBC News Türkçe |
Contributing
If you'd like to contribute a new language, please consult CONTRIBUTING.md.
Support
Contact the package author, Daniel J. Dufour, at daniel.j.dufour@gmail.com