language-detector
language-detector detects the language of text
Installation
pip install language-detector
Python Version
Works with both Python 2 and 3
Use
from language_detector import detect_language
text = "I arrived in that city on January 4, 1937"
language = detect_language(text)
print(language)
# prints English
Features
Languages Supported |
---|
Arabic |
English |
Farsi |
French |
German |
Kurmanci (Kurdish) |
Mandarin |
Russian |
Sorani (Kurdish) |
Spanish |
Turkish |
Testing
To test the package, run:
python -m unittest language_detector.tests.test
Comparison
The test compares how quickly and how accurately language-detector and langid identify languages in the data sources.
package | language-detector | langid |
---|---|---|
test-duration (in seconds) | 0.10 | 3.83 |
accuracy | 96.77% | 67.74% |
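The duration and accuracy figures are the kind of numbers a small harness can produce. The following is a minimal sketch of such a harness, not the test that generated the table: the `evaluate` function and the labelled samples are hypothetical, and the detector is passed in as a plain callable so that `detect_language` or any other function can be plugged in. It uses `time.perf_counter`, so it assumes Python 3.

```python
import time


def evaluate(detect, samples):
    """Return (accuracy, duration_in_seconds) for a detector.

    detect  -- any callable that maps a text to a language name
    samples -- list of (text, expected_language) pairs (hypothetical data)
    """
    start = time.perf_counter()
    correct = sum(1 for text, expected in samples if detect(text) == expected)
    duration = time.perf_counter() - start
    # Guard against an empty sample list to avoid dividing by zero
    accuracy = correct / len(samples) if samples else 0.0
    return accuracy, duration


if __name__ == "__main__":
    # Stand-in detector so the sketch runs without any package installed;
    # replace it with language_detector.detect_language to measure the package.
    def stub_detector(text):
        return "English"

    labelled = [("Hello there", "English"), ("Bonjour à tous", "French")]
    accuracy, duration = evaluate(stub_detector, labelled)
    print("accuracy: {:.2%}, duration: {:.4f}s".format(accuracy, duration))
```

When comparing against langid, note that `langid.classify` returns a tuple of a language code and a score rather than a language name, so its output would need to be mapped to the same labels before it is passed to a harness like this one.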
Excluding Languages
If you don't want language-detector to look for certain languages, you can monkey-patch the code. For example, in order to exclude English:
import language_detector
# drop every entry whose language name (cl[1]) is "English"
language_detector.char_language = [cl for cl in language_detector.char_language if cl[1] != "English"]
# proceed as normal
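To exclude several languages at once, the same filter can be wrapped in a small helper. This is a sketch under one assumption taken from the snippet above: each entry of `char_language` is a sequence whose second element is the language name. The helper name `without_languages` and the sample entries are hypothetical and are not part of the package.

```python
def without_languages(char_language, excluded):
    """Return a new list without the entries that belong to excluded languages.

    Assumes entry[1] holds the language name, as in the snippet above.
    """
    excluded = set(excluded)
    return [entry for entry in char_language if entry[1] not in excluded]


if __name__ == "__main__":
    # Hypothetical entries; the real contents of char_language may differ.
    sample = [("a", "English"), ("b", "French"), ("c", "Turkish")]
    print(without_languages(sample, ["English", "Turkish"]))

    # With the package installed, the monkey-patch would look like:
    # import language_detector
    # language_detector.char_language = without_languages(
    #     language_detector.char_language, ["English", "Turkish"])
```

The helper returns a new list instead of modifying its argument, so the original entries can be kept and restored later if needed.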
Datasets
The following is a list of datasets used for each language:
Language | Datasets |
---|---|
Arabic | UN Corpora |
English | UN Corpora |
Farsi | BBC News Persian |
French | UN Corpora |
German | Deutsche Welle |
Kurmanci (Kurdish) | Rudaw |
Mandarin | UN Corpora |
Russian | UN Corpora |
Sorani (Kurdish) | Rudaw |
Spanish | UN Corpora |
Turkish | BBC News Türkçe |
Contributing
If you'd like to contribute a new language, please consult CONTRIBUTING.md.
Support
Contact the package author, Daniel J. Dufour, at daniel.j.dufour@gmail.com