Detect language of text

language-detector

language-detector detects the language of text

Installation

pip install language-detector

Python Version

Works with both Python 2 and 3

Use

from language_detector import detect_language

text = "I arrived in that city on January 4, 1937"
language = detect_language(text)
print(language)  # prints English

Features

Languages Supported
Arabic
English
Farsi
French
German
Kurmanci (Kurdish)
Mandarin
Russian
Sorani (Kurdish)
Spanish
Turkish

Testing

To test the package, run:

python -m unittest language_detector.tests.test

Comparison

The test below compares how well language-detector and langid identify languages in the data sources.

package                      language-detector   langid
test-duration (in seconds)   0.10                3.83
accuracy                     96.77%              67.74%

Excluding Languages

If you don't want language-detector to look for certain languages, you can monkey-patch the code. For example, in order to exclude English:

import language_detector

# char_language lives on the module, so it must be referenced through it
language_detector.char_language = [
    cl for cl in language_detector.char_language if cl[1] != "English"
]

# proceed as normal
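
The patch above stays in effect for the rest of the process. If the exclusion should only apply temporarily, a context manager can restore the original list afterwards. The following is a sketch under two assumptions taken from the snippet above: char_language is a module-level list, and each entry carries the language name at index 1. The helper excluding is hypothetical and not part of the package, and a SimpleNamespace (Python 3 only) stands in for the module so the example runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def excluding(module, *languages):
    """Temporarily drop entries for `languages` from module.char_language."""
    original = module.char_language
    module.char_language = [cl for cl in original if cl[1] not in languages]
    try:
        yield module
    finally:
        # Restore the full list even if the body raises.
        module.char_language = original


# Stand-in for the real module; the entry layout is an assumption.
fake_module = SimpleNamespace(
    char_language=[("a", "English"), ("b", "French"), ("c", "Turkish")]
)

with excluding(fake_module, "English", "Turkish"):
    inside = [cl[1] for cl in fake_module.char_language]

after = [cl[1] for cl in fake_module.char_language]
print(inside)
print(after)
```

Whether this affects detection depends on detect_language reading char_language from the module at call time, which the monkey-patching advice above suggests but this sketch does not verify.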

Datasets

The following is a list of datasets used for each language:

Language             Dataset
Arabic               UN Corpora
English              UN Corpora
Farsi                BBC News Persian
French               UN Corpora
German               Deutsche Welle
Kurmanci (Kurdish)   Rudaw
Mandarin             UN Corpora
Russian              UN Corpora
Sorani (Kurdish)     Rudaw
Spanish              UN Corpora
Turkish              BBC News Türkçe

Contributing

If you'd like to contribute a new language, please consult CONTRIBUTING.md.

Support

Contact the package author, Daniel J. Dufour, at daniel.j.dufour@gmail.com
