lang-detect·PyPI

A tool for detecting the language of a short piece of Unicode text, with no dependencies on other libraries.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
License
- OSI Approved :: BSD License
Topic
- Utilities

Project description

# lang-detect: a tool to detect language

Detects the language of a short piece of Unicode text, with no dependencies on other libraries.

Currently we support detecting de, en, es, fr, it, ja, nl, pl, ru, zh-hans, zh-hant, and zh-yue.

In some simple testing, we found that results are better for longer sentences.

## Method

We focus on the Basic Multilingual Plane (BMP) of Unicode, and the current set of supported languages could be extended.
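A minimal sketch of restricting input to the BMP (code points U+0000 to U+FFFF). The project does not say what it does with characters outside the BMP; dropping them, as done here, is an assumption, and `in_bmp` / `restrict_to_bmp` are hypothetical helper names.

```python
def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def restrict_to_bmp(text: str) -> str:
    """Drop characters outside the BMP (e.g. most emoji).

    Assumption: non-BMP characters are simply discarded before n-gram extraction.
    """
    return "".join(ch for ch in text if in_bmp(ch))


# U+1F600 (an emoji) lies outside the BMP and is removed.
print(restrict_to_bmp("hello \U0001F600 world"))  # hello  world
```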

For each language, we represent the language with a normalized n-gram vector. These vectors can be found in the data folder.
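A minimal sketch of a normalized n-gram vector, stored sparsely as a dict. The README does not say whether its n-grams are over characters or words, nor which `n` it uses; character bigrams are an assumption here (they also work for Chinese and Japanese, which are written without spaces). We also read "uniformed" as L2-normalized; if it instead means relative frequencies summing to one, the cosine comparison gives the same result, because cosine similarity ignores positive scaling.

```python
import math
from collections import Counter
from typing import Dict


def ngram_vector(text: str, n: int = 2) -> Dict[str, float]:
    """Return an L2-normalized vector of character n-gram counts.

    Assumptions: character (not word) n-grams, and n=2 by default.
    """
    # Slide a window of width n over the text and count each n-gram.
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}  # text shorter than n: no n-grams
    return {gram: c / norm for gram, c in counts.items()}


vec = ngram_vector("abab")  # "ab" occurs twice, "ba" once
print(vec)
```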

To detect the language of a text, we build the normalized n-gram vector for that text and compare the cosine of the angle between the text vector and each language vector.
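A minimal sketch of the comparison step, using sparse dict vectors. Choosing the language with the largest cosine is our reading of "comparing the cosine value"; the toy `profiles` below are hypothetical and are not the project's data files.

```python
import math
from typing import Dict

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def detect(text_vec: Vector, profiles: Dict[str, Vector]) -> str:
    """Return the language code whose profile is closest (highest cosine).

    Assumption: the best match is simply the arg-max over all languages.
    """
    return max(profiles, key=lambda lang: cosine(text_vec, profiles[lang]))


# Hypothetical toy profiles, for illustration only.
profiles = {
    "en": {"th": 0.8, "he": 0.6},
    "de": {"ch": 0.8, "ei": 0.6},
}
print(detect({"th": 1.0}, profiles))  # en
```

Because both vectors are already normalized, the cosine reduces to a dot product; the explicit norms are kept so the function also works on unnormalized counts.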

To build the language vectors, we use Wikipedia featured articles as the corpus.
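A minimal sketch of turning a corpus into a language vector: n-gram counts are summed over all documents and then normalized once. The two short strings stand in for Wikipedia articles and are hypothetical; character bigrams and L2 normalization are assumptions, and `build_profile` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def build_profile(documents: Iterable[str], n: int = 2) -> Dict[str, float]:
    """Sum character n-gram counts over a corpus, then L2-normalize.

    Assumptions: character n-grams, n=2, and L2 normalization.
    """
    counts: Counter = Counter()
    for doc in documents:
        # Counter.update with an iterable counts each yielded n-gram.
        counts.update(doc[i:i + n] for i in range(len(doc) - n + 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {g: c / norm for g, c in counts.items()} if norm else {}


# Hypothetical stand-in for a corpus of featured articles.
profile = build_profile(["the cat", "the hat"])
print(sorted(profile, key=profile.get, reverse=True)[:4])
```

Summing counts before normalizing means long articles contribute more n-grams than short ones; normalizing each article first would be an alternative design.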

## Usage

From the project root, run:

    bin/langdetect YOUR_SENTENCE_HERE

Release history

This version: 0.0.1 (Aug 18, 2011)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distribution

lang_detect-0.0.1-py2.6.egg (1.4 kB)

Uploaded Aug 18, 2011 Egg

File details

Details for the file lang_detect-0.0.1-py2.6.egg.

File metadata

Download URL: lang_detect-0.0.1-py2.6.egg
Upload date: Aug 18, 2011
Size: 1.4 kB
Tags: Egg
Uploaded using Trusted Publishing? No

File hashes

Hashes for lang_detect-0.0.1-py2.6.egg
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c1fa4a594eab61f1d2cbf9fece10f91cd5b507a7155410764c8b579c4c6e8a09` |
| MD5 | `f80c43a3beb93cf25d7acfdde1d95603` |
| BLAKE2b-256 | `ce63a28dd6e7a709c6d758d70cc6690b834dc769300960affd1621cbacd170c8` |

