Detect languages via a fasttext model

fastlid

Language identification based on fasttext (lid.176.ftz https://fasttext.cc/docs/en/language-identification.html).

Python 3.8 and 3.9 only -- there seem to be some problems with python

The lid.176.ftz file is licensed under Creative Commons Attribution-Share-Alike License 3.0 and is not part of this module. It is automatically downloaded from its external origin on the first run of this module.
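The download-on-first-run behaviour described above can be sketched roughly as follows. This is a minimal illustration, not fastlid's actual code: the function name `ensure_model`, the `fetch` parameter and the cache location are all assumptions made for the example. The URL is the public location of lid.176.ftz given on the fasttext language-identification page.

```python
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlretrieve

# Public location of the compressed model (from the fasttext docs page).
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.ftz"


def ensure_model(
    path: Path,
    fetch: Optional[Callable[[str, Path], None]] = None,
) -> Path:
    """Return `path`, downloading the model first if the file is missing.

    Hypothetical helper: fastlid's real caching logic and cache
    location may differ.
    """
    if fetch is None:
        # Default fetcher: a plain HTTP download with the standard library.
        def fetch(url: str, dest: Path) -> None:
            urlretrieve(url, str(dest))

    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(MODEL_URL, path)
    return path
```

Passing the fetcher in as a parameter keeps the check cheap to exercise without network access; a second call with the same path does not download again.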

This module attempts to imitate the following two features of langid:

  • langid.classify: fastlid
  • langid.set_languages(langs=[...]): fastlid.set_languages = [...]
    • import fastlid
    • fastlid.set_languages = ['nl', 'fr']
  • TODO: Commandline interface

Install it

pip install fastlid

or install from git

pip install git+https://github.com/ffreemt/fast-langid.git

# also works pip install git+https://github.com/ffreemt/fast-langid

or clone the git repo and install from source.

Use it

from fastlid import fastlid, supported_langs

# supports 176 languages; show the first 10 codes and the total count
print(supported_langs[:10], len(supported_langs))
# ['af', 'als', 'am', 'an', 'ar', 'arz', 'as', 'ast', 'av', 'az'] 176

fastlid("test this")
# ('en', 0.765)

fastlid("test this 测试一下", k=2)
# (['zh', 'en'], [0.663, 0.124])

fastlid.set_languages = ['fr', 'zh']
fastlid("test this 测试吧")
# ('zh', 0.01)

fastlid.set_languages = None
fastlid("test this 测试吧")
# ('en', 0.686)

fastlid.set_languages = ['fr', 'zh', 'en']
fastlid("test this 测试吧", k=3)
# (['en', 'zh', 'fr'], [0.686, 0.01, 0.006])
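The outputs above suggest that restricting the language set filters the model's candidates without rescaling their scores: 'zh' scores 0.01 both when the set is ['fr', 'zh'] and when it is ['fr', 'zh', 'en']. A minimal sketch of that behaviour, assuming this reading is right, is shown below. The function `restrict_predictions` is hypothetical and not part of fastlid; the labels and scores are taken from the example output above.

```python
from typing import List, Optional, Sequence, Tuple


def restrict_predictions(
    labels: Sequence[str],
    probs: Sequence[float],
    allowed: Optional[Sequence[str]] = None,
    k: int = 1,
) -> Tuple[List[str], List[float]]:
    """Keep only allowed labels, then return the k highest-scoring ones.

    Scores are passed through unchanged (no renormalisation), which is
    an assumption inferred from the README outputs.
    """
    pairs = [
        (label, prob)
        for label, prob in zip(labels, probs)
        if allowed is None or label in allowed
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    top = pairs[:k]
    return [label for label, _ in top], [prob for _, prob in top]


# Scores for "test this 测试吧" as reported in the usage example.
labels = ["en", "zh", "fr"]
probs = [0.686, 0.01, 0.006]

print(restrict_predictions(labels, probs, allowed=["fr", "zh"]))
# (['zh'], [0.01])
```

Unlike fastlid, which returns a bare (label, score) pair when k is 1, this sketch always returns two lists to keep the code short.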

N.B. hanzidentifier can be used to distinguish simplified from traditional Chinese, should you need to do so.
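A short sketch of how hanzidentifier might be combined with fastlid's output: once a text is identified as 'zh', a second pass can label the script variant. The helper `chinese_variant` is hypothetical. hanzidentifier is a separate package (pip install hanzidentifier) and is treated here as optional, so the helper returns None when it is not installed.

```python
from typing import Optional

try:
    import hanzidentifier  # optional third-party package
except ImportError:
    hanzidentifier = None


def chinese_variant(text: str) -> Optional[str]:
    """Label text as simplified, traditional, both, mixed or unknown.

    Returns None if hanzidentifier is not installed.
    """
    if hanzidentifier is None:
        return None
    names = {
        hanzidentifier.SIMPLIFIED: "simplified",
        hanzidentifier.TRADITIONAL: "traditional",
        hanzidentifier.BOTH: "both",
        hanzidentifier.MIXED: "mixed",
    }
    # identify() returns hanzidentifier.UNKNOWN for text without Chinese characters.
    return names.get(hanzidentifier.identify(text), "unknown")
```

A typical use would be to call `chinese_variant(text)` only when fastlid reports 'zh' for that text.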

For Developers

Install poetry and yarn the way you like it.

poetry install  # install python packages
yarn install --dev  # install necessary node packages

# ...code...
yarn test
yarn final

# ...optionally submit pr...
