Sensing the language of the text using Machine Learning

Luga

  • Blazing-fast language detection using fastText's language models

Luga is a Swahili word for language. fastText provides a blazing-fast language detection tool. Lamentably, fastText's API is inelegant and its documentation is a bit fuzzy. It is also funky that we have to download and load the models manually.

This is where Luga comes in. We abstract away the unnecessary steps and allow you to do precisely one thing: detect the language of a text.

Installation

python -m pip install -U luga

Usage:

⚠️ Note: The first use downloads the model for you, so the first import takes a bit longer, depending on your internet speed. This happens only once.

from luga import language

print(language("the world has ended yesterday"))

# Language(name='en', score=0.9804665446281433)
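A small sketch of acting on the result, assuming the returned object exposes `name` and `score` attributes, as its printed form suggests. The helper name `language_or_default`, the 0.5 threshold, and the "unknown" fallback are illustrative choices and not part of Luga. The detector is passed in as an argument so the helper can be tried without downloading the model.

```python
from typing import Any, Callable


def language_or_default(
    text: str,
    detector: Callable[[str], Any],
    threshold: float = 0.5,
    default: str = "unknown",
) -> str:
    """Return the detected language code, or `default` for low-confidence results."""
    result = detector(text)
    # Assumption: `result` has `.name` and `.score`,
    # like Language(name='en', score=0.98).
    return result.name if result.score >= threshold else default
```

With Luga installed, this would presumably be called as language_or_default("the world has ended yesterday", language).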

Without Luga:

Download the model

wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O /tmp/lid.176.bin
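The same download step can be sketched in Python so that it runs only when the file is missing. This is an illustration under stated assumptions, not Luga's actual code: the function name `ensure_model` and the injectable `fetch` parameter are hypothetical, while the URL and path are the ones used in this section.

```python
import urllib.request
from pathlib import Path
from typing import Callable

MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def ensure_model(
    path: str = "/tmp/lid.176.bin",
    url: str = MODEL_URL,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Download the model to `path` unless a file is already there."""
    target = Path(path)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # Network access happens only when the file is missing.
        fetch(url, str(target))
    return target
```

The sketch does not detect a partially downloaded file; a more careful version might download to a temporary name and rename it on success.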

Load and use

import fasttext

PATH_TO_MODEL = '/tmp/lid.176.bin'
fmodel = fasttext.load_model(PATH_TO_MODEL)
fmodel.predict(["the world has ended yesterday"])

# ([['__label__en']], [array([0.98046654], dtype=float32)])
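The raw output above is a pair of nested sequences with prefixed labels. A minimal sketch of turning it into the friendlier shape shown in the Usage section could look like the following. The `Language` tuple and `parse_predictions` are defined here for illustration only and are not taken from Luga's source; plain Python lists stand in for the NumPy arrays that fastText returns.

```python
from typing import List, NamedTuple, Sequence


class Language(NamedTuple):
    name: str
    score: float


LABEL_PREFIX = "__label__"


def parse_predictions(
    labels: Sequence[Sequence[str]],
    scores: Sequence[Sequence[float]],
) -> List[Language]:
    """Turn fastText-style (labels, scores) into one Language per input text."""
    results = []
    for text_labels, text_scores in zip(labels, scores):
        # Keep only the first (top) prediction for each text.
        code = text_labels[0]
        if code.startswith(LABEL_PREFIX):
            code = code[len(LABEL_PREFIX):]
        results.append(Language(name=code, score=float(text_scores[0])))
    return results


labels, scores = [["__label__en"]], [[0.98046654]]
print(parse_predictions(labels, scores))
# [Language(name='en', score=0.98046654)]
```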

Coming soon ...

Dev:

poetry run pre-commit install

Release Flow

git tag -l  # lists tags
git tag v*.*.*
git push origin tag v*.*.*

To delete a tag:

git tag -d v*.*.* && git push origin tag -d v*.*.*

TODO:

  • refactor artifacts.py
  • auto checkers with pre-commit | invoke
  • write more tests
  • write github actions
  • create a smart data checker (a fast List[str] check, what to do with non-string values)
  • make it faster with Cython

Download files

Source Distribution

luga-0.1.8.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

luga-0.1.8-py3-none-any.whl (4.0 kB)

Uploaded Python 3