Sensing the language of the text using Machine Learning
Luga
- Blazing-fast language detection using fastText's language models
Luga is a Swahili word for language. fastText provides a blazing-fast language detection tool. Lamentably, fastText's API is inelegant and its documentation is a bit fuzzy. It is also awkward that we have to download and load the models manually.
This is where Luga comes in. We abstract away the unnecessary steps and let you do precisely one thing: detect the language of a text.
Installation
python -m pip install -U luga
Usage:
⚠️ Note: The first use downloads the model for you, so the first import takes a bit longer depending on your internet speed. This happens only once.
from luga import language
print(language("the world has ended yesterday"))
# Language(name='en', score=0.9804665446281433)
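The result carries both a language code and a score, so a caller can discard low-confidence guesses. The following is a minimal sketch, assuming the returned object exposes `name` and `score` as the repr above suggests. The `Language` class here is a hypothetical stand-in so the snippet runs without Luga installed, and the `confident_name` helper and its 0.5 threshold are illustrative choices, not part of Luga.

```python
from typing import NamedTuple, Optional


class Language(NamedTuple):
    """Hypothetical stand-in that mirrors the repr shown above."""

    name: str
    score: float


def confident_name(result: Language, threshold: float = 0.5) -> Optional[str]:
    """Return the language code, or None when the score is below threshold."""
    return result.name if result.score >= threshold else None


# In real use, `result` would come from luga's language(...) call.
result = Language(name="en", score=0.9804665446281433)
print(confident_name(result))        # en
print(confident_name(result, 0.99))  # None
```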
Without Luga:
Download the model
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -O /tmp/lid.176.bin
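The same download can be done from Python, and skipped when the file is already present. This is a rough sketch of that download-once idea using only the standard library; it is not how Luga itself is implemented. The names `ensure_model` and `fetch_with_urllib` are hypothetical, and the `fetch` parameter exists only so the download step can be swapped out.

```python
import urllib.request
from pathlib import Path
from typing import Callable

# URL taken from the wget command above.
MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def fetch_with_urllib(url: str, destination: Path) -> None:
    """Download url to destination using the standard library."""
    urllib.request.urlretrieve(url, str(destination))


def ensure_model(
    destination: Path,
    url: str = MODEL_URL,
    fetch: Callable[[str, Path], None] = fetch_with_urllib,
) -> Path:
    """Download the model only when destination does not exist yet."""
    if not destination.exists():
        # Create the parent directory first so the download has somewhere to land.
        destination.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, destination)
    return destination
```

Calling `ensure_model(Path("/tmp/lid.176.bin"))` twice triggers at most one download; the second call finds the file and returns immediately.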
Load and use
import fasttext
PATH_TO_MODEL = '/tmp/lid.176.bin'
fmodel = fasttext.load_model(PATH_TO_MODEL)
fmodel.predict(["the world has ended yesterday"])
# ([['__label__en']], [array([0.98046654], dtype=float32)])
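The raw fastText output still needs unpacking before it is pleasant to use: labels carry a `__label__` prefix and the scores arrive in a parallel list. Below is a small sketch of that post-processing, assuming the output shape shown above. The `Language` tuple and the `parse_predictions` helper are hypothetical names, and plain lists stand in for the NumPy arrays so the snippet runs without fastText.

```python
from typing import List, NamedTuple, Sequence

LABEL_PREFIX = "__label__"


class Language(NamedTuple):
    """Hypothetical result type: a language code and its score."""

    name: str
    score: float


def parse_predictions(
    labels: Sequence[Sequence[str]], scores: Sequence[Sequence[float]]
) -> List[Language]:
    """Pair the top label of each input text with its score."""
    results = []
    for label_row, score_row in zip(labels, scores):
        top_label = label_row[0]
        # Strip the fastText label prefix when present.
        if top_label.startswith(LABEL_PREFIX):
            top_label = top_label[len(LABEL_PREFIX):]
        results.append(Language(name=top_label, score=float(score_row[0])))
    return results


# Stand-in for the tuple returned by fmodel.predict([...]) above.
raw_labels = [["__label__en"]]
raw_scores = [[0.98046654]]
print(parse_predictions(raw_labels, raw_scores))
# [Language(name='en', score=0.98046654)]
```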
Coming soon ...
Dev:
poetry run pre-commit install
Release Flow
# assumes git push is completed
git tag -l # lists tags
git tag v*.*.* # Major.Minor.Fix
git push origin tag v*.*.*
# to delete tag:
git tag -d v*.*.* && git push origin tag -d v*.*.*
TODO:
- refactor artifacts.py
- auto checkers with pre-commit | invoke
- write more tests
- write GitHub Actions
- create a smart data checker (a fast List[str] check; what to do with non-string values)
- make it faster with Cython
Download files
Source Distribution
luga-0.2.0.tar.gz (4.0 kB)
Built Distribution
luga-0.2.0-py3-none-any.whl (4.2 kB)