A Python library for replacing the missing variation in your text data.
Why should I use this?
Data collected for model training necessarily undersamples the likely variance in the input space. This library is a collection of tools for inserting typical kinds of perturbations to better approximate population variance, and for creating similar-but-incorrect examples that help reduce the total size of the hypothesis space. These two techniques are commonly known as ENRICHMENT and NEGATIVE SAMPLING, respectively.
How do I use this?
Functions in niacin are separated into submodules for specific data types. Functions expose a similar API, with two input arguments: the data to be transformed, and the probability of applying a specific transformation.
from niacin.text import en
data = "This is the song that never ends and it goes on and on my friends"
print(en.add_misspelling(data, p=1.0))
This is teh song tath never ends adn it goes on anbd on my firends
from niacin.text import en
data = "This is the song that never ends and it goes on and on my friends"
print(en.add_hypernyms(data, p=1.0))
This is the musical composition that never extremity and it exit on and on my person
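Because the functions share the same two-argument shape (the data, then a probability p), they can be passed around interchangeably. The following is a minimal sketch of that idea, not niacin's implementation: swap_last_two and enrich are hypothetical names invented here, and swap_last_two is only a stand-in so the example runs without niacin installed. A real niacin function such as en.add_misspelling could be passed to enrich in its place.

```python
import random
from typing import Callable, List, Sequence

# A transformation takes the text and the probability of applying the change.
Transform = Callable[[str, float], str]


def swap_last_two(data: str, p: float = 0.5) -> str:
    """Hypothetical stand-in: swap the last two letters of each word with probability p."""
    words = []
    for word in data.split(" "):
        # random.random() is in [0, 1), so p=1.0 always fires and p=0.0 never does.
        if len(word) > 1 and random.random() < p:
            word = word[:-2] + word[-1] + word[-2]
        words.append(word)
    return " ".join(words)


def enrich(texts: Sequence[str], transforms: Sequence[Transform],
           p: float = 0.1, copies: int = 1) -> List[str]:
    """Hypothetical helper: return the original texts plus perturbed copies of each."""
    out = list(texts)
    for text in texts:
        for _ in range(copies):
            perturbed = text
            # Apply every transformation in order, each with the same probability.
            for transform in transforms:
                perturbed = transform(perturbed, p)
            out.append(perturbed)
    return out
```

With a low p, most words are left alone and each copy differs slightly from the original, which is usually what you want when enriching a training set.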
How do I install this?
With pip:
pip install niacin
From source:
git clone git@github.com:deniederhut/niacin.git && cd niacin && python setup.py install
If you have installed niacin from source, you can run the test suite to verify that everything is working properly. We use pytest, which you will first need to install:
pip install pytest
Then you can run the library's tests with:
pytest -m 'not slow'
If you would like to see the coverage report, you can do so with:
pip install pytest-cov
pytest -m 'not slow' --cov=niacin && coverage html
How can I install the optional dependencies?
If you want to use the backtranslate functionality, niacin will need PyTorch and some other libraries. These can be installed as extras with:
pip install niacin[backtranslate]
If you are on macOS, this might fail with a warning about your version of gcc:
Your compiler (g++) is not compatible with the compiler Pytorch was built with for this platform, which is clang++ on darwin.
You can avoid this error by executing the following:
CFLAGS='-stdlib=libc++' pip install niacin[backtranslate]
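To confirm that the extras were installed before calling the backtranslate functionality, you can check whether PyTorch is importable. This is a small sketch using only the standard library; is_installed is a hypothetical helper name, and it assumes only that PyTorch's import name is torch. It does not check the other libraries the extra pulls in.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("torch"):
    # Point the user back at the extras install command from this section.
    print("PyTorch not found; try: pip install niacin[backtranslate]")
```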