niacin

A Python library for replacing the missing variation in your text data.

Why should I use this?

Data collected for model training necessarily undersamples the likely variance in the input space. This library is a collection of tools for inserting typical kinds of perturbations to better approximate population variance, and for creating similar-but-incorrect examples to help reduce the total size of the hypothesis space. These are commonly known as enrichment and negative sampling, respectively.

How do I use this?

Functions in niacin are separated into submodules for specific data types. Functions expose a similar API, with two input arguments: the data to be transformed, and the probability of applying a specific transformation.

enrichment:

from niacin.text import en
data = "This is the song that never ends and it goes on and on my friends"
print(en.add_misspelling(data, p=1.0))
# example output:
# This is teh song tath never ends adn it goes on anbd on my firends

negative sampling:

from niacin.text import en
data = "This is the song that never ends and it goes on and on my friends"
print(en.add_hypernyms(data, p=1.0))
# example output:
# This is the musical composition that never extremity and it exit on and on my person
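Because every function takes the data first and a probability `p` second, transformations can be passed around interchangeably. The following is a minimal sketch of building a labeled training set that way. The names `swap_inner_letters`, `reverse_word_order`, and `build_training_set` are hypothetical stand-ins written for this illustration and are not part of niacin; they exist only so the example runs without the library installed.

```python
import random


def swap_inner_letters(data, p=0.5):
    """Hypothetical enrichment stand-in: with probability p, swap the second
    and third letters of each word that has at least four characters."""
    words = []
    for word in data.split():
        if len(word) >= 4 and random.random() < p:
            word = word[0] + word[2] + word[1] + word[3:]
        words.append(word)
    return " ".join(words)


def reverse_word_order(data, p=0.5):
    """Hypothetical negative-sampling stand-in: with probability p, reverse
    the word order so the text is similar but no longer correct."""
    if random.random() < p:
        return " ".join(reversed(data.split()))
    return data


def build_training_set(texts, enrich, corrupt, p=0.5):
    """Return (text, label) pairs. Originals and enriched copies get label 1;
    corrupted copies get label 0."""
    examples = []
    for text in texts:
        examples.append((text, 1))
        examples.append((enrich(text, p=p), 1))
        negative = corrupt(text, p=p)
        # Skip negatives that came back unchanged, so that a correct
        # sentence is never labeled as incorrect.
        if negative != text:
            examples.append((negative, 0))
    return examples


if __name__ == "__main__":
    pairs = build_training_set(
        ["this song never ends"], swap_inner_letters, reverse_word_order, p=1.0
    )
    for pair in pairs:
        print(pair)
```

Assuming the niacin functions accept `(data, p=...)` as in the examples above, `en.add_misspelling` and `en.add_hypernyms` could be passed as `enrich` and `corrupt` in place of the stand-ins.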

How do I install this?

with pip:

pip install niacin

from source:

git clone git@github.com:deniederhut/niacin.git && cd niacin && python setup.py install

If you have installed niacin from source, you can run the test suite to verify that everything is working properly. We use pytest, which you will first need to install:

pip install pytest

Then you can run the library's tests with:

pytest -m 'not slow'

If you would like to see the coverage report, you can generate one with pytest-cov:

pip install pytest-cov
pytest -m 'not slow' --cov=niacin && coverage html

How can I install the optional dependencies?

If you want to use the backtranslate functionality, niacin will need PyTorch and some other libraries. These can be installed as extras with:

pip install 'niacin[backtranslate]'

If you are on macOS, this might fail with a warning about your version of gcc:

Your compiler (g++) is not compatible with the compiler Pytorch was
built with for this platform, which is clang++ on darwin.

You can avoid this error by executing the following:

CFLAGS='-stdlib=libc++' pip install 'niacin[backtranslate]'
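After installing the extras, one way to confirm that the optional dependency is visible to your interpreter is to look the modules up without importing them. This is a minimal sketch using only the standard library; `has_module` is a hypothetical helper name, and only `torch` (the import name of PyTorch) is checked because the other libraries are not named here.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "torch" is the import name of PyTorch.
    for name in ("niacin", "torch"):
        status = "available" if has_module(name) else "missing"
        print(f"{name}: {status}")
```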
