Badgers: bad data generators

badgers is a Python library for generating bad data; more precisely, it augments existing data with data quality deficits such as outliers, missing values, and noise. It is built on a simple API and provides a set of generator objects that produce these deficits from existing data.

A word of caution: badgers is still in an early development stage. Although the core structure of the package and the generate(X, y) signature are not expected to change, some API details (such as attribute names) are likely to change.
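To make the generate(X, y) convention concrete, here is a minimal sketch of an object that follows it. The class `MissingValuesSketch` and its `fraction` and `seed` parameters are hypothetical names invented for illustration; they are not part of badgers, and the library's own generators may work differently.

```python
import numpy as np


class MissingValuesSketch:
    """Hypothetical generator, NOT part of badgers.

    Replaces a fraction of the entries of X with NaN and returns (Xt, y).
    """

    def __init__(self, fraction=0.1, seed=0):
        self.fraction = fraction
        self.rng = np.random.default_rng(seed)

    def generate(self, X, y=None):
        # Work on a float copy so the caller's data is left untouched.
        Xt = np.array(X, dtype=float, copy=True)
        # Pick a fixed number of distinct cells to blank out.
        n_missing = int(round(self.fraction * Xt.size))
        idx = self.rng.choice(Xt.size, size=n_missing, replace=False)
        Xt.flat[idx] = np.nan
        # Labels are passed through unchanged in this sketch.
        return Xt, y


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.zeros(10, dtype=int)
Xt, yt = MissingValuesSketch(fraction=0.25).generate(X, y)
```

The point of the shared signature is that any such object takes the original data and returns a degraded copy alongside the labels, so generators can be swapped without changing the calling code.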

The full documentation is hosted here: https://fraunhofer-iese.github.io/badgers/.

To get started quickly, install badgers with pip:

pip install badgers

Import badgers like any other library and start using it:

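from sklearn.datasets import make_blobs
from badgers.generators.tabular_data.noise import GlobalGaussianNoiseGenerator

# Create a small synthetic dataset (features X, cluster labels y)
X, y = make_blobs()
# Configure a generator that adds Gaussian noise to the data
trf = GlobalGaussianNoiseGenerator(noise_std=0.5)
# Xt and yt are the transformed data and labels
Xt, yt = trf.generate(X, y)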
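One way to check what a noise generator did is to compare the transformed data with the original. The sketch below uses only NumPy and simulates the noise itself, assuming that the noise is additive, zero-mean Gaussian with standard deviation `noise_std`; this is an assumption for illustration, and the library may scale or apply noise differently. The helper `residual_std` is a hypothetical name, not a badgers function.

```python
import numpy as np


def residual_std(X, Xt):
    """Standard deviation of the difference between transformed and original data."""
    return float((np.asarray(Xt) - np.asarray(X)).std())


rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2))

# Assumed behaviour (not confirmed by the badgers docs quoted here):
# add zero-mean Gaussian noise with the given standard deviation to every value.
noise_std = 0.5
Xt = X + rng.normal(loc=0.0, scale=noise_std, size=X.shape)

measured = residual_std(X, Xt)
print(f"requested noise_std={noise_std}, measured={measured:.3f}")
```

The same helper could be applied to the X and Xt arrays from the badgers snippet above to see how much the data actually changed.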

More examples are available in the tutorials section.

The API documentation is also available in the API section.

Interested developers will find relevant information in the CONTRIBUTING.md page.
