Raise red flags on machine learning datasets.
redflag
Automatic safety net for machine learning datasets.
:warning: This project is very rough and does not do much yet. The API will very likely change without warning.
Installation
You can install this package with pip:
pip install redflag
Experimental
Installing scikit-learn allows you to access some extra options for outlier detection.
pip install redflag[sklearn]
Example
redflag is currently just a collection of functions. Most of the useful ones take a single column of data (e.g. a 1D NumPy array) and run a single test. For example, we can do some outlier detection:
>>> import redflag as rf
>>> data = [-3, -2, -2, -1, 0, 0, 0, 1, 2, 2, 3]
>>> rf.has_outliers(data)
array([], dtype=int64)
>>> rf.has_outliers(3 * data + [100])
array([100])
See the notebook_Using_redflag.ipynb for several other examples.
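For intuition about what a single-column outlier test can look like, here is a minimal standard-library sketch based on Tukey's fences (values more than 1.5 × IQR beyond the quartiles). It is not redflag's implementation: the `iqr_outliers` name and the `k` parameter are assumptions made for illustration, and `rf.has_outliers` may use a different method and return type.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences.

    Hypothetical helper for illustration only; not part of redflag.
    """
    # Quartiles of the data (the middle cut point, the median, is unused).
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    # Keep only the values that fall outside the fences.
    return [v for v in values if v < low or v > high]


data = [-3, -2, -2, -1, 0, 0, 0, 1, 2, 2, 3]
print(iqr_outliers(data))                # []
print(iqr_outliers(3 * data + [100]))    # [100]
```

With this data, both quartiles sit at -2 and 2, so the fences are -8 and 8 and only the appended value 100 is flagged.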
Contributing
Please see CONTRIBUTING.md.
Testing
You can run the tests (requires pytest and pytest-cov) with:
python run_tests.py
Most of the tests are run with doctest.
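As a rough illustration of the doctest style, the sketch below puts examples in a function's docstring and checks them with the standard-library `doctest` module. The `count_missing` function is hypothetical and not part of redflag, and this is not what `run_tests.py` does internally; it only shows how docstring examples double as tests.

```python
import doctest


def count_missing(values):
    """
    Count the None entries in a sequence.

    Hypothetical helper for illustration only; not part of redflag.

    >>> count_missing([1, None, 3, None])
    2
    >>> count_missing([])
    0
    """
    return sum(v is None for v in values)


# Parse the examples out of the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    count_missing.__doc__,               # text containing the >>> examples
    {"count_missing": count_missing},    # names available to the examples
    "count_missing",                     # name used in failure reports
    None,                                # no source filename
    0,                                   # starting line number
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, results.attempted)
```

Each `>>>` line is executed and its printed result is compared with the line that follows, so a stale example in the documentation shows up as a failed test.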
Building
This repo uses PEP 517-style packaging. Read more about this and about Python packaging in general.
Building the project requires build, so first:
pip install build
Then to build redflag locally:
python -m build
This builds both .tar.gz and .whl files, either of which you can install with pip.
Continuous integration
This repo has two GitHub 'workflows' or 'actions':
- Push to main: Run all tests on all versions of Python. This is the Build and test workflow.
- Publish a new release: Build and upload to PyPI. This is the Publish to PyPI workflow. Publish using the GitHub interface, for example (read more).
© 2021 Agile Scientific, openly licenced under Apache 2.0