A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.
SelfClean
SelfClean Paper | Data Cleaning Protocol Paper
Development Environment
Run make for a list of possible targets.
Installation
Run these commands to install the project:
make init
make install
To run linters on all files:
pre-commit run --all-files
Code and test conventions
black for code style
isort for import sorting
pytest for running tests
Download files
Source Distribution
selfclean-0.0.8.tar.gz (94.4 kB)
Built Distribution
selfclean-0.0.8-py3-none-any.whl (155.6 kB)
Hashes for selfclean-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1c3cf12c47ed776a9349bfd64f6c8f300dd81e097c7d1fd0202238c0a4fb0248
MD5 | 026e6cff23b6ac82c32b3c572bbb0fc9
BLAKE2b-256 | f6c2af3a350c2f3c54acc886e39102d1ce1e9ef8f2a4a3572fe9658b7b426a72
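The digests listed above can be checked against a downloaded wheel before installing it. The following is a minimal sketch using only Python's standard hashlib module; it is not part of SelfClean. The helper names file_digests and verify are hypothetical, as is the assumption that the wheel sits in the current working directory.

```python
import hashlib
from pathlib import Path

# Digests copied from the hash table for selfclean-0.0.8-py3-none-any.whl.
EXPECTED = {
    "sha256": "1c3cf12c47ed776a9349bfd64f6c8f300dd81e097c7d1fd0202238c0a4fb0248",
    "md5": "026e6cff23b6ac82c32b3c572bbb0fc9",
    "blake2b-256": "f6c2af3a350c2f3c54acc886e39102d1ce1e9ef8f2a4a3572fe9658b7b426a72",
}


def file_digests(path, chunk_size=1 << 20):
    """Hypothetical helper: compute all three digests in one pass over the file."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "blake2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify(path, expected=EXPECTED):
    """Hypothetical helper: map each algorithm name to True if its digest matches."""
    actual = file_digests(path)
    return {name: actual[name] == digest.lower() for name, digest in expected.items()}


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the wheel was saved.
    wheel = Path("selfclean-0.0.8-py3-none-any.whl")
    if wheel.exists():
        print(verify(wheel))
    else:
        print(f"{wheel} not found; download it first.")
```

If any entry in the result is False, the file differs from the published release and should not be installed. SHA256 alone is sufficient for an integrity check; MD5 is included only because the table lists it.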