A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.

SelfClean

SelfClean Paper | Data Cleaning Protocol Paper

Development Environment

Run make for a list of possible targets.

Installation

Run these commands to install the project:

make init
make install

To run linters on all files:

pre-commit run --all-files

Code and test conventions

  • black for code style
  • isort for import sorting
  • pytest for running tests
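
As a rough illustration of these conventions, a test module might look like the sketch below. The helper `unique_in_order` is hypothetical and is not part of SelfClean; it only gives the tests something to exercise. The layout follows black's default formatting, with imports placed at the top of the module as isort expects, and the `test_` prefix is what lets pytest discover the test functions.

```python
# Illustrative only: `unique_in_order` is a hypothetical helper,
# not a function provided by SelfClean.
from collections.abc import Hashable, Iterable


def unique_in_order(items: Iterable[Hashable]) -> list[Hashable]:
    """Return the items with repeats dropped, keeping first-seen order."""
    seen: set[Hashable] = set()
    result: list[Hashable] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def test_unique_in_order_drops_repeats() -> None:
    # pytest collects functions whose names start with `test_`.
    assert unique_in_order(["a", "b", "a", "c", "b"]) == ["a", "b", "c"]


def test_unique_in_order_handles_empty_input() -> None:
    assert unique_in_order([]) == []
```

Saved as a file whose name starts with `test_`, a module like this would be picked up by a plain `pytest` run and checked by black and isort through the pre-commit hooks.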
