Fast contamination detection for ML training data - Python bindings for decon
Project description
decontaminate
Fast contamination detection for ML training data. Python bindings for decon.
Installation
```shell
pip install decontaminate
```
Usage
```python
import decon

config = decon.Config(
    training_dir="/path/to/training/data",
    evals_dir="/path/to/eval/references",
    report_output_dir="/path/to/output",
)
report_dir = decon.detect(config)
```
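The call above takes three directory paths. As a rough sketch, a small pre-flight check can catch a mistyped path before detection starts. The helpers `check_paths` and `run_detection` are hypothetical names introduced here, not part of decon, and whether decon itself validates or creates these directories is not stated in this document.

```python
from pathlib import Path


def check_paths(training_dir: str, evals_dir: str, report_output_dir: str) -> dict:
    """Validate the two input directories and create the output directory.

    Returns keyword arguments with the same names used in the Config call.
    This is a hypothetical helper, not part of the decon API.
    """
    for label, path in (("training_dir", training_dir), ("evals_dir", evals_dir)):
        if not Path(path).is_dir():
            raise FileNotFoundError(f"{label} is not a directory: {path}")
    # Assumption: creating the output directory up front is harmless.
    Path(report_output_dir).mkdir(parents=True, exist_ok=True)
    return {
        "training_dir": str(training_dir),
        "evals_dir": str(evals_dir),
        "report_output_dir": str(report_output_dir),
    }


def run_detection(training_dir: str, evals_dir: str, report_output_dir: str):
    """Check the paths, then call decon exactly as in the usage example."""
    import decon  # imported lazily so check_paths works without the package

    config = decon.Config(**check_paths(training_dir, evals_dir, report_output_dir))
    return decon.detect(config)
```

Calling `run_detection("/path/to/training/data", "/path/to/eval/references", "/path/to/output")` would then fail early with a `FileNotFoundError` if either input directory is missing.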
API
The Python API is a thin PyO3 wrapper over the Rust implementation. See src/lib.rs for all Config parameters and available functions:
- detect(), review(), compare(), evals(), server()
- Tokenizer (encode/decode with cl100k, o200k, etc.)
- clean_text() (text normalization)
Documentation
Full documentation: https://github.com/vincentzed/decon
Download files
File details
Details for the file decontaminate-0.3.0.post4.tar.gz.
File metadata
- Download URL: decontaminate-0.3.0.post4.tar.gz
- Upload date:
- Size: 132.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 881e1ad05cd493b8f68665479d126cb9fec728e71bd26c105bdb8ba8142127e4 |
| MD5 | 1a373a9f094242abaf7c30e58e46418a |
| BLAKE2b-256 | 0c48ad970b46cf4f7e738753f3c8a998c7e512b0e16f44123ef4f2cbfd041138 |
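The digests listed for each file can be used to check a download before installing it. A minimal sketch using only the Python standard library follows; `sha256_of_file` and `verify_download` are hypothetical helper names, and the local file path in the usage note is an assumption.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the source distribution, this would be called as `verify_download("decontaminate-0.3.0.post4.tar.gz", "881e1ad05cd493b8f68665479d126cb9fec728e71bd26c105bdb8ba8142127e4")`, assuming the archive has been downloaded to the current directory.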
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4.tar.gz:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4.tar.gz
  - Subject digest: 881e1ad05cd493b8f68665479d126cb9fec728e71bd26c105bdb8ba8142127e4
- Sigstore transparency entry: 811735118
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 5.9 MB
- Tags: CPython 3.14, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06435c6e1ff9333c1e4d49de9d30c0f9e5586ed95a6b03a68453c9af92fe290d |
| MD5 | d66787fd1a815ea589e8a233a1ccfd45 |
| BLAKE2b-256 | 28a927926e37f1bf70f60379c868adc3bba5953148f2d4ebf495fdfe66bfcb7f |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  - Subject digest: 06435c6e1ff9333c1e4d49de9d30c0f9e5586ed95a6b03a68453c9af92fe290d
- Sigstore transparency entry: 811735276
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 5.8 MB
- Tags: CPython 3.14, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b65d36c2e94528ef2955c1af9015d2c206a29c914125473c2a0c995438d65eab |
| MD5 | e6ba675f1933131237a723afcc89191c |
| BLAKE2b-256 | f8779e86aa4072b5c20c804d7cf3d217cf16597fec5decb84a93c83b1f271559 |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  - Subject digest: b65d36c2e94528ef2955c1af9015d2c206a29c914125473c2a0c995438d65eab
- Sigstore transparency entry: 811735344
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl
- Upload date:
- Size: 5.5 MB
- Tags: CPython 3.14, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 679c25d5b3aa34047c29df5fc050483b4d8fc48f1390538dbacdba0a1393e6e2 |
| MD5 | d7b0d1ff54ea86178b0b3b745f699e6a |
| BLAKE2b-256 | be35e744abd9c80a961fcdfca921378125516b3f286470e4559051f8e6fd86ca |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl
  - Subject digest: 679c25d5b3aa34047c29df5fc050483b4d8fc48f1390538dbacdba0a1393e6e2
- Sigstore transparency entry: 811735181
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 6.0 MB
- Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 893066a1a89f5e91469740271215ca1d5d1d014ad21d13a3ac705db0f10ce6d9 |
| MD5 | 9f61296ef63a2097bbcdc44407e563e7 |
| BLAKE2b-256 | 4b4d97c6d87625595f3ece6ef0b0ab25584998eb50c092e02bc324e3dbe8a4ba |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  - Subject digest: 893066a1a89f5e91469740271215ca1d5d1d014ad21d13a3ac705db0f10ce6d9
- Sigstore transparency entry: 811735380
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 5.8 MB
- Tags: CPython 3.13, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5163d5b8d96f3413100a3cdc2a8f9ac15f74ddfd33bb12979a8b60a087195981 |
| MD5 | c26646b568bf9df865aaa89ef791e520 |
| BLAKE2b-256 | 912b99cde38d9a7fcd77d2675381ccf0639ba588cc7e3bcddaae27e669ccb184 |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  - Subject digest: 5163d5b8d96f3413100a3cdc2a8f9ac15f74ddfd33bb12979a8b60a087195981
- Sigstore transparency entry: 811735245
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 5.5 MB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e135fd7d1683a22e63dc4969cb0a808a44e4dd9e3abf807e0571bc77ec1d9fd |
| MD5 | af40f530ecfb53d844337c41d68dfe0b |
| BLAKE2b-256 | 19a90cca8c9504db8b4284e05572ffecc12f26cb9afdc86debde08a98256139f |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl
  - Subject digest: 4e135fd7d1683a22e63dc4969cb0a808a44e4dd9e3abf807e0571bc77ec1d9fd
- Sigstore transparency entry: 811735310
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 6.0 MB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef1e93525ecca844479fa701619cf4c69c77485abb79223339c7c6675f496507 |
| MD5 | fa32ac10fa748a20b09f186458759d92 |
| BLAKE2b-256 | 44572f39b5b06af3f558a3bd7ed61e361f59ca2eec3cde061cc5b99f03b5acca |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  - Subject digest: ef1e93525ecca844479fa701619cf4c69c77485abb79223339c7c6675f496507
- Sigstore transparency entry: 811735208
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 5.8 MB
- Tags: CPython 3.12, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1f2bf73285b679d01c6c81769a078396eab1bb767bb3bf1e0ab7a3cea82c3af |
| MD5 | 664d08cc34a7bdb1b75ec0a58fd5f7b6 |
| BLAKE2b-256 | 55390e7fccef5e519f472646d1ddf05fa3cacfdb0ebd271be54cc01462c9a59c |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  - Subject digest: b1f2bf73285b679d01c6c81769a078396eab1bb767bb3bf1e0ab7a3cea82c3af
- Sigstore transparency entry: 811735414
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push
File details
Details for the file decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 5.5 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b399d072e2c08125cdc59fde8957cb5d94f9c717c7a595684374073d5208444 |
| MD5 | b11abf6620c547c54e96b70a3f56a983 |
| BLAKE2b-256 | b8206eb3297e667a2c2d1e95d3b5b8286ec738a6c6c8b7fc8b41128d227ffadd |
Provenance
The following attestation bundles were made for decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl:
Publisher: release.yml on vincentzed/decon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl
  - Subject digest: 5b399d072e2c08125cdc59fde8957cb5d94f9c717c7a595684374073d5208444
- Sigstore transparency entry: 811735150
- Sigstore integration time:
- Permalink: vincentzed/decon@213a80c10af980f4eb8d2de7223618506ee80237
- Branch / Tag: refs/tags/v0.3.0.post4
- Owner: https://github.com/vincentzed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@213a80c10af980f4eb8d2de7223618506ee80237
- Trigger Event: push