Fast contamination detection for ML training data - Python bindings for decon

Project description

decontaminate

Installation

pip install decontaminate

The distribution is named decontaminate; the module it provides is imported as decon.

Usage

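import decon

config = decon.Config(
    training_dir="/path/to/training/data",   # training data to scan for contamination
    evals_dir="/path/to/eval/references",    # eval reference data to check against
    report_output_dir="/path/to/output",     # where reports are written
)
# Run contamination detection with this configuration
report_dir = decon.detect(config)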
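The example above assumes the input directories already exist. A small wrapper can fail early on a bad path before the Rust side starts working. This is a sketch: `check_dirs` and `run_detection` are hypothetical helper names, and only `decon.Config` with the three keyword arguments shown above and `decon.detect` come from this page. Whether decon creates the output directory itself is not documented here, so the sketch creates it as a precaution.

```python
from pathlib import Path


def check_dirs(training_dir, evals_dir, report_output_dir):
    """Raise early if an input directory is missing; create the output directory."""
    for name, path in (("training_dir", training_dir), ("evals_dir", evals_dir)):
        if not Path(path).is_dir():
            raise FileNotFoundError(f"{name} is not a directory: {path}")
    # Precaution only: this page does not say whether decon creates it.
    Path(report_output_dir).mkdir(parents=True, exist_ok=True)


def run_detection(training_dir, evals_dir, report_output_dir):
    # Imported lazily so check_dirs() works without the package installed.
    import decon

    check_dirs(training_dir, evals_dir, report_output_dir)
    # Paths are passed as str because the documented example uses strings.
    config = decon.Config(
        training_dir=str(training_dir),
        evals_dir=str(evals_dir),
        report_output_dir=str(report_output_dir),
    )
    return decon.detect(config)
```

Converting to `str` keeps the call identical to the documented example regardless of whether the bindings also accept `pathlib.Path` objects.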

API

The Python API is a thin PyO3 wrapper over the Rust implementation. See src/lib.rs for all Config parameters and available functions:

  • detect(), review(), compare(), evals(), server()
  • Tokenizer (encode/decode with cl100k, o200k, etc.)
  • clean_text() (text normalization)
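decon's detection method lives in the Rust implementation and is not described on this page. As a generic illustration of what contamination detection looks for, the sketch below measures word n-gram overlap between an eval example and some training text. It is not decon's algorithm; `ngrams`, `overlap_ratio`, the lowercasing, and the default n=3 are illustrative assumptions.

```python
import re


# Generic n-gram overlap check; NOT decon's algorithm.
def ngrams(text, n):
    """Return the set of word n-grams in lowercased text (punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(eval_text, training_text, n=3):
    """Fraction of the eval text's n-grams that also occur in the training text."""
    eval_grams = ngrams(eval_text, n)
    if not eval_grams:
        # Eval text shorter than n words: nothing to compare.
        return 0.0
    return len(eval_grams & ngrams(training_text, n)) / len(eval_grams)
```

A ratio near 1.0 suggests most of the eval example appears verbatim in the training text; where to set the cutoff for flagging is a judgment call this sketch leaves open.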

Documentation

Full documentation: https://github.com/vincentzed/decon


Download files

Download the file for your platform.

Source Distribution

decontaminate-0.3.0.post4.tar.gz (132.0 kB)

Uploaded: Source

Built Distributions

decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.9 MB)

Uploaded: CPython 3.14, manylinux: glibc 2.17+ x86-64

decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.8 MB)

Uploaded: CPython 3.14, manylinux: glibc 2.17+ ARM64

decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl (5.5 MB)

Uploaded: CPython 3.14, macOS 11.0+ ARM64

decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.0 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.17+ x86-64

decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.8 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.17+ ARM64

decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl (5.5 MB)

Uploaded: CPython 3.13, macOS 11.0+ ARM64

decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.0 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.17+ x86-64

decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.8 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.17+ ARM64

decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl (5.5 MB)

Uploaded: CPython 3.12, macOS 11.0+ ARM64
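Wheel filenames follow the PEP 427 layout `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. Read that way, the list above covers only CPython 3.12, 3.13 and 3.14 on glibc-based Linux (x86-64 and ARM64) and on macOS 11.0+ ARM64; on any other interpreter or platform, pip falls back to the source distribution. pip chooses a compatible file automatically, so the sketch below only decodes the names. `parse_wheel_filename` and `current_cpython_tag` are hypothetical helpers written with the standard library; `packaging.utils.parse_wheel_filename` from the `packaging` project is a more thorough alternative.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[:-len(".whl")].split("-")
    # 5 parts without a build tag, 6 with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        # A compressed tag set such as "manylinux_2_17_x86_64.manylinux2014_x86_64"
        # lists several platform tags that the same file satisfies.
        "platform": platform_tag.split("."),
    }


def current_cpython_tag():
    """Interpreter tag of the running Python, e.g. 'cp312' on 3.12 (CPython naming)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    # Three of the filenames listed above.
    wheels = [
        "decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl",
        "decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
    ]
    tag = current_cpython_tag()
    for wheel in wheels:
        info = parse_wheel_filename(wheel)
        status = "matches this interpreter" if tag in info["python"] else "other Python"
        print(f"{info['python']} {info['platform']}: {status}")
```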

File details

Details for the file decontaminate-0.3.0.post4.tar.gz.

File metadata

  • Download URL: decontaminate-0.3.0.post4.tar.gz
  • Upload date:
  • Size: 132.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for decontaminate-0.3.0.post4.tar.gz
Algorithm Hash digest
SHA256 881e1ad05cd493b8f68665479d126cb9fec728e71bd26c105bdb8ba8142127e4
MD5 1a373a9f094242abaf7c30e58e46418a
BLAKE2b-256 0c48ad970b46cf4f7e738753f3c8a998c7e512b0e16f44123ef4f2cbfd041138

See more details on using hashes here.
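To check a downloaded file against the digests listed on this page, hash it locally and compare. A minimal sketch using only `hashlib`; `file_digests` and `verify_sha256` are hypothetical helper names. BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest, hence `digest_size=32`.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so multi-megabyte wheels are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected_hex.lower()
```

For the source distribution, `verify_sha256("decontaminate-0.3.0.post4.tar.gz", "881e1ad05cd493b8f68665479d126cb9fec728e71bd26c105bdb8ba8142127e4")` should return True. pip can enforce the same check in hash-checking mode by adding `--hash=sha256:<digest>` options to a requirements-file line, one per file you are willing to install.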

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4.tar.gz:

Publisher: release.yml on vincentzed/decon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 06435c6e1ff9333c1e4d49de9d30c0f9e5586ed95a6b03a68453c9af92fe290d
MD5 d66787fd1a815ea589e8a233a1ccfd45
BLAKE2b-256 28a927926e37f1bf70f60379c868adc3bba5953148f2d4ebf495fdfe66bfcb7f

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b65d36c2e94528ef2955c1af9015d2c206a29c914125473c2a0c995438d65eab
MD5 e6ba675f1933131237a723afcc89191c
BLAKE2b-256 f8779e86aa4072b5c20c804d7cf3d217cf16597fec5decb84a93c83b1f271559

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 679c25d5b3aa34047c29df5fc050483b4d8fc48f1390538dbacdba0a1393e6e2
MD5 d7b0d1ff54ea86178b0b3b745f699e6a
BLAKE2b-256 be35e744abd9c80a961fcdfca921378125516b3f286470e4559051f8e6fd86ca

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 893066a1a89f5e91469740271215ca1d5d1d014ad21d13a3ac705db0f10ce6d9
MD5 9f61296ef63a2097bbcdc44407e563e7
BLAKE2b-256 4b4d97c6d87625595f3ece6ef0b0ab25584998eb50c092e02bc324e3dbe8a4ba

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5163d5b8d96f3413100a3cdc2a8f9ac15f74ddfd33bb12979a8b60a087195981
MD5 c26646b568bf9df865aaa89ef791e520
BLAKE2b-256 912b99cde38d9a7fcd77d2675381ccf0639ba588cc7e3bcddaae27e669ccb184

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4e135fd7d1683a22e63dc4969cb0a808a44e4dd9e3abf807e0571bc77ec1d9fd
MD5 af40f530ecfb53d844337c41d68dfe0b
BLAKE2b-256 19a90cca8c9504db8b4284e05572ffecc12f26cb9afdc86debde08a98256139f

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ef1e93525ecca844479fa701619cf4c69c77485abb79223339c7c6675f496507
MD5 fa32ac10fa748a20b09f186458759d92
BLAKE2b-256 44572f39b5b06af3f558a3bd7ed61e361f59ca2eec3cde061cc5b99f03b5acca

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b1f2bf73285b679d01c6c81769a078396eab1bb767bb3bf1e0ab7a3cea82c3af
MD5 664d08cc34a7bdb1b75ec0a58fd5f7b6
BLAKE2b-256 55390e7fccef5e519f472646d1ddf05fa3cacfdb0ebd271be54cc01462c9a59c

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on vincentzed/decon

File details

Details for the file decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5b399d072e2c08125cdc59fde8957cb5d94f9c717c7a595684374073d5208444
MD5 b11abf6620c547c54e96b70a3f56a983
BLAKE2b-256 b8206eb3297e667a2c2d1e95d3b5b8286ec738a6c6c8b7fc8b41128d227ffadd

Provenance

The following attestation bundles were made for decontaminate-0.3.0.post4-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on vincentzed/decon
