Data Preparation Toolkit Transforms using Ray

Project description

DPK Python Transforms

Installation

The transforms are delivered as a standard Python library available on PyPI and can be installed using uv. First install uv:

pip install uv

Then install the transforms with the extras you need, using one of:

python -m uv pip install data-prep-toolkit-transforms[all]
python -m uv pip install data-prep-toolkit-transforms[ray,all]
python -m uv pip install data-prep-toolkit-transforms[language]

Installing the Python transforms also installs data-prep-toolkit.

Installing the Ray transforms also installs data-prep-toolkit[ray].
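After installation, one way to confirm which distributions actually landed in the environment is to query the installed package metadata. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of the toolkit, and the distribution names are taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of data-prep-toolkit.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as they appear in the install commands.
for name in ("data-prep-toolkit-transforms", "data-prep-toolkit"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

If the second line reports "not installed" after installing the transforms, the dependency resolution did not behave as described and the install is worth repeating in a clean environment.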

Release notes:

1.1.1.dev1

Include all code transforms as extra [code]

1.1.1.dev0

Refactored code transforms (code_quality, code2parquet, header_cleanser, license_select, proglang_select)
Added ml-filter and enrichment
Renamed PDF2Parquet to Docling2Parquet

1.0.1.dev1

Added GneissWeb transforms
fdedup fix for Windows

1.0.1.dev0

PR #979 (code_profiler)

1.0.0.a6

Added Profiler
Added Resize

1.0.0.a5

Added PII Redactor
Relaxed fasttext requirement to >= 0.9.2

1.0.0.a4

Added missing Ray implementations for lang_id, doc_quality, tokenization, and filter
Added Ray notebooks for lang_id, doc_quality, tokenization, and filter

1.0.0.a3

Added code_profiler

1.0.0.a2

Relaxed dependencies on pandas (use latest or whatever is installed by the application)
Relaxed dependencies on requests (use latest or whatever is installed by the application)


Built Distribution


data_prep_toolkit_transforms-1.1.8-py3-none-any.whl (9.1 MB view details)

Uploaded Python 3

File details

Hashes for data_prep_toolkit_transforms-1.1.8-py3-none-any.whl:

Algorithm     Hash digest
SHA256        58a009d2197dffc915e022c0b27cf3614481f57dc1043c0045711886195f89e7
MD5           03193486cedbb10207ac21ff2f6d7ea6
BLAKE2b-256   fcaf27dd9f868ba11083d65d7bf588465830145eaa4a543a99c73058db9f4356
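The SHA256 digest listed above can be compared against a locally downloaded copy of the wheel. The following is a minimal sketch using only the Python standard library; the function names `sha256_of` and `matches_expected` are hypothetical, and the wheel is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the hash listing above.
EXPECTED_SHA256 = "58a009d2197dffc915e022c0b27cf3614481f57dc1043c0045711886195f89e7"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the wheel was downloaded.
    wheel = Path("data_prep_toolkit_transforms-1.1.8-py3-none-any.whl")
    if wheel.exists():
        print("hash OK" if matches_expected(wheel) else "hash MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use flat, which matters little for a 9.1 MB wheel but makes the same helper usable for larger artifacts.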
