Data Preparation Toolkit Transforms using Ray

Project description

DPK Python Transforms

Installation

The transforms are delivered as a standard Python library available on PyPI and can be installed with pip, using one of the following commands:

python -m pip install data-prep-toolkit-transforms[all]
python -m pip install data-prep-toolkit-transforms[ray,all]
python -m pip install data-prep-toolkit-transforms[language]

Installing the Python transforms also installs data-prep-toolkit.

Installing the Ray transforms also installs data-prep-toolkit[ray].
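
After installing, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name installed_version is illustrative and not part of the toolkit.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the installation notes above.
    for name in ("data-prep-toolkit-transforms", "data-prep-toolkit"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

The lookup goes through the distribution name used with pip, so it does not depend on knowing the import name of any module inside the package.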

Release notes:

1.1.1.dev1

Include all code transforms as extra [code]

1.1.1.dev0

Refactored code transforms (code_quality, code2parquet, header_cleanser, license_select, proglang_select)
Added ml-filter and enrichment
Renamed PDF2Parquet to Docling2Parquet

1.0.1.dev1

Added GneissWeb transforms
fdedup fix for Windows

1.0.1.dev0

PR #979 (code_profiler)

1.0.0.a6

Added Profiler
Added Resize

1.0.0.a5

Added PII Redactor
Relaxed fasttext requirement to >= 0.9.2

1.0.0.a4

Added missing Ray implementations for lang_id, doc_quality, tokenization, and filter
Added Ray notebooks for lang_id, doc_quality, tokenization, and filter

1.0.0.a3

Added code_profiler

1.0.0.a2

Relaxed dependency on pandas (use the latest version or whatever the application has installed)
Relaxed dependency on requests (use the latest version or whatever the application has installed)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See the tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file data_prep_toolkit_transforms-1.1.4.dev0-py3-none-any.whl.

File metadata

File hashes

Hashes for data_prep_toolkit_transforms-1.1.4.dev0-py3-none-any.whl
Algorithm Hash digest
SHA256 b7805fbbe6d6692bd4a60da5dd89b41aaa901d5d9678f568bd2c7b2decd5ce19
MD5 5960e1f284de499f85289b4cdf340c0c
BLAKE2b-256 29cf83a9fead7bd8467ce6e7b27f6e165bd0df1d8161677f605433615777d878

See more details on using hashes here.
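
A downloaded wheel can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the standard library; the helper names sha256_of and matches_published are illustrative, and the local file path assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # File name and digest copied from the listing above; the location is an assumption.
    wheel = Path("data_prep_toolkit_transforms-1.1.4.dev0-py3-none-any.whl")
    published = "b7805fbbe6d6692bd4a60da5dd89b41aaa901d5d9678f568bd2c7b2decd5ce19"
    if wheel.exists():
        print("digest matches" if matches_published(wheel, published) else "DIGEST MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of wheel size. Only SHA256 is checked here, since MD5 is not suitable for integrity checks against tampering.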

File details

Details for the file data_prep_toolkit_transforms-1.1.4.dev0-3-py3-none-any.whl.

File metadata

File hashes

Hashes for data_prep_toolkit_transforms-1.1.4.dev0-3-py3-none-any.whl
Algorithm Hash digest
SHA256 557b276ccef583c42ac500f3d2b37efc044bf057a404449a5d7013a7d9791ddc
MD5 1f43b3624817290dc1d9814f65199c95
BLAKE2b-256 fde142326ec09fbc06ccf6db3d9404365a0321344c77613614d4cd7becfe9719

File details

Details for the file data_prep_toolkit_transforms-1.1.4.dev0-2-py3-none-any.whl.

File metadata

File hashes

Hashes for data_prep_toolkit_transforms-1.1.4.dev0-2-py3-none-any.whl
Algorithm Hash digest
SHA256 b043de15fe8c819e90293603be9d6d0d1863d342214efc2327af4fcd4a3b6cc8
MD5 862141e80099b88714f9d28d063f5a74
BLAKE2b-256 066a70a65c08fc18bcf6a62fac318f4c9594f4415818229d5205159ee282797d

File details

Details for the file data_prep_toolkit_transforms-1.1.4.dev0-1-py3-none-any.whl.

File metadata

File hashes

Hashes for data_prep_toolkit_transforms-1.1.4.dev0-1-py3-none-any.whl
Algorithm Hash digest
SHA256 e2fe4ab6777cd065c6ec78cc11cd8fd7a7d1416576a1002ba012bc9f2fb342bf
MD5 a674901ab39b50e98fb32581c6815e2f
BLAKE2b-256 bd9857caf8241a7aaad6e1de3a173efc60cfb2bfc99d00ba5e98658697f5b3ee
