Data Preparation Toolkit Transforms using Ray

DPK Python Transforms

Installation

The transforms are delivered as a standard Python library available on PyPI and can be installed with pip:

python -m pip install data-prep-toolkit-transforms[all]
or
python -m pip install data-prep-toolkit-transforms[ray,all]
or
python -m pip install data-prep-toolkit-transforms[language]

Installing the Python transforms will also install data-prep-toolkit.

Installing the Ray transforms will also install data-prep-toolkit[ray].
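
To confirm which of these distributions actually ended up in the environment, a minimal sketch using only the standard library is shown below. The helper name installed_version is hypothetical (it is not part of the toolkit); the distribution names are the ones used in the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they appear in the pip commands above.
    for name in ("data-prep-toolkit-transforms", "data-prep-toolkit"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

The check queries package metadata only, so it does not import the transforms and does not depend on which extras were selected.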

Release notes:

1.1.1.dev1

Include all code transforms as extra [code]

1.1.1.dev0

Refactored code transforms (code_quality, code2parquet, header_cleanser, license_select, proglang_select)
Added ml-filter and enrichment
Renamed PDF2Parquet to Docling2Parquet

1.0.1.dev1

Added GneissWeb transforms
fdedup fix for Windows

1.0.1.dev0

PR #979 (code_profiler)

1.0.0.a6

Added Profiler
Added Resize

1.0.0.a5

Added PII Redactor
Relaxed fasttext requirement to >= 0.9.2

1.0.0.a4

Added missing Ray implementations for lang_id, doc_quality, tokenization and filter
Added Ray notebooks for lang_id, doc_quality, tokenization and filter

1.0.0.a3

Added code_profiler

1.0.0.a2

Relaxed dependencies on pandas (use latest or whatever is installed by application)
Relaxed dependencies on requests (use latest or whatever is installed by application)

Source Distributions

No source distribution files available for this release.

Built Distributions

File details

Details for the file data_prep_toolkit_transforms-1.1.1.dev1-py3-none-any.whl.

File hashes

Hashes for data_prep_toolkit_transforms-1.1.1.dev1-py3-none-any.whl
Algorithm Hash digest
SHA256 874c9f3cf111a35a93dc96dfd456d1425d8c672ba5d379d6e0f6b221fd0019c8
MD5 bfa61b00105898a8906b88912edc81c5
BLAKE2b-256 283787eb4ee25bb3957c99b6d16c5f97c141647164cb2003ebcaa6ae5ce3ed66

See more details on using hashes here.
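
As an illustration of how a listed digest can be used, the sketch below computes the SHA256 of a downloaded wheel and compares it with the value published above for data_prep_toolkit_transforms-1.1.1.dev1-py3-none-any.whl. The function names sha256_of and verify_wheel are hypothetical helpers, and the script assumes the wheel was downloaded into the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the 1.1.1.dev1 wheel.
EXPECTED_SHA256 = "874c9f3cf111a35a93dc96dfd456d1425d8c672ba5d379d6e0f6b221fd0019c8"
# Assumption: the wheel sits in the current working directory.
WHEEL_NAME = "data_prep_toolkit_transforms-1.1.1.dev1-py3-none-any.whl"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    wheel = Path(WHEEL_NAME)
    if wheel.exists():
        print("match" if verify_wheel(wheel, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

pip can also enforce digests itself through its hash-checking mode (--require-hashes with a requirements file), which avoids a separate verification step.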

File details

Details for the file data_prep_toolkit_transforms-1.1.1.dev1-4-py3-none-any.whl.

File hashes

Hashes for data_prep_toolkit_transforms-1.1.1.dev1-4-py3-none-any.whl
Algorithm Hash digest
SHA256 b9183067277edec414a30bfeb6134e3cadb6f4d8366cfbf2e04e55a4415253e6
MD5 e968651ca1d5a22ad6710ade801d1c68
BLAKE2b-256 2215dc0655ed537a18a3b26b11ed47f29ff7236553d419cbb36202fffc2d2600

File details

Details for the file data_prep_toolkit_transforms-1.1.1.dev1-3-py3-none-any.whl.

File hashes

Hashes for data_prep_toolkit_transforms-1.1.1.dev1-3-py3-none-any.whl
Algorithm Hash digest
SHA256 a31f1897edf0efd074bb0634bde30c6177d768ca980fcd58db4b8cf84469b450
MD5 215dcb09cc0da0f4992a03d34f3c2ff7
BLAKE2b-256 7f990b2c11b766689a1abf861df9302e26f33d49ac56c1f3691f9f1003d98e52

File details

Details for the file data_prep_toolkit_transforms-1.1.1.dev1-2-py3-none-any.whl.

File hashes

Hashes for data_prep_toolkit_transforms-1.1.1.dev1-2-py3-none-any.whl
Algorithm Hash digest
SHA256 5bd46f1fd18f58aa7faa8cad12380404fd47c807608fcd3d33a0ad9ada63ec57
MD5 0d4ad2a2ee7611547d73ee3f85075a47
BLAKE2b-256 7dc5a2d4f473d83f2c7d1077bec1b126f40f5279f8159b0e4254bdc517af8e08

File details

Details for the file data_prep_toolkit_transforms-1.1.1.dev1-1-py3-none-any.whl.

File hashes

Hashes for data_prep_toolkit_transforms-1.1.1.dev1-1-py3-none-any.whl
Algorithm Hash digest
SHA256 8e6cac98a79eea47351f759da3f666ecdf02c4eaef581ff052fe71e321457738
MD5 1d01985350c5ca57867206eda1f9c3d4
BLAKE2b-256 c6c62f9650b957d3f1d95a20ab6242f40b3ef11e1d769291dba150429d677195