Data Preparation Toolkit Transforms using Ray
DPK Python Transforms
Installation
The transforms are delivered as a standard Python library available on PyPI and can be installed with pip:
python -m pip install "data-prep-toolkit-transforms[all]"
or
python -m pip install "data-prep-toolkit-transforms[ray,all]"
or
python -m pip install "data-prep-toolkit-transforms[language]"
Installing the Python transforms also installs data-prep-toolkit.
Installing the Ray transforms also installs data-prep-toolkit[ray].
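One way to check, after installation, that the packages described above are actually present is to query their installed versions. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper names `installed_version` and `report` are hypothetical, and the distribution names are the ones used in the pip commands above.

```python
"""Report which DPK distributions are installed, and their versions (sketch)."""
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Distribution names taken from the pip commands in this README.
    for name, ver in report(["data-prep-toolkit-transforms", "data-prep-toolkit"]).items():
        print(f"{name}: {ver if ver else 'not installed'}")
```

If data-prep-toolkit shows as not installed after installing the transforms, the dependency was not resolved as described.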
Release notes:
1.1.1.dev1
Included all code transforms as the extra [code]
1.1.1.dev0
Refactored code transforms (code_quality, code2parquet, header_cleanser, license_select, proglang_select)
Added ml-filter and enrichment
Renamed PDF2Parquet to Docling2Parquet
1.0.1.dev1
Added GneissWeb transforms
Fixed fdedup on Windows
1.0.1.dev0
PR #979 (code_profiler)
1.0.0.a6
Added Profiler
Added Resize
1.0.0.a5
Added PII Redactor
Relaxed fasttext requirement to >= 0.9.2
1.0.0.a4
Added missing Ray implementation for lang_id, doc_quality, tokenization, and filter
Added Ray notebooks for lang_id, doc_quality, tokenization, and filter
1.0.0.a3
Added code_profiler
1.0.0.a2
Relaxed dependency on pandas (use the latest version or whatever is installed by the application)
Relaxed dependency on requests (use the latest version or whatever is installed by the application)
File details
Details for the file data_prep_toolkit_transforms-1.1.2-py3-none-any.whl.
File metadata
- Download URL: data_prep_toolkit_transforms-1.1.2-py3-none-any.whl
- Upload date:
- Size: 80.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e5be60c386eb3e16e0022ac35fc06543b4656dc912da17c3ee4f827c57fd257 |
| MD5 | c27ee6f2b0f94e9d4e0578d778641909 |
| BLAKE2b-256 | 8557091ff2c454bf4701e64c0c9ba69024b43e5bdd5190d78cd78185be54d24e |
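The published digests let you confirm that a downloaded wheel was not corrupted or altered in transit. A minimal sketch, assuming the wheel has already been downloaded to a local path; `sha256_of_file` and `verify` are hypothetical helper names, and only the expected digest comes from this page.

```python
"""Verify a downloaded wheel against its published SHA256 digest (sketch)."""
import hashlib
from pathlib import Path

# SHA256 digest listed for data_prep_toolkit_transforms-1.1.2-py3-none-any.whl
EXPECTED_SHA256 = "9e5be60c386eb3e16e0022ac35fc06543b4656dc912da17c3ee4f827c57fd257"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large wheel is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 hex digest matches the expected one."""
    return sha256_of_file(path) == expected.strip().lower()
```

For example, `verify(Path("data_prep_toolkit_transforms-1.1.2-py3-none-any.whl"))` returns True only when the local file matches the SHA256 digest above. pip can also enforce this at install time through its hash-checking mode, by adding `--hash=sha256:<digest>` to the requirement line in a requirements file.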