KFP components developed at Datatonic
Datatonic Pipeline Components
Datatonic Pipeline Components (DTPC) is a set of Kubeflow Pipelines (KFP) components that can be run on any KFP pipeline execution backend. The components can be composed into pipelines using the Kubeflow Pipelines SDK.
What is this?
Datatonic Pipeline Components is a library of reusable Kubeflow Pipelines components, designed and open-sourced to make pipeline development easier and more enjoyable. The components are well tested, and the containers on which they are built are scanned for vulnerabilities, so you can have confidence in their performance and security.
Installation
Install using pip:
pip install datatonic-pipeline-components
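To confirm that the install succeeded, you can query the installed distribution's metadata. The following is a minimal sketch using only the Python standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of DTPC.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed.

    Hypothetical helper for illustration; not part of DTPC.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datatonic-pipeline-components")
    print(found if found else "datatonic-pipeline-components is not installed")
```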
Components
The components available in the library are listed below. See the How to use section for an example of using a component in a pipeline.
- load_huggingface_torch - Loads a PyTorch model, its configuration, and its tokenizer from HuggingFace and saves the artifacts in Google Cloud Storage. Detailed Documentation.
- load_huggingface_tensorflow - Loads a TensorFlow model, its configuration, and its tokenizer from HuggingFace and saves the artifacts in Google Cloud Storage. Detailed Documentation.
- upload_pytorch_model - Takes a trained PyTorch model artifact stored in GCS and uploads it to Vertex Model Registry as a Vertex Model, ready to serve predictions from an Endpoint or via Batch Predictions. Detailed Documentation.
- xgboost_shap_gpu - Computes feature attributions of an XGBoost model using GPU accelerated SHAP. Detailed Documentation.
- gpt_tokenize - Generates a tokenised training and validation dataset from a given dataset. Detailed Documentation.
How to use
Include any component from the library in your pipeline using the following pattern:
from kfp.dsl import pipeline

import datatonic_pipeline_components as dtpc


@pipeline
def my_pipeline():
    dtpc.load_huggingface_tensorflow(
        model_class_name="TFAutoModel",
        config_name="AutoConfig",
        tokenizer_class_name="AutoTokenizer",
        model_name="bert-base-cased",
    )
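A pipeline defined this way usually has to be compiled into a pipeline definition file before it can be submitted to a backend. The following is a minimal sketch, assuming the `kfp.compiler.Compiler` class from the KFP SDK is available in your installed version. The helper name `compile_to_yaml` and the default output path are hypothetical and not part of DTPC.

```python
def compile_to_yaml(pipeline_func, package_path="my_pipeline.yaml"):
    """Compile a KFP pipeline function into a pipeline definition file.

    Hypothetical helper for illustration; not part of DTPC.
    """
    # Imported inside the function so the helper can be defined
    # even in an environment where the KFP SDK is not installed.
    from kfp import compiler

    compiler.Compiler().compile(
        pipeline_func=pipeline_func,
        package_path=package_path,
    )
    return package_path


# Example call, assuming a pipeline function named my_pipeline exists:
# compile_to_yaml(my_pipeline)
```

The resulting file can then be submitted with whichever client your execution backend provides.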
The following figure illustrates how the pieces fit together: the library is installed from PyPI, its components are included in your pipeline code, and the container images on which the components are built are pulled from the corresponding dtpipelinecomponents Docker Hub repository.
Contributing
We are an open-source project and welcome contributions. These may take the form of new components, bug fixes, or better documentation.
See here for our guide on how to contribute.
Download files
Source Distribution
Hashes for datatonic_pipeline_components-1.1.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 863f623a19e8b2efde1975da5cca0598f81ae199057bc63a4aab65385de5ea3d
MD5 | ca85aa96b302cf7f960c72ecf04a9638
BLAKE2b-256 | 08217da7f63476d9a5be4aa1d96776081b458131af9d857c8f4fc56fa28c0c0f
Built Distribution
Hashes for datatonic_pipeline_components-1.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 07682bb95595477490895061be352b09dc7e7280a8b471ca50b84f80d0cec93a
MD5 | 2b089be1fa603da4b4425bc9f5c9358f
BLAKE2b-256 | c7648928f64874044e7421f26272cd8a2f3e9fed8d89270bd5c53ad81d5557af
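The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file path in the commented example assumes the archive was downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming the source distribution is in the current directory:
# matches_published_digest(
#     "datatonic_pipeline_components-1.1.1.tar.gz",
#     "863f623a19e8b2efde1975da5cca0598f81ae199057bc63a4aab65385de5ea3d",
# )
```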