KFP components developed at Datatonic

Datatonic Pipeline Components


Datatonic Pipeline Components (DTPC) is a set of Kubeflow Pipelines (KFP) components that can run on any KFP pipeline execution backend. The components can be composed into pipelines using the Kubeflow Pipelines SDK.

What is this?

Datatonic Pipeline Components is a library of reusable Kubeflow Pipelines components, designed and open-sourced to make pipeline development easier and more enjoyable. The components are well tested, and the container images they are built on are scanned for vulnerabilities, so you can have confidence in their performance and security.

Installation

Install using pip:

pip install datatonic-pipeline-components

Components

The components available in the library are listed below. See the How to use section for an example of using a component in a pipeline.

  • load_huggingface_torch - Loads a PyTorch model, its configuration, and its tokenizer from HuggingFace and saves the artifacts in Google Cloud Storage. Detailed Documentation.
  • load_huggingface_tensorflow - Loads a TensorFlow model, its configuration, and its tokenizer from HuggingFace and saves the artifacts in Google Cloud Storage. Detailed Documentation.
  • upload_pytorch_model - Takes a trained PyTorch model artifact stored in GCS as input and uploads it to the Vertex AI Model Registry as a Vertex Model, ready to serve predictions from an Endpoint or via Batch Prediction. Detailed Documentation.
  • xgboost_shap_gpu - Computes feature attributions of an XGBoost model using GPU accelerated SHAP. Detailed Documentation.
  • gpt_tokenize - Generates a tokenised training and validation dataset from a given dataset. Detailed Documentation.

How to use

Include any component from the library in your pipelines using the following pattern:

from kfp.dsl import pipeline
import datatonic_pipeline_components as dtpc

@pipeline
def my_pipeline():
    dtpc.load_huggingface_tensorflow(
        model_class_name="TFAutoModel",
        config_name="AutoConfig",
        tokenizer_class_name="AutoTokenizer",
        model_name="bert-base-cased",
    )

The following figure illustrates the flow: the library is installed from PyPI, its components are imported into your pipeline code, and the container images the components are built on are pulled from the corresponding dtpipelinecomponents Docker Hub repository:

[Figure: cloud architecture — PyPI install, pipeline code, and Docker Hub image pulls]

Contributing

We are an open-source project and welcome contributions, whether new components, bug fixes, or improved documentation.

See here for our guide on how to contribute.

Download files

Source Distribution

datatonic_pipeline_components-1.1.1.tar.gz (613.6 kB)

Built Distribution

datatonic_pipeline_components-1.1.1-py3-none-any.whl (634.9 kB)

File details

Hashes for datatonic_pipeline_components-1.1.1.tar.gz:

SHA256: 863f623a19e8b2efde1975da5cca0598f81ae199057bc63a4aab65385de5ea3d
MD5: ca85aa96b302cf7f960c72ecf04a9638
BLAKE2b-256: 08217da7f63476d9a5be4aa1d96776081b458131af9d857c8f4fc56fa28c0c0f

See more details on using hashes here.

File details

Hashes for datatonic_pipeline_components-1.1.1-py3-none-any.whl:

SHA256: 07682bb95595477490895061be352b09dc7e7280a8b471ca50b84f80d0cec93a
MD5: 2b089be1fa603da4b4425bc9f5c9358f
BLAKE2b-256: c7648928f64874044e7421f26272cd8a2f3e9fed8d89270bd5c53ad81d5557af
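Published digests like those above can be checked locally before installing from a downloaded archive. A sketch using Python's standard-library hashlib; the file path is illustrative — point it at wherever the archive was saved:

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Expected digest for the sdist, as published above.
EXPECTED = "863f623a19e8b2efde1975da5cca0598f81ae199057bc63a4aab65385de5ea3d"

# Uncomment after downloading the archive:
# assert sha256_of("datatonic_pipeline_components-1.1.1.tar.gz") == EXPECTED
```

Note that pip already verifies hashes automatically when you pin them in a requirements file with `--require-hashes`.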
