Clarifai PySpark Python SDK

ClarifaiPySpark

Introduction

This README provides an overview of the Software Development Kit (SDK) under development for integrating Clarifai with Databricks. The primary use case for this SDK is to facilitate interaction between Databricks and Clarifai for uploading client datasets, annotating data, and exporting annotations and storing them in Spark DataFrames or Delta tables.

The initial use case for this SDK revolves around three main objectives:

Uploading Client Datasets into Clarifai App:

The SDK should enable the seamless upload of datasets into the Clarifai application, simplifying the process of data transfer from Databricks to Clarifai.

Annotating the Data:

It should provide features for data annotation, making it easier for users to add labels and metadata to their datasets within the Clarifai platform.

Export Annotations to Spark DataFrames/Delta Tables:

The SDK should offer functionality to export annotations and store them in Spark DataFrames or Delta tables, facilitating further data analysis within Databricks.

Requirements:

  • Databricks: Runtime 13.3 or later
  • Clarifai: pip install clarifai
  • Create your Clarifai account
  • Follow the instructions to get your own Clarifai PAT (personal access token)
  • Protocol Buffers: version 4.24.2 (pip install protobuf==4.24.2)
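
The package requirements above can be checked from a notebook cell before installing the SDK. The following is a minimal sketch that uses only the standard library. The helper name `check_requirements` and the `REQUIRED` mapping are illustrative and are not part of clarifai-pyspark.

```python
from importlib import metadata

# Distribution names and pins taken from the requirements list above.
# None means "any installed version is acceptable".
REQUIRED = {
    "clarifai": None,
    "protobuf": "4.24.2",
}


def check_requirements(required=REQUIRED):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, pinned in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if pinned is not None and installed != pinned:
            report[name] = f"installed {installed}, expected {pinned}"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

The Databricks runtime version is not covered by this check, because it is a property of the cluster rather than of an installed Python package.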

Setup:

Install the package and initialize the ClarifaiPySpark class to begin.

pip install clarifai-pyspark

Getting Started:

from clarifaipyspark.client import ClarifaiPySpark

Create a ClarifaiPySpark client object to connect to your app on Clarifai. You can then select an existing dataset in your Clarifai app, or create a new one, as the target for uploaded data.

claps_obj = ClarifaiPySpark(user_id=USER_ID, app_id=APP_ID, pat=CLARIFAI_PAT)
dataset_obj = claps_obj.dataset(dataset_id=DATASET_ID)
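
In a shared notebook it is usually safer not to hard-code the PAT. The sketch below reads the three client arguments from environment variables and only then builds the client. The environment variable names and both helper names are assumptions made for illustration; the only SDK call used is the `ClarifaiPySpark` constructor shown above.

```python
import os


def load_clarifai_settings(env=os.environ):
    """Collect the client arguments from environment variables.

    The variable names are illustrative assumptions, not names the SDK requires.
    """
    names = {
        "user_id": "CLARIFAI_USER_ID",
        "app_id": "CLARIFAI_APP_ID",
        "pat": "CLARIFAI_PAT",
    }
    settings = {arg: env.get(var, "") for arg, var in names.items()}
    missing = [names[arg] for arg, value in settings.items() if not value]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return settings


def make_client(env=os.environ):
    """Build the client with the constructor shown in Getting Started."""
    # Imported here so load_clarifai_settings works without the SDK installed.
    from clarifaipyspark.client import ClarifaiPySpark

    return ClarifaiPySpark(**load_clarifai_settings(env))
```

Any mapping with a `get` method can be passed as `env`, so the same helper works with values loaded from another configuration source.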

Examples:

Check out these notebooks for the various operations you can perform using the clarifai-pyspark SDK.

  • ClarifaiPyspark_Example_NB: An extensive notebook that walks through the journey from data ingestion to exporting annotations (GitHub)
  • export_to_df_demo: Explains how to export annotations from a Clarifai app and store them as a DataFrame in Databricks (GitHub)

If you want to enhance your AI journey with workflows and custom models used programmatically, the Clarifai SDK may be a good place to start. Please refer to the resources below for further reference.
