A package for interacting with the V7 platform via PySpark

Project description

V7 DarwinPyspark

DarwinPyspark is the official library for managing data in Databricks and sending it to V7, where it can be managed and used. It provides an interface to V7's API for uploading data from PySpark DataFrames to V7 and downloading dataset exports back into PySpark DataFrames.

Features

  • Upload data from a PySpark DataFrame to V7
  • Download data from V7 and load it into a PySpark DataFrame
  • Handle data registration, uploading, and confirmation with V7
  • Efficiently manage large datasets and data exports

Installation

pip install darwinpyspark

Usage

This framework is designed to be used alongside our Python SDK, darwin-py; see the darwin-py examples in our docs.

To get started with DarwinPyspark, you'll first need to create a DarwinPyspark instance with your V7 API key, team slug, and dataset slug:

from darwinpyspark import DarwinPyspark

API_KEY = "your_api_key"
team_slug = "your_team_slug"
dataset_slug = "your_dataset_slug"

dp = DarwinPyspark(API_KEY, team_slug, dataset_slug)
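Hard-coding the API key is fine for a first test, but in a shared notebook it is usually safer to read it from the environment. The sketch below assumes three hypothetical environment variable names (V7_API_KEY, V7_TEAM_SLUG, V7_DATASET_SLUG); they are not defined by DarwinPyspark, and the helper names are made up for this example. Only the positional constructor call comes from the usage shown above.

```python
import os

# Hypothetical variable names chosen for this example; DarwinPyspark does not define them.
REQUIRED_VARS = ("V7_API_KEY", "V7_TEAM_SLUG", "V7_DATASET_SLUG")


def load_v7_settings(env=None):
    """Return (api_key, team_slug, dataset_slug) read from a mapping of environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)


def make_client(env=None):
    """Build a DarwinPyspark instance using the positional constructor shown above."""
    # Imported lazily so load_v7_settings can be used without the package installed.
    from darwinpyspark import DarwinPyspark

    return DarwinPyspark(*load_v7_settings(env))
```

Calling `dp = make_client()` would then replace the hard-coded values, provided the three variables are set in the notebook or cluster environment.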

Uploading Data

To upload a PySpark DataFrame to V7, use the upload_items method:

# Assume `df` is your PySpark DataFrame with columns 'object_url' and 'file_name'
dp.upload_items(df)

The upload_items method takes a PySpark DataFrame with two columns: 'object_url' (an open or presigned URL from which the image can be accessed) and 'file_name' (the name under which the file should be listed in V7).
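If you start from a plain list of URLs, the two required columns can be assembled before calling upload_items. The following is a minimal sketch and not part of DarwinPyspark: build_rows and to_upload_df are hypothetical helper names, the file name is simply taken from the last path segment of each URL, and `spark` is assumed to be an existing SparkSession (as Databricks notebooks provide).

```python
import posixpath
from urllib.parse import unquote, urlparse

UPLOAD_COLUMNS = ["object_url", "file_name"]  # column names required by upload_items


def build_rows(urls):
    """Pair each URL with a file name taken from the last segment of its path."""
    rows = []
    for url in urls:
        # urlparse separates the query string, so presigned-URL parameters
        # do not end up in the file name.
        name = posixpath.basename(unquote(urlparse(url).path))
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        rows.append((url, name))
    return rows


def to_upload_df(spark, urls):
    """Create a DataFrame with the 'object_url' and 'file_name' columns."""
    return spark.createDataFrame(build_rows(urls), UPLOAD_COLUMNS)
```

With a DarwinPyspark instance `dp`, `dp.upload_items(to_upload_df(spark, urls))` would then pass the rows to V7. Choosing names from the URL path is only one option; any naming scheme works as long as the 'file_name' column is populated.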

Once uploaded, the data is available in the V7 platform, where users can, for example, create ML workflows, annotate files (images, videos, DICOMs), or test model performance.

Downloading Data

To download data from V7 as a PySpark DataFrame, use the download_export method:

export_name = "your_export_name"
export_df = dp.download_export(export_name)
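The schema of the returned DataFrame is not described here, so a quick inspection before further processing can help. The sketch below relies only on the standard PySpark DataFrame members `columns` and `count()`; summarize_export is a hypothetical helper name, not part of DarwinPyspark.

```python
def summarize_export(export_df):
    """Return the column names and row count of a DataFrame-like object."""
    return {
        "columns": list(export_df.columns),  # DataFrame.columns is a list of column names
        "rows": export_df.count(),  # count() triggers a Spark job
    }
```

For example, `print(summarize_export(export_df))` shows which columns the export produced and how many rows it contains.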

License

DarwinPyspark is released under the MIT License.

Download files

Source Distribution

darwinpyspark-0.0.2.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

darwinpyspark-0.0.2-py3-none-any.whl (5.5 kB)

Uploaded Python 3
