A package for interacting with the V7 platform via PySpark

V7 DarwinPyspark

DarwinPyspark is the official library for managing data in Databricks and sending it to V7, where it can be managed and used. It provides an easy-to-use interface to V7's API for uploading data from PySpark DataFrames and downloading dataset exports back into DataFrames.

Features

  • Upload data from a PySpark DataFrame to V7
  • Download data from V7 and load it into a PySpark DataFrame
  • Handle data registration, uploading, and confirmation with V7
  • Efficiently manage large datasets and data exports

Installation

pip install darwinpyspark

Usage

This framework is designed to be used alongside our Python SDK, darwin-py; see our docs for darwin-py examples.

To get started with DarwinPyspark, you'll first need to create a DarwinPyspark instance with your V7 API key, team slug, and dataset slug:

from darwinpyspark import DarwinPyspark

API_KEY = "your_api_key"
team_slug = "your_team_slug"
dataset_slug = "your_dataset_slug"

dp = DarwinPyspark(API_KEY, team_slug, dataset_slug)
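
Hardcoding an API key in a notebook makes it easy to leak. As a minimal sketch, the three values can instead be read from environment variables. The variable names `V7_API_KEY`, `V7_TEAM_SLUG`, and `V7_DATASET_SLUG` and the helper `load_v7_config` are hypothetical and not part of DarwinPyspark; on Databricks, a secret scope read with `dbutils.secrets.get` would be a reasonable alternative source for the key.

```python
import os

# Hypothetical environment variable names; DarwinPyspark does not define them.
REQUIRED_VARS = ("V7_API_KEY", "V7_TEAM_SLUG", "V7_DATASET_SLUG")


def load_v7_config(env=None):
    """Return (api_key, team_slug, dataset_slug) read from `env`.

    `env` defaults to os.environ. Raises KeyError naming every
    missing or empty variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)


# Usage with the constructor shown above:
# dp = DarwinPyspark(*load_v7_config())
```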

Uploading Data

To upload a PySpark DataFrame to V7, use the upload_items method:

# Assume `df` is your PySpark DataFrame with columns 'object_url' and 'file_name'
dp.upload_items(df)

The upload_items method takes a PySpark DataFrame with two columns: 'object_url' (an openly accessible or presigned URL for the image) and 'file_name' (the name under which the file should be listed in V7).
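
One way to prepare such a DataFrame is sketched below, assuming you start from a plain list of URLs and want each file listed under the last segment of its URL path. The helpers `to_upload_rows` and `build_upload_df` are hypothetical names, not part of DarwinPyspark, and `spark` is assumed to be an existing SparkSession (Databricks notebooks provide one).

```python
from posixpath import basename
from urllib.parse import urlsplit


def to_upload_rows(urls):
    """Return (object_url, file_name) tuples, one per URL.

    file_name is the last segment of the URL path, so query strings
    such as presigned-URL signatures are not part of the name.
    """
    return [(url, basename(urlsplit(url).path)) for url in urls]


def build_upload_df(spark, urls):
    """Build a DataFrame with the 'object_url' and 'file_name' columns."""
    return spark.createDataFrame(to_upload_rows(urls), ["object_url", "file_name"])


# Usage, assuming `spark` and `dp` already exist:
# df = build_upload_df(spark, ["https://example.com/images/cat.jpg?sig=abc"])
# dp.upload_items(df)
```

Deriving the name from the path is only one convention; any naming scheme works as long as the two columns are present.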

Once uploaded, the data is available in the V7 platform, where users can, for example, create ML workflows, annotate files such as images, videos, and DICOMs, or test model performance.

Downloading Data

To download data from V7 as a PySpark DataFrame, use the download_export method:

export_name = "your_export_name"
export_df = dp.download_export(export_name)
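
The columns of the returned DataFrame are not described here, so it is worth inspecting the result before building on it. A minimal sketch, using only the standard PySpark `columns` attribute and `count()` method; the helper name `describe_export` is hypothetical:

```python
def describe_export(df):
    """Return the column names and row count of a DataFrame."""
    return {"columns": list(df.columns), "rows": df.count()}


# Usage, assuming `export_df` comes from dp.download_export as shown above:
# print(describe_export(export_df))
# export_df.printSchema()
```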

License

DarwinPyspark is released under the MIT License.
