A package for interacting with the V7 platform via PySpark

V7 DarwinPyspark

DarwinPyspark is the official library for managing data in Databricks and sending it to V7. It provides an easy-to-use interface to V7's API for uploading and downloading datasets and for performing operations on the data.

Features

  • Upload data from a PySpark DataFrame to V7
  • Download data from V7 and load it into a PySpark DataFrame
  • Handle data registration, uploading, and confirmation with V7
  • Efficiently manage large datasets and data exports

Installation

pip install darwinpyspark

Usage

This framework is designed to be used alongside our Python SDK, darwin-py; see the V7 docs for darwin-py examples.

To get started with DarwinPyspark, you'll first need to create a DarwinPyspark instance with your V7 API key, team slug, and dataset slug:

from darwinpyspark import DarwinPyspark

API_KEY = "your_api_key"
team_slug = "your_team_slug"
dataset_slug = "your_dataset_slug"

dp = DarwinPyspark(API_KEY, team_slug, dataset_slug)
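Hardcoding an API key in a notebook is easy to leak, so one option is to read the three constructor arguments from the environment. The sketch below is only an illustration: the variable names `V7_API_KEY`, `V7_TEAM_SLUG`, and `V7_DATASET_SLUG` and the helper `load_v7_settings` are hypothetical and not part of DarwinPyspark.

```python
import os
from typing import NamedTuple


class V7Settings(NamedTuple):
    api_key: str
    team_slug: str
    dataset_slug: str


def load_v7_settings(env=os.environ):
    """Read the three constructor arguments from environment variables.

    The variable names are assumptions made for this example only.
    """
    names = ("V7_API_KEY", "V7_TEAM_SLUG", "V7_DATASET_SLUG")
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return V7Settings(*(env[name] for name in names))


# Usage (requires darwinpyspark to be installed):
#   from darwinpyspark import DarwinPyspark
#   dp = DarwinPyspark(*load_v7_settings())
```

The tuple fields are in the same order as the constructor arguments shown above, so the result can be unpacked directly into `DarwinPyspark(...)`.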

Uploading Data

To upload a PySpark DataFrame to V7, use the upload_items method:

# Assume `df` is your PySpark DataFrame with columns 'object_url' and 'file_name'
dp.upload_items(df)

The upload_items method takes a PySpark DataFrame with columns 'object_url' (accessible open or presigned URL for the image) and 'file_name' (the name you want the file to be listed as in V7).
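As a rough sketch of how such a DataFrame might be assembled, the helpers below derive `file_name` from the last path segment of each URL and build the two required columns. `build_upload_rows` and `to_upload_dataframe` are hypothetical names introduced for this example, and taking the file name from the URL path is an assumption; use whatever naming suits your dataset.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse


def build_upload_rows(urls):
    """Return (object_url, file_name) tuples, one per URL."""
    rows = []
    for url in urls:
        # urlparse separates the path from the query string, so presigned-URL
        # signature parameters do not end up in the file name.
        name = unquote(basename(urlparse(url).path))
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        rows.append((url, name))
    return rows


def to_upload_dataframe(spark, urls):
    """Build a DataFrame with the 'object_url' and 'file_name' columns.

    `spark` is an existing SparkSession (Databricks notebooks provide one).
    """
    return spark.createDataFrame(
        build_upload_rows(urls), schema=["object_url", "file_name"]
    )


# Usage (requires pyspark and darwinpyspark):
#   df = to_upload_dataframe(spark, ["https://example.com/images/cat.jpg"])
#   dp.upload_items(df)
```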

Once the data is uploaded, users work with it in the V7 platform: for example, creating ML workflows, annotating files such as images, videos, and DICOMs, or testing model performance.

Downloading Data

To download data from V7 as a PySpark DataFrame, use the download_export method:

export_name = "your_export_name"
export_df = dp.download_export(export_name)
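This page does not describe the columns of the returned DataFrame, so it can help to inspect an export before building on it. The helper below is a minimal sketch: `summarize_export` is a hypothetical name, and it relies only on `download_export` as shown above plus the standard PySpark DataFrame members `printSchema()`, `columns`, and `count()`.

```python
def summarize_export(dp, export_name):
    """Download an export, print its schema, and return its columns and row count."""
    export_df = dp.download_export(export_name)  # method shown in this README
    export_df.printSchema()  # standard PySpark call; prints the column tree
    return {
        "columns": list(export_df.columns),
        # count() triggers a Spark job, so it may be slow on large exports.
        "rows": export_df.count(),
    }


# Usage (requires a configured DarwinPyspark instance `dp`):
#   summary = summarize_export(dp, "your_export_name")
```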

License

DarwinPyspark is released under the MIT License.

