A package for interacting with the V7 platform via PySpark
Project description
V7 DarwinPyspark
DarwinPyspark is the official library for managing data in Databricks and sending that data to V7, where it can be managed and used. It provides an easy-to-use interface to V7's API, allowing users to upload and download datasets and perform operations on that data.
Features
- Upload data from a PySpark DataFrame to V7
- Download data from V7 and load it into a PySpark DataFrame
- Handle data registration, uploading, and confirmation with V7
- Efficiently manage large datasets and data exports
Installation
pip install darwinpyspark
Usage
This library is designed to be used alongside our Python SDK, darwin-py; see the V7 docs for darwin-py examples.
To get started with DarwinPyspark, you'll first need to create a DarwinPyspark instance with your V7 API key, team slug, and dataset slug:
from darwinpyspark import DarwinPyspark
API_KEY = "your_api_key"
team_slug = "your_team_slug"
dataset_slug = "your_dataset_slug"
dp = DarwinPyspark(API_KEY, team_slug, dataset_slug)
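In a shared notebook you may prefer not to hard-code the API key. The following is a minimal sketch, not part of DarwinPyspark, that reads the three constructor arguments from environment variables. The variable names (`V7_API_KEY`, `V7_TEAM_SLUG`, `V7_DATASET_SLUG`) and the helper `load_v7_settings` are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical environment variable names; pick whatever fits your workspace.
ENV_VARS = {
    "api_key": "V7_API_KEY",
    "team_slug": "V7_TEAM_SLUG",
    "dataset_slug": "V7_DATASET_SLUG",
}


def load_v7_settings(environ=os.environ):
    """Read the three constructor arguments from the environment.

    Raises KeyError naming every missing variable, so a misconfigured
    job fails before any data is sent.
    """
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_VARS.items()}


# With the variables set, the instance could then be created as:
#   settings = load_v7_settings()
#   dp = DarwinPyspark(settings["api_key"], settings["team_slug"], settings["dataset_slug"])
```

The helper accepts any mapping, so it can be exercised with a plain dict without touching the real environment.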
Uploading Data
To upload a PySpark DataFrame to V7, use the upload_items method:
# Assume `df` is your PySpark DataFrame with columns 'object_url' and 'file_name'
dp.upload_items(df)
The upload_items method takes a PySpark DataFrame with two columns: 'object_url' (a publicly accessible or presigned URL for the image) and 'file_name' (the name the file should be listed under in V7).
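If you are starting from a plain list of URLs, the two columns can be prepared before a DataFrame is created. This is a rough sketch under stated assumptions: the helper `to_upload_rows` and the example URLs are hypothetical, and taking the file name from the last path segment of the URL is just one possible convention, not something DarwinPyspark requires.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse

# Column names expected by upload_items, as described above.
UPLOAD_COLUMNS = ["object_url", "file_name"]


def to_upload_rows(urls):
    """Turn URLs into (object_url, file_name) tuples.

    The file name is the last path segment of the URL, so any query
    string (for example a presigned signature) is left out of the name.
    """
    rows = []
    for url in urls:
        name = basename(unquote(urlparse(url).path))
        if not name:
            raise ValueError(f"cannot derive a file name from {url!r}")
        rows.append((url, name))
    return rows


rows = to_upload_rows([
    "https://example.com/images/cat%201.jpg?signature=abc",
    "https://example.com/images/dog.png",
])

# On a cluster with an active SparkSession named `spark`:
#   df = spark.createDataFrame(rows, UPLOAD_COLUMNS)
#   dp.upload_items(df)
```

Keeping the row preparation in plain Python makes it easy to check the names before any Spark job or upload runs.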
Once uploaded, the data can be worked with in the V7 platform: for example, to create ML workflows, annotate files such as images, videos, or DICOMs, or test model performance.
Downloading Data
To download data from V7 as a PySpark DataFrame, use the download_export method:
export_name = "your_export_name"
export_df = dp.download_export(export_name)
License
DarwinPyspark is released under the MIT License.
File details
Details for the file darwinpyspark-0.0.1.tar.gz.
File metadata
- Download URL: darwinpyspark-0.0.1.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 574403c510de7fa9a3688643dab85f034745af641e732673516dd803dded8ba1
MD5 | 051d42a963db0bcb6caeea33ad8471b5
BLAKE2b-256 | d39d10c09da4b1b359c77205948e8246e73e88bd8d822adf6553d72df313de1b
File details
Details for the file darwinpyspark-0.0.1-py3-none-any.whl.
File metadata
- Download URL: darwinpyspark-0.0.1-py3-none-any.whl
- Upload date:
- Size: 2.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8561b09d0e85844444ce0978ff9130e388a42ca5e03cfeb19c7384ed72e83214
MD5 | a483760af1d5bfd2f5ea0bc860c19072
BLAKE2b-256 | 735f10fffb0f7c669db8a05af22890209162292fab0486a708db893a906c8b12