Library and command line interface for darwin.v7labs.com
V7 Darwin Python SDK
⚡️ Official library to annotate data and manage datasets and models on V7's Darwin Training Data Platform. ⚡️
Darwin-py can be used both from the command line and as a Python library.
Its main functions include:
- Client authentication
- Listing local and remote datasets
- Create/remove datasets
- Upload/download data to/from remote datasets
- Direct integration with PyTorch dataloaders
- Extracting video artifacts
Support is tested for Python 3.9 to 3.12.
🏁 Installation
pip install darwin-py
You can now type darwin in your terminal and access the command line interface.
To use the PyTorch bindings, install the ml extra, which pulls in all the additional requirements:
pip install darwin-py[ml]
To use video frame extraction, install the ocv extra, which pulls in all the additional requirements:
pip install darwin-py[ocv]
Video artifact extraction additionally requires FFmpeg to be installed.
To run the tests, first install the test extra:
pip install darwin-py[test]
Configuration
Retry Configuration
The SDK includes a retry mechanism for handling API rate limits (429) and server errors (500, 502, 503, 504). You can configure the retry behavior using the following environment variables:
- DARWIN_RETRY_INITIAL_WAIT: Initial wait time in seconds between retries (default: 60)
- DARWIN_RETRY_MAX_WAIT: Maximum wait time in seconds between retries (default: 300)
- DARWIN_RETRY_MAX_ATTEMPTS: Maximum number of retry attempts (default: 10)
Example configuration:
# Configure shorter retry intervals and fewer attempts
export DARWIN_RETRY_INITIAL_WAIT=30
export DARWIN_RETRY_MAX_WAIT=120
export DARWIN_RETRY_MAX_ATTEMPTS=5
The retry mechanism will automatically handle:
- Rate limiting (HTTP 429)
- Server errors (HTTP 500, 502, 503, 504)
For each retry attempt, you'll see a message indicating the type of error and the wait time before the next attempt.
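The three variables above bound how long the SDK waits between attempts. The exact backoff formula the SDK applies is not documented here, so the sketch below assumes a simple doubling (exponential) schedule capped at the maximum wait. read_retry_settings and backoff_schedule are hypothetical helpers for illustration only; they are not part of darwin-py, and the SDK's real timing may differ.

```python
import os
from typing import Dict, List, Mapping, Optional

# Documented defaults for the retry variables (the helpers below are hypothetical).
DEFAULTS = {
    "DARWIN_RETRY_INITIAL_WAIT": 60,
    "DARWIN_RETRY_MAX_WAIT": 300,
    "DARWIN_RETRY_MAX_ATTEMPTS": 10,
}


def read_retry_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Read the retry variables from env (default: os.environ), falling back to the defaults."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}


def backoff_schedule(initial_wait: int, max_wait: int, max_attempts: int) -> List[int]:
    """ASSUMED schedule: double the wait after each attempt, never exceeding max_wait."""
    waits = []
    wait = initial_wait
    for _ in range(max_attempts):
        waits.append(min(wait, max_wait))
        wait *= 2
    return waits


if __name__ == "__main__":
    settings = read_retry_settings()
    print(backoff_schedule(
        settings["DARWIN_RETRY_INITIAL_WAIT"],
        settings["DARWIN_RETRY_MAX_WAIT"],
        settings["DARWIN_RETRY_MAX_ATTEMPTS"],
    ))
```

With the example configuration above (30, 120 and 5), this assumed schedule gives waits of 30, 60, 120, 120 and 120 seconds, which shows how DARWIN_RETRY_MAX_WAIT caps the growth once the doubled wait exceeds it.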
Development
See our development and QA environment installation recommendations here
Usage as a Command Line Interface (CLI)
V7 Labs' documentation on CLI usage can be found here.
Once installed, darwin is accessible as a command line tool.
The -h/--help flag is a useful way to navigate the CLI: it provides additional information
for each available command.
Client Authentication
To perform remote operations on Darwin, you first need to authenticate. This requires a team-specific API key. If you do not already have a Darwin account, you can contact us and we can set one up for you.
To start the authentication process:
$ darwin authenticate
API key:
Make example-team the default team? [y/N] y
Datasets directory [~/.darwin/datasets]:
Authentication succeeded.
You will then be prompted for your API key, whether to set the corresponding team as the
default, and finally the desired location on the local file system for that team's datasets.
This process will create a configuration file at ~/.darwin/config.yaml.
This file will be updated with future authentications for different teams.
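The Python library's Client.local() reads this configuration file, so it can be useful to check that authentication has been done before calling it. A minimal sketch, assuming only the default location stated above; config_path and is_authenticated are hypothetical helpers. The check only tests that the file exists: it does not parse the YAML (whose layout is not documented here) or verify that the stored API key is still valid.

```python
from pathlib import Path
from typing import Optional


def config_path(home: Optional[Path] = None) -> Path:
    """Default location of the Darwin configuration file: ~/.darwin/config.yaml."""
    base = Path.home() if home is None else Path(home)
    return base / ".darwin" / "config.yaml"


def is_authenticated(home: Optional[Path] = None) -> bool:
    """Hypothetical check: True if a configuration file exists at the default location."""
    return config_path(home).is_file()


if __name__ == "__main__":
    if is_authenticated():
        print(f"Found configuration at {config_path()}")
    else:
        print("No configuration found; run `darwin authenticate` first.")
```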
Listing local and remote datasets
To list a summary of existing local datasets:
$ darwin dataset local
NAME       IMAGES  SYNC_DATE  SIZE
mydataset  112025  yesterday  159.2 GB
To list a summary of the remote datasets accessible to the current user:
$ darwin dataset remote
NAME                    IMAGES  PROGRESS
example-team/mydataset  112025  73.0%
Create/remove a dataset
To create an empty dataset remotely:
$ darwin dataset create test
Dataset 'test' (example-team/test) has been created.
Access at https://darwin.v7labs.com/datasets/579
The dataset will be created in the team you're authenticated for.
To delete the dataset on the server:
$ darwin dataset remove test
About to delete example-team/test on darwin.
Do you want to continue? [y/N] y
Upload/download data to/from a remote dataset
To upload data to an existing remote dataset, pass the dataset name and either a single image or a directory of images/videos as parameters.
The -e/--exclude argument specifies file extensions to ignore in data_dir,
e.g. -e .jpg
For videos, the frame extraction rate can be specified by adding --fps <frame_rate>.
Supported extensions:
- Video files: .mp4, .bpm, .mov, .avi, .mkv, .hevc, .pdf, .dcm, .nii, .nii.gz, .ndpi, .rvg
- Image files: .jpg, .jpeg, .png, .jfif, .tif, .tiff, .qtiff, .bmp, .svs, .webp, .JPEG, .JPG, .BMP
$ darwin dataset push test /path/to/folder/with/images
100%|████████████████████████| 2/2 [00:01<00:00, 1.27it/s]
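To preview which files a push might pick up from a directory, the sketch below filters files by the supported extensions listed above and by any extensions passed to -e/--exclude. It is an illustration under stated assumptions, not the CLI's own logic: collect_uploadable is a hypothetical helper, and it assumes case-sensitive suffix matching and recursion into subdirectories, either of which may differ from what darwin dataset push actually does.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions copied from the "Supported extensions" list above.
VIDEO_EXTENSIONS = (".mp4", ".bpm", ".mov", ".avi", ".mkv", ".hevc", ".pdf",
                    ".dcm", ".nii", ".nii.gz", ".ndpi", ".rvg")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".jfif", ".tif", ".tiff", ".qtiff",
                    ".bmp", ".svs", ".webp", ".JPEG", ".JPG", ".BMP")
SUPPORTED = VIDEO_EXTENSIONS + IMAGE_EXTENSIONS


def collect_uploadable(data_dir: Path, exclude: Iterable[str] = ()) -> List[Path]:
    """Hypothetical helper: supported files under data_dir, minus excluded extensions."""
    excluded = tuple(exclude)
    files = []
    # ASSUMPTION: subdirectories are searched recursively.
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        # str.endswith with a tuple also handles multi-part suffixes like ".nii.gz";
        # with an empty exclude tuple it returns False, so nothing is excluded.
        if path.name.endswith(SUPPORTED) and not path.name.endswith(excluded):
            files.append(path)
    return files


if __name__ == "__main__":
    for f in collect_uploadable(Path("/path/to/folder/with/images"), exclude=[".jpg"]):
        print(f)
```

Matching on the full file name with str.endswith, rather than on Path.suffix, is what lets compound extensions such as .nii.gz be recognised.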
Before a dataset can be downloaded, a release needs to be generated:
$ darwin dataset export test 0.1
Dataset test successfully exported to example-team/test:0.1
This version is immutable: if new images or annotations are added later, you will have to create a new release to include them.
To list all available releases:
$ darwin dataset releases test
NAME                   IMAGES  CLASSES  EXPORT_DATE
example-team/test:0.1  4       0        2019-12-07 11:37:35+00:00
Finally, to download a release:
$ darwin dataset pull test:0.1
Dataset example-team/test:0.1 downloaded at /directory/chosen/at/authentication/time .
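The commands above refer to datasets as team/dataset and to releases as team/dataset:version, and, as in the examples, a short name such as test:0.1 resolves against the team you are authenticated for (example-team/test:0.1). A minimal sketch of splitting such an identifier, assuming exactly that format; parse_identifier and DatasetRef are hypothetical names, this is not darwin-py's own parser, and it performs no validation of team, dataset or version strings.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    team: str
    dataset: str
    version: Optional[str]  # None when no release version is given


def parse_identifier(identifier: str, default_team: str) -> DatasetRef:
    """Split "team/dataset:version" into parts; team and version are optional."""
    team, sep, rest = identifier.partition("/")
    if not sep:  # no "/" present: fall back to the default (authenticated) team
        team, rest = default_team, identifier
    dataset, sep, version = rest.partition(":")
    return DatasetRef(team, dataset, version if sep else None)


if __name__ == "__main__":
    print(parse_identifier("test:0.1", default_team="example-team"))
```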
Usage as a Python library
V7 Labs' documentation on using darwin-py as a Python library can be found here.
The framework is designed to be usable as a standalone Python library.
Usage can be inferred from the operations performed in darwin/cli_functions.py.
A minimal example to download a dataset is provided below and a more extensive one can be found in
from darwin.client import Client
client = Client.local() # use the configuration in ~/.darwin/config.yaml
dataset = client.get_remote_dataset("example-team/test")
dataset.pull() # downloads annotations and images for the latest exported version
Follow this guide for how to integrate darwin datasets directly in PyTorch.
File details
Details for the file darwin_py-3.4.3.tar.gz.
File metadata
- Download URL: darwin_py-3.4.3.tar.gz
- Upload date:
- Size: 262.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7d70f3726c1fee5d982c1b7ec560a753fd96ff2ee9f8c0dcbc36b839b8ebbf2 |
| MD5 | 3aabc5b24ef1c698bff52df0aa2b985a |
| BLAKE2b-256 | 88b49439ddf241c4c6514aba8931cd7aa285b6bf26ee9c03ad95a34cd353e9aa |
Provenance
The following attestation bundles were made for darwin_py-3.4.3.tar.gz:
Publisher: EVENT_release.yml on v7labs/darwin-py
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: darwin_py-3.4.3.tar.gz
- Subject digest: b7d70f3726c1fee5d982c1b7ec560a753fd96ff2ee9f8c0dcbc36b839b8ebbf2
- Sigstore transparency entry: 837387546
- Sigstore integration time:
- Permalink: v7labs/darwin-py@65d2f049bb2f07b9dd8a8d55cf2940e0d2980692
- Branch / Tag: refs/tags/v3.4.3
- Owner: https://github.com/v7labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: EVENT_release.yml@65d2f049bb2f07b9dd8a8d55cf2940e0d2980692
- Trigger Event: release
File details
Details for the file darwin_py-3.4.3-py3-none-any.whl.
File metadata
- Download URL: darwin_py-3.4.3-py3-none-any.whl
- Upload date:
- Size: 351.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d26d3f0bfda890562214851bec8efe2db5e169dc9f65b2243e2e0441d1fca464 |
| MD5 | 15d2bc62609e1b71cd18e5e61dc5ed62 |
| BLAKE2b-256 | 0d5c9ab218510ba5680bee429d5f641bbab1861c7e6b94801166fa5a0cbb0d17 |
Provenance
The following attestation bundles were made for darwin_py-3.4.3-py3-none-any.whl:
Publisher: EVENT_release.yml on v7labs/darwin-py
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: darwin_py-3.4.3-py3-none-any.whl
- Subject digest: d26d3f0bfda890562214851bec8efe2db5e169dc9f65b2243e2e0441d1fca464
- Sigstore transparency entry: 837387604
- Sigstore integration time:
- Permalink: v7labs/darwin-py@65d2f049bb2f07b9dd8a8d55cf2940e0d2980692
- Branch / Tag: refs/tags/v3.4.3
- Owner: https://github.com/v7labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: EVENT_release.yml@65d2f049bb2f07b9dd8a8d55cf2940e0d2980692
- Trigger Event: release