Dataverse SDK For Python

Dataverse is an MLOps platform that assists with data selection, data visualization, and model training for computer vision. The Dataverse SDK for Python lets you interact with the Dataverse platform from Python. The library currently supports:

Create a project with your input ontology and sensors
Get a project by project ID
Create a dataset from your AWS/Azure storage or a local folder
Get a dataset by dataset ID
List the models for a selected project ID
Get and download your model

Package (PyPI) | Source code

Getting started

Install the package

pip install dataverse-sdk

Prerequisites: You must have a Dataverse Platform account and Python 3.10+ to use this package.

Create the client

Interaction with the Dataverse site starts with an instance of the DataverseClient class. You need the site URL, an email account and its password, and a service ID to instantiate the client object.

from dataverse_sdk import *
from dataverse_sdk.connections import get_connection

# Connect with an email and password; the client is registered under the alias "default"
client = DataverseClient(
    host=DataverseHost.PRODUCTION, email="XXX", password="***", service_id="xxxx-xxxx-xx-xxx", alias="default", force=False
)
assert client is get_connection("default")

# Provide a different alias for each additional workspace you connect to
client2 = DataverseClient(
    host=DataverseHost.PRODUCTION, email="account-2", password="***", service_id="xxxx-xxxx-xx-xxx", alias="client2", force=False
)
assert client2 is get_connection(client2.alias)

# Authenticate with an access token instead of a password
client3 = DataverseClient(
    host=DataverseHost.PRODUCTION, email="XXX", password="", service_id="xxxx-xxxx-xx-xxx", access_token="xxx"
)
assert client3 is get_connection(client3.alias)

Input arguments:

Argument name	Type/Options	Default	Description
host	str	＊--	the host URL of the Dataverse site (with curation port)
email	str	＊--	the email account of your Dataverse workspace
password	str	＊--	the password of your Dataverse workspace
service_id	str	＊--	the service ID of the Dataverse site you want to connect to
alias	str	'default'	the connection alias of your Dataverse client
force	bool	False	whether to replace the existing connection if the given alias already exists
access_token	str	None	an access token used for authentication instead of the password

＊--: required argument without default
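
The alias and force arguments are easiest to picture as a name-to-client registry. The following is a minimal, self-contained sketch of that behavior, not the SDK's actual implementation: `_connections`, `register_connection`, and `lookup_connection` are hypothetical names, and raising `ValueError` on a duplicate alias is an assumption.

```python
# Hypothetical model of an alias-keyed connection registry (not SDK code).
_connections: dict[str, object] = {}


def register_connection(alias: str, client: object, force: bool = False) -> None:
    """Store client under alias; replace an existing entry only when force is True."""
    if alias in _connections and not force:
        # Assumption: a duplicate alias is rejected unless force=True.
        raise ValueError(f"connection alias {alias!r} already exists")
    _connections[alias] = client


def lookup_connection(alias: str = "default") -> object:
    """Return the client registered under alias."""
    return _connections[alias]


first, second = object(), object()
register_connection("default", first)
try:
    register_connection("default", second)  # same alias, force=False
except ValueError as err:
    print(err)
register_connection("default", second, force=True)  # replaces the entry
```

In this model, the second registration only takes effect once force=True is passed, which is why each workspace in the examples gets its own alias.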

Key concepts

Once you've initialized a DataverseClient, you can interact with Dataverse from the initialized object.

Examples

The following sections provide examples for the most common Dataverse tasks, including:

Get User
List Projects
Create Project
Get Project
Edit Project
Update Alias
Create Dataset
Get Dataset
List Models
Get and Download Model

Get User

The get_user method returns information about the current user, such as role, permissions, and user details.

user = client.get_user()

List Projects

The list_projects method lists all projects on the connected site.

Example Usage:

projects = client.list_projects(
    current_user=True,
    exclude_sensor_type=SensorType.LIDAR,
    image_type=OntologyImageType._2D_BOUNDING_BOX,
)

Input arguments:

Argument name	Type/Options	Default	Description
current_user	bool	True	only show the projects of current user
exclude_sensor_type	SensorType.CAMERA SensorType.LIDAR	None	exclude the projects with the given sensor type
image_type	OntologyImageType._2D_BOUNDING_BOX OntologyImageType.SEMANTIC_SEGMENTATION OntologyImageType.CLASSIFICATION OntologyImageType.POINT OntologyImageType.POLYGON OntologyImageType.POLYLINE	None	only include the projects with the given image type

Create Project

The create_project method creates a project on the connected site with the defined ontology and sensors.

Example Usage:

# 1) Create an ontology from OntologyClass objects
ontology = Ontology(
    name="sample ontology",
    image_type=OntologyImageType._2D_BOUNDING_BOX,
    pcd_type=None,
    classes=[
        OntologyClass(name="Pedestrian", rank=1, color="#234567"),
        OntologyClass(name="Truck", rank=2, color="#345678"),
        OntologyClass(name="Car", rank=3, color="#456789"),
        OntologyClass(name="Cyclist", rank=4, color="#567890"),
        OntologyClass(name="DontCare", rank=5, color="#6789AB"),
        OntologyClass(name="Misc", rank=6, color="#789AB1"),
        OntologyClass(name="Van", rank=7, color="#89AB12"),
        OntologyClass(name="Tram", rank=8, color="#9AB123"),
        OntologyClass(name="Person_sitting", rank=9, color="#AB1234"),
    ],
)

A project with a camera sensor can have only one image_type. Choose one of [OntologyImageType._2D_BOUNDING_BOX, OntologyImageType.SEMANTIC_SEGMENTATION, OntologyImageType.CLASSIFICATION, OntologyImageType.POINT, OntologyImageType.POLYGON, OntologyImageType.POLYLINE].

For a project with a lidar sensor, you should assign pcd_type = OntologyPcdType.CUBOID to the ontology.

# 2) Create your sensor list with name / SensorType
sensors = [
    Sensor(name="camera1", type=SensorType.CAMERA),
    Sensor(name="lidar1", type=SensorType.LIDAR),
]

# 3) Create your project tag attributes (Optional)
project_tag = ProjectTag(
    attributes=[
        {"name": "year", "type": "number"},
        {
            "name": "unknown_object",
            "type": "option",
            "options": [{"value": "fire"}, {"value": "leaves"}, {"value": "water"}],
        },
    ]
)

# 4) Create your project with your ontology/sensors/project_tag
project = client.create_project(name="Sample project", ontology=ontology, sensors=sensors, project_tag=project_tag)

Input arguments for creating project:

Argument name	Type/Options	Default	Description
name	str	＊--	name of your project
ontology	Ontology	＊--	the Ontology basemodel data of current project
sensors	list[Sensor]	＊--	the list of Sensor basemodel data of your project
project_tag	ProjectTag	None	your project tags
description	str	None	your project description

＊--: required argument without default

See https://linkervision.gitbook.io/dataverse/data-management/project-ontology for details on Project Ontology.

Get Project

The get_project method retrieves the project from the connected site. The project_id parameter is the unique integer ID of the project, not its "name" property.

project = client.get_project(project_id=1, client_alias=client.alias)  # if client_alias is omitted, the client's own alias is used

Edit Project

To edit project contents, use the four functions below to add or edit project tags and ontology classes.

Add New Project Tags

Note: You cannot add a project tag that already exists.

tag = {
    "attributes": [
        {"name": "month", "type": "number"},
        {
            "name": "weather",
            "type": "option",
            "options": [{"value": "sunny"}, {"value": "rainy"}, {"value": "cloudy"}],
        },
    ]
}
project_tag = ProjectTag(**tag)
# Provide client_alias when calling from the client
client.add_project_tag(project_id=10, project_tag=project_tag, client_alias=client.alias)
# OR call the method on the project object
project.add_project_tag(project_tag=project_tag)

Edit Project Tags

Note:

You cannot edit a project tag that does not exist
You cannot modify the data type of an existing project tag
You cannot provide attributes with options that already exist

tag = {
    "attributes": [
        {
            "name": "weather",
            "type": "option",
            "options": [{"value": "unknown"}, {"value": "snowy"}],
        },
    ]
}
project_tag = ProjectTag(**tag)
# Provide client_alias when calling from the client
client.edit_project_tag(project_id=10, project_tag=project_tag, client_alias=client.alias)
# OR call the method on the project object
project.edit_project_tag(project_tag=project_tag)
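
The three restrictions noted for editing project tags can be checked locally before calling edit_project_tag. The following is a minimal sketch, not SDK code: `check_tag_edit` is a hypothetical helper that works on plain attribute dictionaries shaped like the ones passed to ProjectTag, and the error messages are made up for illustration.

```python
def check_tag_edit(existing, edits):
    """Return a list of rule violations for a proposed project tag edit.

    Hypothetical helper: both arguments are lists of attribute dicts with
    "name", "type", and optional "options" keys, as in the examples here.
    """
    by_name = {attr["name"]: attr for attr in existing}
    errors = []
    for attr in edits:
        current = by_name.get(attr["name"])
        if current is None:
            errors.append(f"{attr['name']}: tag does not exist")
            continue
        if attr["type"] != current["type"]:
            errors.append(f"{attr['name']}: data type cannot be changed")
        known = {opt["value"] for opt in current.get("options", [])}
        repeated = [o["value"] for o in attr.get("options", []) if o["value"] in known]
        if repeated:
            errors.append(f"{attr['name']}: options already exist: {repeated}")
    return errors


existing = [
    {"name": "month", "type": "number"},
    {"name": "weather", "type": "option",
     "options": [{"value": "sunny"}, {"value": "rainy"}, {"value": "cloudy"}]},
]
ok_edit = [{"name": "weather", "type": "option",
            "options": [{"value": "unknown"}, {"value": "snowy"}]}]
bad_edit = [{"name": "weather", "type": "number"},
            {"name": "season", "type": "option", "options": [{"value": "winter"}]}]
print(check_tag_edit(existing, ok_edit))
print(check_tag_edit(existing, bad_edit))
```

An empty list means the edit respects all three restrictions. The same check applies to ontology class attributes, which follow the same rules.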

Add New Ontology Classes

Note: You cannot add an ontology class that already exists.

new_classes = [
    OntologyClass(
        name="obstruction",
        rank=9,
        color="#AB4321",
        attributes=[
            {
                "name": "status",
                "type": "option",
                "options": [{"value": "static"}, {"value": "moving"}],
            }
        ],
    )
]
# Provide client_alias when calling from the client
client.add_ontology_classes(project_id=24, ontology_classes=new_classes, client_alias=client.alias)
# OR call the method on the project object
project.add_ontology_classes(ontology_classes=new_classes)

Edit Ontology Classes

Note:

You cannot edit an ontology class that does not exist
You cannot modify the data type of existing ontology class attributes
You cannot provide attributes with options that already exist

edit_classes = [
    OntologyClass(
        name="obstruction",
        color="#AB4321",
        attributes=[
            {
                "name": "status",
                "type": "option",
                "options": [{"value": "unknown"}],
            }
        ],
    )
]
# Provide client_alias when calling from the client
client.edit_ontology_classes(project_id=24, ontology_classes=edit_classes, client_alias=client.alias)
# OR call the method on the project object
project.edit_ontology_classes(ontology_classes=edit_classes)

Update Ontology Alias

Generate the CSV file of the alias map for your project

client.generate_alias_map(project_id=123, alias_file_path="./alias.csv")

Fill in the aliases in the CSV file and save it (do NOT modify other fields)
Update the aliases for your project with the alias file path

client.update_alias(project_id=123, alias_file_path="/Users/Downloads/alias.csv")
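
Filling in the alias column can also be scripted. The sketch below uses only the standard csv module; the column names (`name`, `alias`) and the sample rows are assumptions made for illustration, so check the header of the file that generate_alias_map actually writes before adapting it. `fill_aliases` is a hypothetical helper, not part of the SDK.

```python
import csv
import io


def fill_aliases(csv_text, aliases, key_column="name", alias_column="alias"):
    """Return csv_text with alias_column filled from the aliases mapping.

    Assumption: key_column and alias_column are hypothetical column names.
    All other columns are written back unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep the current alias when no new one is supplied for this row.
        row[alias_column] = aliases.get(row[key_column], row[alias_column])
        writer.writerow(row)
    return out.getvalue()


# Made-up sample content standing in for a generated alias file.
sample = "id,name,alias\n1,Pedestrian,\n2,Truck,\n"
filled = fill_aliases(sample, {"Pedestrian": "ped", "Truck": "trk"})
print(filled)
```

Because only the alias column is assigned, every other field passes through untouched, which matches the instruction not to modify other fields.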

Create Dataset

Use `create_dataset` to import a dataset from cloud storage

dataset_data = {
    "name": "Dataset 1",
    "data_source": DataSource.Azure,  # or DataSource.AWS
    "storage_url": "storage/url",
    "container_name": "azure container name",
    "data_folder": "datafolder/to/vai_anno",
    "sensors": project.sensors,
    "type": DatasetType.ANNOTATED_DATA,
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],
    "sequential": False,
    "render_pcd": False,
    "generate_metadata": False,
    "auto_tagging": ["timeofday"],
    "sas_token": "azure sas token",  # only for Azure storage
    # The two keys below are only for a private S3 bucket; omit them for a public S3 bucket or an Azure data source
    "access_key_id": "aws s3 access key id",
    "secret_access_key": "aws s3 secret access key",
}
dataset = project.create_dataset(**dataset_data)

Input arguments for creating dataset from cloud storage:

Argument name	Type/Options	Default	Description
name	str	＊--	name of your dataset
data_source	DataSource.Azure DataSource.AWS	＊--	the datasource of your dataset
storage_url	str	＊--	your cloud storage url
container_name	str	None	azure container name
data_folder	str	＊--	the relative data folder from the storage_url and container
sensors	list[Sensor]	＊--	the list of Sensor of your dataset (one or more from project specified sensors)
type	DatasetType.ANNOTATED_DATA DatasetType.RAW_DATA	＊--	your dataset type (annotated or raw data)
annotation_format	AnnotationFormat.VISION_AI AnnotationFormat.KITTI AnnotationFormat.COCO AnnotationFormat.IMAGE	＊--	the format of your annotation data
annotations	list[str]	None	list of names for your annotation data folders, such as ["groundtruth"]
sequential	bool	False	data is sequential or not
render_pcd	bool	False	render pcd preview image or not
generate_metadata	bool	False	generate image metadata or not
auto_tagging	list	None	run auto-tagging with the target models, chosen from `["weather", "scene", "timeofday"]`
description	str	None	your dataset description
sas_token	str	None	SAS token for the Azure container
access_key_id	str	None	access key id for AWS private s3 bucket
secret_access_key	str	None	secret access key for AWS private s3 bucket

＊--: required argument without default
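
Which credential arguments to pass depends on the data source: sas_token applies to Azure, while the two access keys apply only to a private S3 bucket. A minimal sketch of that selection follows; `storage_credentials` is a hypothetical helper, and the strings "azure" and "aws" stand in for DataSource.Azure and DataSource.AWS so the snippet runs on its own.

```python
def storage_credentials(data_source, sas_token=None, access_key_id=None,
                        secret_access_key=None):
    """Return only the credential arguments that apply to the data source.

    Hypothetical helper: the strings "azure" and "aws" stand in for
    DataSource.Azure and DataSource.AWS.
    """
    if data_source == "azure":
        return {"sas_token": sas_token}
    if data_source == "aws":
        if access_key_id is None and secret_access_key is None:
            return {}  # public S3 bucket: no keys needed
        if access_key_id is None or secret_access_key is None:
            raise ValueError("a private S3 bucket needs both keys")
        return {"access_key_id": access_key_id,
                "secret_access_key": secret_access_key}
    raise ValueError(f"unsupported data source: {data_source!r}")


azure_args = storage_credentials("azure", sas_token="azure sas token")
public_s3_args = storage_credentials("aws")
private_s3_args = storage_credentials("aws", access_key_id="id", secret_access_key="key")
print(azure_args, public_s3_args, private_s3_args)
```

The returned dictionary can then be merged into the other create_dataset arguments, for example with `{**common_args, **azure_args}`, where `common_args` is your own dictionary of the remaining fields.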

See https://linkervision.gitbook.io/dataverse/data-management/import-dataset for details on importing datasets.

Use `create_dataset` to import a dataset from `LOCAL`

dataset_data2 = {
    "name": "dataset-local-upload",
    "data_source": DataSource.LOCAL,
    "storage_url": "",
    "container_name": "",
    "data_folder": "/YOUR/TARGET/LOCAL/FOLDER",
    "sensors": project.sensors,
    "type": DatasetType.ANNOTATED_DATA, # or DatasetType.RAW_DATA for images
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],
    "sequential": False,
    "generate_metadata": False,
    "auto_tagging": []
}
dataset2 = project.create_dataset(**dataset_data2)

You can also use the following script to import a dataset from a local folder:

python tools/import_dataset_from_local.py -host https://staging.visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id}  -project {project-id} --folder {/YOUR/TARGET/LOCAL/FOLDER} -name {dataset-name} -type {raw_data OR annotated_data} -anno {image OR vision_ai} --sequential

Get Dataset

The get_dataset method retrieves the dataset info from the connected site. The dataset_id parameter is the unique integer ID of the dataset, not its "name" property.

dataset = client.get_dataset(dataset_id=5)

List Models

The list_models method lists all the models in the given project.

# Option 1: list models through the client
models = client.list_models(project_id=1, client_alias=client.alias)
# Option 2: list models through the project object
project = client.get_project(project_id=1)
models = project.list_models()

Get Model

The get_model method retrieves detailed model information for the given model ID.

model = client.get_model(model_id=30, client_alias=client.alias)
# OR
model = project.get_model(model_id=30)

From a given model, you can get the model's convert records as follows:

model_record = client.get_convert_record(convert_record_id=1, client_alias=client.alias)
# OR
model_record = model.get_convert_record(convert_record_id=1)

Troubleshooting

Next steps

Contributing

Links to language repos

Python Readme

Release history

Version	Release date
1.5.3	Oct 23, 2024
1.5.2	Oct 17, 2024
1.5.1 (this version)	Oct 15, 2024
1.5.0	Sep 26, 2024
1.4.1	Aug 2, 2024
1.4.0	Jul 8, 2024
1.3.2	Jun 20, 2024
1.3.1	May 29, 2024
1.3.0	May 22, 2024
1.2.1	May 21, 2024
1.1.0	Jan 18, 2024
1.0.0	Dec 21, 2023
0.5.0	Dec 4, 2023
0.4.1	Nov 28, 2023
0.4.0	Nov 20, 2023
0.3.2	Oct 26, 2023
0.3.1	Sep 7, 2023
0.3.0	Sep 7, 2023
0.2.1	Aug 9, 2023
0.2.0	Jul 10, 2023
0.1.5	Jun 17, 2023
0.1.4	Jun 13, 2023
0.1.3	May 24, 2023
0.1.2	May 19, 2023
0.1.1	Jan 18, 2023
0.1.0	Nov 30, 2022

Download files

Source Distribution

dataverse_sdk-1.5.1.tar.gz (25.1 kB)

Uploaded Oct 15, 2024 Source

Built Distribution

dataverse_sdk-1.5.1-py3-none-any.whl (24.2 kB)

Uploaded Oct 15, 2024 Python 3

Hashes for dataverse_sdk-1.5.1.tar.gz

Algorithm	Hash digest
SHA256	`cb11ffb09f2305e2d4c4aaa7267e5fbc038709600e4ab7a67d186663cff02935`
MD5	`0c4c2ec3a79599a99f75c163fb039072`
BLAKE2b-256	`7487bb0ad38cd5c253f821c07dfb0a134c0b73a527b641d1ca0a2bdb7b67dee1`

Hashes for dataverse_sdk-1.5.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`4708fb2169ca002957cd241e6e7731ace4655a62843b63cb897e83be95475996`
MD5	`5ff88a79d361662cc12efe2faa718b62`
BLAKE2b-256	`3fe38c8cea456c43a48becb98e31ee9f382b9288f031d44d936a98a516b19c4a`
