Dataverse SDK For Python

Dataverse is an MLOps platform that assists with data selection, data visualization, and model training in computer vision. Use the Dataverse SDK for Python to interact with the Dataverse platform from Python. Currently, the library supports:

  • Create Project with your input ontology and sensors
  • Get Project by project-id
  • Create Dataset from your AWS/Azure storage or local
  • Get Dataset by dataset-id
  • List models for your selected project-id
  • Get and download your model

Package (PyPi) | Source code

Getting started

Install the package

pip install dataverse-sdk

Prerequisites: You must have a Dataverse Platform account and Python 3.10+ to use this package.

Create the client

Interaction with the Dataverse site starts with an instance of the DataverseClient class. You need the site URL, an email account, and its password to instantiate the client object.

from dataverse_sdk import *
from dataverse_sdk.connections import get_connection
client = DataverseClient(
    host=DataverseHost.PRODUCTION, email="XXX", password="***", service_id="xxxx-xxxx-xx-xxx", alias="default", force=False
)
assert client is get_connection("default")

# Provide a different alias if you are connecting to a different workspace
client2 = DataverseClient(
    host=DataverseHost.PRODUCTION, email="account-2", password="***", service_id="xxxx-xxxx-xx-xxx", alias="client2", force=False
)
assert client2 is get_connection(client2.alias)

client3 = DataverseClient(
    host=DataverseHost.PRODUCTION, email="XXX", password="", service_id="xxxx-xxxx-xx-xxx", access_token="xxx"
)
assert client3 is get_connection(client3.alias)
  • Input arguments:

| Argument name | Type/Options | Default | Description |
| --- | --- | --- | --- |
| host | str | *-- | the host URL of the Dataverse site (with curation port) |
| email | str | *-- | the email account of your Dataverse workspace |
| password | str | *-- | the password of your Dataverse workspace |
| service_id | str | *-- | the service id of the Dataverse you want to connect to |
| alias | str | 'default' | the connection alias of your Dataverse client |
| force | bool | False | whether to force-replace the connection if the given alias already exists |
| access_token | str | None | an access token used for authentication instead of a password |

Key concepts

Once you've initialized a DataverseClient, you can interact with Dataverse from the initialized object.

Examples

The following sections provide examples for the most common Dataverse tasks, including:

Get User

The get_user method returns the current user's info, including details such as role and permissions.

user = client.get_user()

List Projects

The list_projects method will list all projects of the connected site.

  • Example Usage:
projects = client.list_projects(current_user=True,
                                exclude_sensor_type=SensorType.LIDAR,
                                image_type=OntologyImageType._2D_BOUNDING_BOX)
  • Input arguments:

| Argument name | Type/Options | Default | Description |
| --- | --- | --- | --- |
| current_user | bool | True | only show the projects of the current user |
| exclude_sensor_type | SensorType.CAMERA / SensorType.LIDAR | None | exclude the projects with the given sensor type |
| image_type | OntologyImageType._2D_BOUNDING_BOX / OntologyImageType.SEMANTIC_SEGMENTATION / OntologyImageType.CLASSIFICATION / OntologyImageType.POINT / OntologyImageType.POLYGON / OntologyImageType.POLYLINE | None | only include the projects with the given image type |
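
A few combined usage sketches, using only the arguments documented above (the filters are illustrative):

# Only projects owned by the current user
my_projects = client.list_projects(current_user=True)
# Exclude projects that contain a lidar sensor
camera_only = client.list_projects(exclude_sensor_type=SensorType.LIDAR)
# Only projects whose ontology image type is semantic segmentation
segmentation_projects = client.list_projects(
    image_type=OntologyImageType.SEMANTIC_SEGMENTATION
)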

Create Project

The create_project method will create a project on the connected site with the defined ontology and sensors.

  • Example Usage:
# 1) Create ontology with ontologyclass object
ontology = Ontology(
    name="sample ontology",
    image_type=OntologyImageType._2D_BOUNDING_BOX,
    pcd_type=None,
    classes=[
        OntologyClass(name="Pedestrian", rank=1, color="#234567"),
        OntologyClass(name="Truck", rank=2, color="#345678"),
        OntologyClass(name="Car", rank=3, color="#456789"),
        OntologyClass(name="Cyclist", rank=4, color="#567890"),
        OntologyClass(name="DontCare", rank=5, color="#6789AB"),
        OntologyClass(name="Misc", rank=6, color="#789AB1"),
        OntologyClass(name="Van", rank=7, color="#89AB12"),
        OntologyClass(name="Tram", rank=8, color="#9AB123"),
        OntologyClass(name="Person_sitting", rank=9, color="#AB1234"),
    ],
)

For a project with a camera sensor, there can be only one image_type per project. You can choose from [OntologyImageType._2D_BOUNDING_BOX, OntologyImageType.SEMANTIC_SEGMENTATION, OntologyImageType.CLASSIFICATION, OntologyImageType.POINT, OntologyImageType.POLYGON, OntologyImageType.POLYLINE].

For a project with a lidar sensor, you should assign pcd_type=OntologyPcdType.CUBOID in the ontology.
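
For instance, a lidar-only ontology might look like the sketch below; the class names and colors are placeholders, and image_type is assumed to be left as None for a lidar-only project, mirroring how pcd_type is None in the camera example above:

lidar_ontology = Ontology(
    name="sample lidar ontology",
    image_type=None,  # assumption: no image type for a lidar-only ontology
    pcd_type=OntologyPcdType.CUBOID,
    classes=[
        OntologyClass(name="Car", rank=1, color="#456789"),
        OntologyClass(name="Pedestrian", rank=2, color="#234567"),
    ],
)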

# 2) Create your sensor list with name / SensorType
sensors = [
    Sensor(name="camera1", type=SensorType.CAMERA),
    Sensor(name="lidar1", type=SensorType.LIDAR),
]

# 3) Create your project tag attributes (Optional)
project_tag = ProjectTag(
    attributes=[
        {"name": "year", "type": "number"},
        {
            "name": "unknown_object",
            "type": "option",
            "options": [{"value": "fire"}, {"value": "leaves"}, {"value": "water"}],
        },
    ]
)

# 4) Create your project with your ontology/sensors/project_tag
project = client.create_project(name="Sample project", ontology=ontology, sensors=sensors, project_tag=project_tag)
  • Input arguments for creating a project:

| Argument name | Type/Options | Default | Description |
| --- | --- | --- | --- |
| name | str | *-- | name of your project |
| ontology | Ontology | *-- | the Ontology basemodel data of the current project |
| sensors | list[Sensor] | *-- | the list of Sensor basemodel data of your project |
| project_tag | ProjectTag | None | your project tags |
| description | str | None | your project description |

*--: required argument without default


Get Project

The get_project method retrieves the project from the connected site. The project_id parameter is the unique integer ID of the project, not its "name" property.

project = client.get_project(project_id=1, client_alias=client.alias)  # if client_alias is not provided, we'll get it from client
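
If you hold several connections (see the aliases in Create the client), you can pass the matching alias explicitly; a minimal sketch with a placeholder project id:

# Fetch a project through the second workspace connection (client2) created earlier
project_b = client2.get_project(project_id=2, client_alias=client2.alias)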

Edit Project

For editing project contents, the four functions below add or edit project tags and ontology classes.

Add New Project Tags

  • Note: You cannot create a project tag that already exists!
tag = {
    "attributes": [
        {"name": "month", "type": "number"},
        {
            "name": "weather",
            "type": "option",
            "options": [{"value": "sunny"}, {"value": "rainy"}, {"value": "cloudy"}],
        },
    ]
}
project_tag = ProjectTag(**tag)
# should provide client_alias if calling from the client
client.add_project_tag(project_id=10, project_tag=project_tag, client_alias=client.alias)
# OR
project.add_project_tag(project_tag=project_tag)

Edit Project Tags

** Note:

  1. Cannot edit a project tag that does not exist
  2. Cannot modify the data type of existing project tags
  3. Cannot provide attributes with existing options
tag = {
    "attributes": [
        {
            "name": "weather",
            "type": "option",
            "options": [{"value": "unknown"}, {"value": "snowy"}],
        },
    ]
}
project_tag = ProjectTag(**tag)
# should provide client_alias if calling from the client
client.edit_project_tag(project_id=10, project_tag=project_tag, client_alias=client.alias)
# OR
project.edit_project_tag(project_tag=project_tag)

Add New Ontology Classes

  • Note: You cannot add an ontology class that already exists!
new_classes = [
    OntologyClass(
        name="obstruction",
        rank=9,
        color="#AB4321",
        attributes=[
            {
                "name": "status",
                "type": "option",
                "options": [{"value": "static"}, {"value": "moving"}],
            }
        ],
    )
]
# should provide client_alias if calling from the client
client.add_ontology_classes(project_id=24, ontology_classes=new_classes, client_alias=client.alias)
# OR
project.add_ontology_classes(ontology_classes=new_classes)

Edit Ontology Classes

** Note:

  1. Cannot edit an ontology class that does not exist
  2. Cannot modify the data type of existing ontology class attributes
  3. Cannot provide attributes with existing options
edit_classes = [
    OntologyClass(
        name="obstruction",
        color="#AB4321",
        attributes=[
            {
                "name": "status",
                "type": "option",
                "options": [{"value": "unknown"}],
            }
        ],
    )
]
# should provide client_alias if calling from the client
client.edit_ontology_classes(project_id=24, ontology_classes=edit_classes, client_alias=client.alias)
# OR
project.edit_ontology_classes(ontology_classes=edit_classes)

Update Ontology Alias

  1. Get the CSV file of the alias map for your project
client.generate_alias_map(project_id=123, alias_file_path="./alias.csv")
  2. Fill in the aliases in the CSV file and save it (DO NOT modify other fields)

  3. Update aliases for your project with the alias file path

client.update_alias(project_id=123, alias_file_path="/Users/Downloads/alias.csv")

Create Dataset

Use create_dataset to import a dataset from cloud storage

dataset_data = {
    "name": "Dataset 1",
    "data_source": DataSource.Azure/DataSource.AWS,
    "storage_url": "storage/url",
    "container_name": "azure container name",
    "data_folder": "datafolder/to/vai_anno",
    "sensors": project.sensors,
    "type": DatasetType.ANNOTATED_DATA,
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],
    "sequential": False,
    "render_pcd": False,
    "generate_metadata": False,
    "auto_tagging": ["timeofday"],
    "sas_token": "azure sas token",  # only for azure storage
    "access_key_id" : "aws s3 access key id",# only for private s3 bucket, don't need to assign it in case of public s3 bucket or azure data source
    "secret_access_key": "aws s3 secret access key"# only for private s3 bucket, don't need to assign it in case of public s3 bucket or azure data source
}
dataset = project.create_dataset(**dataset_data)
  • Input arguments for creating a dataset from cloud storage:

| Argument name | Type/Options | Default | Description |
| --- | --- | --- | --- |
| name | str | *-- | name of your dataset |
| data_source | DataSource.Azure / DataSource.AWS | *-- | the data source of your dataset |
| storage_url | str | *-- | your cloud storage URL |
| container_name | str | None | Azure container name |
| data_folder | str | *-- | the data folder relative to the storage_url and container |
| sensors | list[Sensor] | *-- | the list of Sensors of your dataset (one or more of the project's sensors) |
| type | DatasetType.ANNOTATED_DATA / DatasetType.RAW_DATA | *-- | your dataset type (annotated or raw data) |
| annotation_format | AnnotationFormat.VISION_AI / AnnotationFormat.KITTI / AnnotationFormat.COCO / AnnotationFormat.YOLO / AnnotationFormat.IMAGE | *-- | the format of your annotation data |
| annotations | list[str] | None | list of names of your annotation data folders, such as ["groundtruth"] |
| sequential | bool | False | whether the data is sequential |
| render_pcd | bool | False | whether to render a pcd preview image |
| generate_metadata | bool | False | whether to generate image metadata |
| auto_tagging | list | None | generate auto-tagging with target models ["weather", "scene", "timeofday"] |
| description | str | None | your dataset description |
| sas_token | str | None | SAS token for the Azure container |
| access_key_id | str | None | access key id for an AWS private S3 bucket |
| secret_access_key | str | None | secret access key for an AWS private S3 bucket |

*--: required argument without default
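
As a complementary sketch, a raw (unannotated) image dataset from a public cloud bucket could be configured roughly as below; all values are placeholders, and per the table above no access keys are assigned for a public bucket:

raw_dataset_data = {
    "name": "raw-images-demo",
    "data_source": DataSource.AWS,
    "storage_url": "storage/url",
    "container_name": "",
    "data_folder": "datafolder/to/images",
    "sensors": project.sensors,
    "type": DatasetType.RAW_DATA,          # raw data, no annotations
    "annotation_format": AnnotationFormat.IMAGE,
    "sequential": False,
    "generate_metadata": False,
    "auto_tagging": ["weather", "scene", "timeofday"],
}
raw_dataset = project.create_dataset(**raw_dataset_data)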


Use create_dataset to import a dataset from LOCAL

dataset_data2 = {
    "name": "dataset-local-upload",
    "data_source": DataSource.LOCAL,
    "storage_url": "",
    "container_name": "",
    "data_folder": "/YOUR/TARGET/LOCAL/FOLDER",
    "sensors": project.sensors,
    "type": DatasetType.ANNOTATED_DATA, # or DatasetType.RAW_DATA for images
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],
    "sequential": False,
    "generate_metadata": False,
    "auto_tagging": []
}
dataset2 = project.create_dataset(**dataset_data2)

You could also use the script below for importing a dataset from local storage:

python tools/import_dataset_from_local.py -host https://staging.visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id}  -project {project-id} --folder {/YOUR/TARGET/LOCAL/FOLDER} -name {dataset-name} -type {raw_data OR annotated_data} -anno {image OR vision_ai} --sequential

Get Dataset

The get_dataset method retrieves the dataset info from the connected site. The dataset_id parameter is the unique integer ID of the dataset, not its "name" property.

dataset = client.get_dataset(dataset_id=5)

List Models

The list_models method will list all the models in the given project.

#1
models = client.list_models(project_id=1, client_alias=client.alias)
#2
project = client.get_project(project_id=1)
models = project.list_models()

Get Model

The get_model method gets the model's detail info by the given model id.

model = client.get_model(model_id=30, client_alias=client.alias)
model = project.get_model(model_id=30)

From the given model, you can get the model convert records as below:

model_record = client.get_convert_record(convert_record_id=1, client_alias=client.alias)
# OR
model_record = model.get_convert_record(convert_record_id=1)

Create VQA Project

The create_vqa_project method will create a project on the connected site with the defined questions and answer types.

  • Example Usage:
# 1) Create question class with question and answer type pair
question_answer = [
    QuestionClass(class_name="question1", rank=1, question="Is any person found in the picture?",
                  answer_type="boolean"),
    QuestionClass(class_name="question2", rank=2, question="What is the blob color of traffic light?",
                  answer_type="option", answer_options=["red", "yellow", "green"]),
]
# 2) Create your VQA project as below
project = client.create_vqa_project(name="vqa-project", sensor_name="camera1", ontology_name="vqa-ontology", question_answer=question_answer)
  • Input arguments for creating a VQA project:

| Argument name | Type/Options | Default | Description |
| --- | --- | --- | --- |
| name | str | *-- | name of your project |
| sensor_name | str | *-- | the camera sensor name |
| ontology_name | str | *-- | the ontology name |
| question_answer | list[QuestionClass] | *-- | your question/answer_type pairs |
| description | str | None | your project description |

*--: required argument without default


Edit VQA Ontology

** Note:

  1. Cannot edit a question's answer type
  2. Cannot update with existing answer options
  3. Cannot add a question with an existing rank id
create_questions = [QuestionClass(class_name="question3", rank=3, question="Age?", answer_type="number")]
update_questions = [{"rank": 2, "question": "What is the blob color of traffic light? (the closest one)", "options": ["black"]}]

# should provide client_alias if calling from the client
client.edit_vqa_ontology(project_id=24,  ontology_name="ontology-new-name",
                                         create=create_questions,
                                         update=update_questions,
                                         client_alias=client.alias)
#OR
project.edit_vqa_ontology(project_id=24, ontology_name="ontology-new-name",
                                         create=create_questions,
                                         update=update_questions)

Get Question List

The function below retrieves the question list of a VQA project (which can help you prepare the annotated data).

output = client.get_question_list(project_id=107, output_file_path="./question.json")

Troubleshooting

Next steps

Contributing

Links to language repos

Python Readme
