Dataverse SDK For Python
Dataverse is an MLOps platform that assists with data selection, data visualization, and model training in computer vision. The Dataverse SDK for Python lets you interact with the Dataverse platform from Python. Currently, the library supports:
- Create Project with your input ontology and sensors
- Get Project by project-id
- Create Dataset from your AWS/Azure storage or local
- Get Dataset by dataset-id
- List models for your selected project-id
- Get and download your model
Getting started
Install the package
pip install dataverse-sdk
Prerequisites: You must have a Dataverse Platform account and Python 3.10+ to use this package.
Create the client
Interaction with the Dataverse site starts with an instance of the DataverseClient class. You need the site URL, an email account with its password, and the service id to instantiate the client object.
from dataverse_sdk import *
from dataverse_sdk.connections import get_connection
client = DataverseClient(
host=DataverseHost.PRODUCTION, email="XXX", password="***", service_id="xxxx-xxxx-xx-xxx", alias="default", force = False
)
assert client is get_connection("default")
# Provide a different alias for each connection when connecting to different workspaces
client2 = DataverseClient(
host=DataverseHost.PRODUCTION, email="account-2", password="***", service_id="xxxx-xxxx-xx-xxx", alias="client2", force = False
)
assert client2 is get_connection(client2.alias)
client3 = DataverseClient(
host=DataverseHost.PRODUCTION, email="XXX", password="", service_id="xxxx-xxxx-xx-xxx", access_token="xxx"
)
assert client3 is get_connection(client3.alias)
- Input arguments:
Argument name | Type/Options | Default | Description |
---|---|---|---|
host | str | *-- | the host URL of the Dataverse site (with curation port) |
email | str | *-- | the email account of your Dataverse workspace |
password | str | *-- | the password of your Dataverse workspace |
service_id | str | *-- | the service id of the Dataverse you want to connect to |
alias | str | 'default' | the connection alias of your Dataverse client |
force | bool | False | whether to replace the existing connection if the given alias already exists |
access_token | str | None | access token used for authentication instead of the password |
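The arguments above can be collected from environment variables instead of being hard-coded in a script. The following is a minimal sketch, not part of the SDK: the helper `client_kwargs_from_env` and the `DATAVERSE_*` variable names are hypothetical, and the token branch simply mirrors the `client3` example (empty password plus `access_token`).

```python
import os
from typing import Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None,
                           alias: str = "default") -> dict:
    """Collect DataverseClient keyword arguments from environment variables.

    The DATAVERSE_* names are hypothetical; rename them to fit your setup.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("DATAVERSE_EMAIL", "DATAVERSE_SERVICE_ID") if not env.get(k)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")

    kwargs = {
        "email": env["DATAVERSE_EMAIL"],
        "service_id": env["DATAVERSE_SERVICE_ID"],
        "alias": alias,
        "force": False,
    }
    token = env.get("DATAVERSE_ACCESS_TOKEN")
    if token:
        # Same shape as the client3 example: empty password plus access_token.
        kwargs.update(password="", access_token=token)
    elif env.get("DATAVERSE_PASSWORD"):
        kwargs["password"] = env["DATAVERSE_PASSWORD"]
    else:
        raise KeyError("set DATAVERSE_PASSWORD or DATAVERSE_ACCESS_TOKEN")
    return kwargs


# Usage (requires dataverse-sdk and real credentials):
# client = DataverseClient(host=DataverseHost.PRODUCTION, **client_kwargs_from_env())
```

Keeping the host out of the helper leaves the choice of `DataverseHost` value explicit at the call site.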
Key concepts
Once you've initialized a DataverseClient, you can interact with Dataverse from the initialized object.
Examples
The following sections provide examples for the most common Dataverse tasks, including:
- Get User
- List Projects
- Create Project
- Get Project
- Edit Project
- Update Alias
- Create Dataset
- Get Dataset
- List Models
- Get and Download Model
Get User
The get_user method returns the current user's info, including role, permissions, and user details.
user = client.get_user()
List Projects
The list_projects method lists the projects on the connected site.
- Example Usage:
projects = client.list_projects(
    current_user=True,
    exclude_sensor_type=SensorType.LIDAR,
    image_type=OntologyImageType._2D_BOUNDING_BOX,
)
- Input arguments:
Argument name | Type/Options | Default | Description |
---|---|---|---|
current_user | bool | True | only show the projects of current user |
exclude_sensor_type | SensorType.CAMERA, SensorType.LIDAR | None | exclude the projects with the given sensor type |
image_type | OntologyImageType._2D_BOUNDING_BOX, OntologyImageType.SEMANTIC_SEGMENTATION, OntologyImageType.CLASSIFICATION, OntologyImageType.POINT, OntologyImageType.POLYGON, OntologyImageType.POLYLINE | None | only include the projects with the given image type |
Create Project
The create_project method creates a project on the connected site with the defined ontology and sensors.
- Example Usage:
# 1) Create the ontology from OntologyClass objects
ontology = Ontology(
name="sample ontology",
image_type=OntologyImageType._2D_BOUNDING_BOX,
pcd_type = None,
classes=[
OntologyClass(name="Pedestrian", rank=1, color="#234567"),
OntologyClass(name="Truck", rank=2, color="#345678"),
OntologyClass(name="Car", rank=3, color="#456789"),
OntologyClass(name="Cyclist", rank=4, color="#567890"),
OntologyClass(name="DontCare", rank=5, color="#6789AB"),
OntologyClass(name="Misc", rank=6, color="#789AB1"),
OntologyClass(name="Van", rank=7, color="#89AB12"),
OntologyClass(name="Tram", rank=8, color="#9AB123"),
OntologyClass(name="Person_sitting", rank=9, color="#AB1234"),
],
)
For a project with a camera sensor, only one image_type is allowed per project. You can choose from [OntologyImageType._2D_BOUNDING_BOX, OntologyImageType.SEMANTIC_SEGMENTATION, OntologyImageType.CLASSIFICATION, OntologyImageType.POINT, OntologyImageType.POLYGON, OntologyImageType.POLYLINE].
For a project with a lidar sensor, you should assign pcd_type = OntologyPcdType.CUBOID for the ontology.
# 2) Create your sensor list with name / SensorType
sensors = [
Sensor(name="camera1", type=SensorType.CAMERA),
Sensor(name="lidar1", type=SensorType.LIDAR),
]
# 3) Create your project tag attributes (Optional)
project_tag = ProjectTag(
attributes=[
{"name": "year", "type": "number"},
{
"name": "unknown_object",
"type": "option",
"options": [{"value": "fire"}, {"value": "leaves"}, {"value": "water"}],
},
]
)
# 4) Create your project with your ontology/sensors/project_tag
project = client.create_project(name="Sample project", ontology=ontology, sensors=sensors, project_tag=project_tag)
- Input arguments for creating project:
Argument name | Type/Options | Default | Description |
---|---|---|---|
name | str | *-- | name of your project |
ontology | Ontology | *-- | the Ontology basemodel data of current project |
sensors | list[Sensor] | *-- | the list of Sensor basemodel data of your project |
project_tag | ProjectTag | None | your project tags |
description | str | None | your project description |
*--: required argument without default
- Check https://linkervision.gitbook.io/dataverse/data-management/project-ontology for the details of Project Ontology.
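When an ontology has many classes, the name/rank/color triples shown in the example can be generated rather than typed by hand. The sketch below is an illustration only: `build_class_specs` is a hypothetical helper, and the `#RRGGBB` check is an assumption based on the color strings used in the example, not a documented SDK rule.

```python
import re

# Assumption: colors follow the "#RRGGBB" form used in the example above.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_class_specs(colors_by_name: dict) -> list:
    """Return [{"name", "rank", "color"}, ...] with ranks 1..N in insertion order."""
    specs = []
    for rank, (name, color) in enumerate(colors_by_name.items(), start=1):
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"{name}: {color!r} is not a #RRGGBB color")
        specs.append({"name": name, "rank": rank, "color": color})
    return specs


specs = build_class_specs({
    "Pedestrian": "#234567",
    "Truck": "#345678",
    "Car": "#456789",
})

# With dataverse-sdk installed, the specs map onto the keyword arguments
# used in the example above:
# classes = [OntologyClass(**spec) for spec in specs]
```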
Get Project
The get_project method retrieves the project from the connected site. The project_id parameter is the unique integer ID of the project, not its "name" property.
project = client.get_project(project_id= 1, client_alias=client.alias) # if client_alias is not provided, we'll get it from client
Edit Project
To edit project contents, use the four functions below to add or edit project tags and ontology classes.
Add New Project Tags
- Note: You cannot add a project tag that already exists.
tag = {
"attributes": [
{
"name": "month",
"type": "number"
},
{
"name": "weather",
"type": "option",
"options": [{"value":"sunny"}, {"value":"rainy"}, {"value":"cloudy"}
]
}]}
project_tag= ProjectTag(**tag)
# client_alias should be provided when calling from the client
client.add_project_tag(project_id = 10, project_tag=project_tag, client_alias=client.alias)
#OR
project.add_project_tag(project_tag=project_tag)
Edit Project Tags
Note:
- You cannot edit a project tag that does not exist
- You cannot modify the data type of an existing project tag
- You cannot provide attributes with options that already exist
tag = {
"attributes": [
{
"name": "weather",
"type": "option",
"options": [{"value":"unknown"}, {"value":"snowy"}
]
}]}
project_tag= ProjectTag(**tag)
# client_alias should be provided when calling from the client
client.edit_project_tag(project_id = 10, project_tag=project_tag, client_alias=client.alias)
#OR
project.edit_project_tag(project_tag=project_tag)
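The three restrictions on editing project tags can be checked locally before sending the request. This is only a sketch of such a pre-check on plain attribute dictionaries: `check_tag_edit` is a hypothetical helper, the `existing` list stands in for whatever tags your project currently has, and the server remains the authority on what is accepted.

```python
def check_tag_edit(existing: list, edits: list) -> list:
    """Return a list of problems; an empty list means the edit looks valid."""
    by_name = {attr["name"]: attr for attr in existing}
    problems = []
    for attr in edits:
        current = by_name.get(attr["name"])
        if current is None:
            # Rule 1: the tag must already exist.
            problems.append(f"{attr['name']}: tag does not exist")
            continue
        if attr["type"] != current["type"]:
            # Rule 2: the data type of an existing tag cannot change.
            problems.append(
                f"{attr['name']}: type cannot change from "
                f"{current['type']} to {attr['type']}"
            )
        # Rule 3: only new option values may be supplied.
        known = {opt["value"] for opt in current.get("options", [])}
        dupes = [o["value"] for o in attr.get("options", []) if o["value"] in known]
        if dupes:
            problems.append(f"{attr['name']}: options already exist: {dupes}")
    return problems


existing = [
    {"name": "month", "type": "number"},
    {"name": "weather", "type": "option",
     "options": [{"value": "sunny"}, {"value": "rainy"}, {"value": "cloudy"}]},
]
edits = [
    {"name": "weather", "type": "option",
     "options": [{"value": "unknown"}, {"value": "snowy"}]},
]
problems = check_tag_edit(existing, edits)
```

Here `problems` is empty because the edit only adds new options to an existing tag of the same type.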
Add New Ontology Classes
- Note: You cannot add an ontology class that already exists.
new_classes = [
    OntologyClass(
        name="obstruction",
        rank=9,
        color="#AB4321",
        attributes=[
            {
                "name": "status",
                "type": "option",
                "options": [{"value": "static"}, {"value": "moving"}],
            }
        ],
    )
]
# client_alias should be provided when calling from the client
client.add_ontology_classes(project_id=24, ontology_classes=new_classes, client_alias=client.alias)
#OR
project.add_ontology_classes(ontology_classes=new_classes)
Edit Ontology Classes
Note:
- You cannot edit an ontology class that does not exist
- You cannot modify the data type of existing ontology class attributes
- You cannot provide attributes with options that already exist
edit_classes = [
    OntologyClass(
        name="obstruction",
        color="#AB4321",
        attributes=[
            {
                "name": "status",
                "type": "option",
                "options": [{"value": "unknown"}],
            }
        ],
    )
]
# client_alias should be provided when calling from the client
client.edit_ontology_classes(project_id=24, ontology_classes=edit_classes, client_alias=client.alias)
#OR
project.edit_ontology_classes(ontology_classes=edit_classes)
Update Ontology Alias
- Get the CSV file of the alias map for your project
client.generate_alias_map(project_id=123, alias_file_path="./alias.csv")
- Fill in the aliases in the CSV file and save it (DO NOT modify other fields)
- Update the aliases for your project with the alias file path
client.update_alias(project_id=123, alias_file_path="/Users/Downloads/alias.csv")
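The middle step, filling in aliases without touching other fields, can be scripted with the standard csv module. The sketch below is an assumption-heavy illustration: the column names (`id`, `name`, `alias`) are hypothetical, so check the header of your generated file and pass the real column names; `fill_aliases` is not part of the SDK.

```python
import csv
import io


def fill_aliases(src, dst, aliases: dict, key_column: str, alias_column: str) -> int:
    """Copy CSV rows from src to dst, setting alias_column where key_column matches.

    Returns the number of rows updated. All other fields are written back unchanged.
    """
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []
    if key_column not in fieldnames or alias_column not in fieldnames:
        raise ValueError(
            f"expected columns {key_column!r} and {alias_column!r}, got {fieldnames}"
        )
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    updated = 0
    for row in reader:
        if row[key_column] in aliases:
            row[alias_column] = aliases[row[key_column]]
            updated += 1
        writer.writerow(row)
    return updated


# Hypothetical layout, for illustration only.
src = io.StringIO("id,name,alias\n1,Car,\n2,Truck,\n")
dst = io.StringIO()
count = fill_aliases(src, dst, {"Car": "vehicle.car"},
                     key_column="name", alias_column="alias")
```

For real files, open the generated CSV and the output path with `open(path, newline="")` in place of the `io.StringIO` objects.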
Create Dataset
Use create_dataset to import a dataset from cloud storage
dataset_data = {
"name": "Dataset 1",
"data_source": DataSource.Azure,  # or DataSource.AWS
"storage_url": "storage/url",
"container_name": "azure container name",
"data_folder": "datafolder/to/vai_anno",
"sensors": project.sensors,
"type": DatasetType.ANNOTATED_DATA,
"annotation_format": AnnotationFormat.VISION_AI,
"annotations": ["groundtruth"],
"sequential": False,
"render_pcd": False,
"generate_metadata": False,
"auto_tagging": ["timeofday"],
"sas_token": "azure sas token",  # only for Azure storage
"access_key_id": "aws s3 access key id",  # only for private S3 buckets; omit for public S3 buckets or Azure
"secret_access_key": "aws s3 secret access key",  # only for private S3 buckets; omit for public S3 buckets or Azure
}
dataset = project.create_dataset(**dataset_data)
- Input arguments for creating a dataset from cloud storage:
Argument name | Type/Options | Default | Description |
---|---|---|---|
name | str | *-- | name of your dataset |
data_source | DataSource.Azure, DataSource.AWS | *-- | the data source of your dataset |
storage_url | str | *-- | your cloud storage URL |
container_name | str | None | Azure container name |
data_folder | str | *-- | the data folder, relative to the storage_url and container |
sensors | list[Sensor] | *-- | the list of Sensor of your dataset (one or more from the sensors specified in the project) |
type | DatasetType.ANNOTATED_DATA, DatasetType.RAW_DATA | *-- | your dataset type (annotated or raw data) |
annotation_format | AnnotationFormat.VISION_AI, AnnotationFormat.KITTI, AnnotationFormat.COCO, AnnotationFormat.IMAGE | *-- | the format of your annotation data |
annotations | list[str] | None | list of names of your annotation data folders, such as ["groundtruth"] |
sequential | bool | False | whether the data is sequential |
render_pcd | bool | False | whether to render pcd preview images |
generate_metadata | bool | False | whether to generate image metadata |
auto_tagging | list | None | generate auto tags with the target models ["weather", "scene", "timeofday"] |
description | str | None | your dataset description |
sas_token | str | None | SAS token for the Azure container |
access_key_id | str | None | access key id for a private AWS S3 bucket |
secret_access_key | str | None | secret access key for a private AWS S3 bucket |
*--: required argument without default
- Check https://linkervision.gitbook.io/dataverse/data-management/import-dataset for the details of Import Dataset.
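Several of the cloud arguments apply to only one data source: `container_name` and `sas_token` to Azure, the two access keys to private S3 buckets. A small guard can keep the wrong credentials out of `dataset_data`. The sketch below is hypothetical and not part of the SDK; it uses the plain strings "azure" and "aws" in place of the `DataSource` members, and it does not assume which fields the platform treats as mandatory.

```python
# Fields that apply to each source, per the argument table above.
ALLOWED = {
    "azure": {"container_name", "sas_token"},
    "aws": {"access_key_id", "secret_access_key"},
}


def cloud_source_fields(source: str, **fields) -> dict:
    """Return the non-empty storage fields, rejecting ones that do not fit the source."""
    if source not in ALLOWED:
        raise ValueError(f"unknown source {source!r}")
    given = {k: v for k, v in fields.items() if v}
    stray = sorted(set(given) - ALLOWED[source])
    if stray:
        raise ValueError(f"{stray} do not apply to {source}")
    if source == "aws" and len(given) == 1:
        # A private bucket needs both keys; a public bucket needs neither.
        raise ValueError("provide both AWS keys for a private bucket, or neither")
    return given


azure_fields = cloud_source_fields(
    "azure", container_name="my-container", sas_token="my-sas-token"
)
public_s3_fields = cloud_source_fields("aws")

# Usage: merge into the dictionary passed to project.create_dataset
# dataset_data.update(azure_fields)
```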
Use create_dataset to import a dataset from local files
dataset_data2 = {
"name": "dataset-local-upload",
"data_source": DataSource.LOCAL,
"storage_url": "",
"container_name": "",
"data_folder": "/YOUR/TARGET/LOCAL/FOLDER",
"sensors": project.sensors,
"type": DatasetType.ANNOTATED_DATA, # or DatasetType.RAW_DATA for images
"annotation_format": AnnotationFormat.VISION_AI,
"annotations": ["groundtruth"],
"sequential": False,
"generate_metadata": False,
"auto_tagging": []
}
dataset2 = project.create_dataset(**dataset_data2)
You can also use the script to import a dataset from local files:
python tools/import_dataset_from_local.py -host https://staging.visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id} -project {project-id} --folder {/YOUR/TARGET/LOCAL/FOLDER} -name {dataset-name} -type {raw_data OR annotated_data} -anno {image OR vision_ai} --sequential
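If the script is driven from another Python program, the command can be assembled as an argument list so that paths with spaces need no shell quoting. This is a sketch under assumptions: `build_import_command` is a hypothetical helper, the flags are copied from the command line above, and `--sequential` is assumed to be a value-less switch because it appears without a value there.

```python
def build_import_command(host, email, password, service_id, project_id,
                         folder, name, dataset_type="raw_data",
                         anno="image", sequential=False) -> list:
    """Build the argv list for tools/import_dataset_from_local.py."""
    if dataset_type not in ("raw_data", "annotated_data"):
        raise ValueError("dataset_type must be raw_data or annotated_data")
    if anno not in ("image", "vision_ai"):
        raise ValueError("anno must be image or vision_ai")
    cmd = [
        "python", "tools/import_dataset_from_local.py",
        "-host", host,
        "-e", email,
        "-p", password,
        "-s", service_id,
        "-project", str(project_id),
        "--folder", folder,
        "-name", name,
        "-type", dataset_type,
        "-anno", anno,
    ]
    if sequential:
        cmd.append("--sequential")
    return cmd


cmd = build_import_command(
    host="https://example.invalid/dataverse/curation",  # placeholder URL
    email="user@example.com",
    password="secret",
    service_id="xxxx",
    project_id=1,
    folder="/data/my folder",
    name="dataset-local-upload",
    dataset_type="annotated_data",
    anno="vision_ai",
    sequential=True,
)

# To run it: import subprocess; subprocess.run(cmd, check=True)
```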
Get Dataset
The get_dataset method retrieves the dataset info from the connected site. The dataset_id parameter is the unique integer ID of the dataset, not its "name" property.
dataset = client.get_dataset(dataset_id=5)
List Models
The list_models method lists all the models in the given project.
# Option 1: list the models through the client
models = client.list_models(project_id=1, client_alias=client.alias)
# Option 2: list the models through the project object
project = client.get_project(project_id=1)
models = project.list_models()
Get Model
The get_model method gets the model details for the given model id.
model = client.get_model(model_id=30, client_alias=client.alias)
# OR
model = project.get_model(model_id=30)
From the given model, you can get the model's convert records as shown below:
model_record = client.get_convert_record(convert_record_id=1, client_alias=client.alias)
# OR
model_record = model.get_convert_record(convert_record_id=1)
Troubleshooting
Next steps
Contributing
Links to language repos