SDK for connecting to matrice.ai services
Project description
Introduction
The matrice-datasets-sdk is a python package that has APIs defined in it to perform operations related to datasets on the matrice.ai platform.
Installation (will update it after we release this repo as a python package):
git clone https://github.com/matrice-ai/matrice-dataset-sdk.git
cd matrice-database-sdk
File structure
python_sdk/
├── docs/
│ ├── Be-Annotation.md
│ ├── Be-Dataset.md
│ ├── Be-Deployment.md
│ ├── Be-Model.md
│ ├── Be-Project.md
│ ├── Be-Resources.py
│ ├── Be-User.py
│ └── Inference-Optimization.md
├── src/
│ ├── annotation.py
│ ├── dataset.py
│ ├── deployment.py
│ ├── inference_optim.py
│ ├── models.py
│ ├── resources.py
│ ├── rpc.py
│ └── token_auth.py
├── test/
│ ├── annotation.ipynb
│ ├── config.py
│ ├── dataset.ipynb
│ ├── deployment.ipynb
│ ├── inference_optim.ipynb
│ ├── projects.ipynb
│ ├── models.ipynb
│ ├── resources.ipynb
│ └── user.ipynb
├── setup.py
└── README.md
Usage:
Class dataset.Dataset [Source]
All the operations on a dataset will be performed via the methods of the Dataset class. So, before performing any operation on the datasets, we need to instantiate an object of the Dataset class.
class dataset.Dataset(project_id, dataset_id=None, email="", password="")
This constructor should be used to instantiate an object of the Dataset class.
Parameters in the constructor:
project_id: string, the id of the project in which you want to create the datasetdataset_id: string, the id of the dataset on which you want to perform operationsemail: string, email associated with the matrice.ai platform accountpassword: string, password for your account corresponding toemailfor the matrice.ai platform
If you want to create a new dataset and perform some operations on it, you can instantiate an object of the Dataset class without setting the value of dataset_id.
# import Dataset class
from src.dataset import Dataset
# set the value of my_email to the email associated with the matrice.ai account
my_email = "example@gmail.com"
# set the value of my_password to the password corresponding to `my_email` for matrice.ai platform
my_password = "password"
# set the value of project_id to the id of the project in which you want to create the dataset
project_id = "abcc123facc"
# instantiate an object of the Dataset class
d1 = Dataset(project_id = project_id,email=email, password=password)
If you want to perform any operation on an already created dataset, you will have to set the value of dataset_id to the id of the dataset on which you want to perform operations while instantiating the object of the Datasets class.
# set the value of my_email to the email associated with the matrice.ai account
my_email = "example@gmail.com"
# set the value of my_password to the password corresponding to `my_email` for matrice.ai platform
my_password = "password"
# set the value of project_id to the id of the project in which the dataset you want to access exists
project_id = "abcc123facc"
# set the value of dataset_id to the id of the dataset on which you want to perform operations
dataset_id = "addfc123facc"
# instantiate an object of the Dataset class
d1 = Dataset(project_id = project_id, dataset_id = dataset_id ,email=email, password=password)
Note:
All the methods defined in Dataset will return response, error, and message. The response will be the response given by the matrice.ai backend, error will represent the error message(if there is any error else None) and message will be a simple text message that provides information about the status of the task being requested to execute.
Create a dataset:
To create a dataset in the matrice.ai platform using the matrice-datasets-sdk, we can use the create_dataset() method defined in the dataset.Dataset class. Before, calling the create_dataset method, we instantiate an object of the Dataset class.
dataset.Dataset.create_dataset(self, name, type, is_unlabeled, source, source_url, dataset_description="", version_description="")
Parameters in the method:
name: string, name of the dataset to be createdtype: string, type of the task of the dataset (for now we support"classification"and"detection"types)is_unlabeled: boolean (True/False), True if the dataset is unlabeledsource: string, Source for dataset files (currently supported only "url")source_url: string, URL of the dataset zipped filedataset_description: string, Optional field that is used to store some information about the dataset we are creatingversion_description: string, Optional field that is used to store some information about thev1.0of the dataset
In the matrice.ai platform, we can create a new dataset either by uploading local zip files or by using a URL of the dataset files stored somewhere remotely.
To create a dataset using the URL of the dataset files, we pass source="url" and pass URL of files as the source_url argument.
resp, err, msg = d1.create_dataset(name = "dataset-name",type="detection", is_unlabeled=False, source="url", source_url="https://example.com/sample2_data.tar.gz", dataset_description="Dataset created using matricesdk", version_description="Initial Version")
Add new samples to the existing dataset:
To add new samples to the existing dataset, we can use the add_new_samples() method defined in the dataset.Dataset class.
dataset.Dataset.add_new_samples(self,old_version, new_version, is_unlabeled, source, source_url, dataset_description="", version_description="")
Parameters in the method:
old_version: string, the version of the dataset to which you want to add samplesnew_version: string, the new version of the dataset with added samples (set new_version equal to old_version argument if you don't want to create a new version)is_unlabeled: boolean (True/False), True if the dataset is unlabeledsource: string, Source for dataset files (currently supported only "url")source_url: string, URL of the dataset zipped filedataset_description: string, Optional field if set to a string updates existing description of the datasetversion_description: string, Optional field. Used to set the description of a version if we are creating a new version or used to update the description of a version if we are updating an existing version
In the matrice.ai platform, we can add new samples to the existing dataset either by uploading local zip files or by using a URL of the dataset files stored somewhere remotely.
To add new samples to the existing dataset using the URL of the dataset files, we pass source="url" and pass URL of files as the source_url argument.
resp, err, msg = d1.add_new_samples(old_version="v1.0", new_version="v1.0", is_unlabeled=False, source="url", source_url="https://myexample.com/sample2_data.tar.gz")
The python statement above will add new samples to version v1.0 of the d1 dataset using the files in the source_url. It will not create a new version in this case.
resp, err, msg = d1.add_new_samples(old_version="v1.0", new_version="v1.1", is_unlabeled=False, source="url", source_url="https://s3.us-west-2.amazonaws.com/temp.matrice/sample2_data.tar.gz",version_description="Adding samples")
The python statement above will add new samples to version v1.0 of the d1 dataset using the files in the source_url and store the final result as version v1.1`.
Create dataset splits:
To move samples from one split to another split or create new random splits in the existing dataset, one can use the split_dataset() method defined in the dataset.Dataset class.
dataset.Dataset.split_dataset(self,old_version, new_version, is_random_split, train_num=0, val_num=0, test_num=0, unassigned_num=0.dataset_description="", version_description="",)
Parameters in the method:
old_version: string, the version of the dataset in which you want to move samples from one split to another split or create new random splitsnew_version: string, the new version of the dataset with added samples (set new_version equal to old_version argument if you don't want to create a new version)is_random_split: boolean (True/False), set to True if you want to create fresh new random splitstrain_num: int, Number of samples you want to be in the training setval_num: int, Number of samples you want to be in the validation settest_num: int, Number of samples you want to be in the test setunassigned_num: int, Number of samples you want to be in the unassigned setdataset_description: string, Optional field if set to a string updates existing description of the datasetversion_description: string, Optional field. Used to set the description of a version if we are creating a new version or used to update the description of a version if we are updating an existing version
Delete a specific version of a dataset:
To delete a specific version of a dataset, we can use the delete_version() method defined in the dataset.Dataset class.
delete_version(self, version):
Parameters in the method:
version: string, version of the dataset to be deleted
resp, err, msg = d1.delete_version("v1.2")
The Python statement above will delete version v1.2 of the d1 dataset.
Delete a dataset:
To delete a dataset, we can use the delete_dataset() method defined in the dataset.Dataset class.
delete_dataset(self)
resp, err, msg = d1.delete_dataset()
The Python statement above will delete the d1 dataset.
Rename a dataset:
To rename a dataset, one can use the rename_dataset() method defined in the dataset.Dataset class.
rename_dataset(self, updated_name)
Parameters in the method:
updated_name: string, the name you want the dataset to rename to
resp, err, message = d1.rename_dataset(updated_name="my_dataset")
The Python statement above will rename the d1 dataset to my_dataset.
Get a list of all the datasets:
To get a list of all the datasets inside a project, one can use the get_all_datasets() method defined in the dataset.Dataset class.
get_all_datasets(self)
resp, err, message = d1.get_all_datasets()
The Python statement above will list all the datasets inside the project_id of d1.
Get information(details) about a dataset:
To get information about a particular dataset inside a project, we can use the get_dataset_info() method defined in the dataset.Dataset class.
get_dataset_info(self)
resp, err, message = d1.get_dataset_info()
The Python statement above will provide us with the information about d1.
Get a summary of a particular version of a dataset:
To get a summary of a particular version of a dataset, one can use the get_version_summary() method defined in the dataset.Dataset class.
get_version_summary(self, version)
Parameters in the method:
version: string, version of the dataset whose summary you want to get
resp, err, msg = d1.get_version_summary("v1.0")
The Python statement above will provide us with the summary of version v1.0 of dataset d1.
List all versions of a dataset:
To list all the versions of a dataset, one can use the list_versions() method defined in the dataset.Dataset class.
list_versions(self)
resp, err, msg = d1.list_versions()
The Python statement above will provide us with the latest version and the list of all versions of the dataset d1.
List all the label categories of a dataset:
To list all the label categories of a dataset, one can use the list_categories() method defined in the dataset.Dataset class.
list_categories(self)
resp, err, msg = d1.list_categories()
The Python statement above will provide us with all the label categories of the dataset d1.
List all the logs for a dataset:
To list all the logs for a dataset, one can use the get_logs() method defined in the dataset.Dataset class.
get_logs(self)
resp, err, msg = d1.get_logs()
The Python statement above will give us a list of all the logs for the dataset d1.
List all the dataset items in a dataset:
To list all the dataset items in a dataset, one can use the list_all_dataset_items() method defined in the dataset.Dataset class.
list_all_dataset_items(self, version)
Parameters in the method:
version: string, version of the dataset whose items you want to get
resp, err, msg = d1.list_all_dataset_items(version="v1.0")
The Python statement above will give us a list of all the dataset items in version v1.0 of the dataset d1.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file matrice-1.0.99542.tar.gz.
File metadata
- Download URL: matrice-1.0.99542.tar.gz
- Upload date:
- Size: 2.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9d5873ef23c540d28a037e5372d9662403b90e8f94e5514a33590fa8a81141aa
|
|
| MD5 |
f693e361b2393f924852cb526b0ed28d
|
|
| BLAKE2b-256 |
cd6b102d600a4c69acbec27f332405e702216a339f4940dd552bada864c815ca
|
File details
Details for the file matrice-1.0.99542-py3-none-any.whl.
File metadata
- Download URL: matrice-1.0.99542-py3-none-any.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
69b3e6282d5d069c11fb7af39998759b14ba5e47419896f8f563b1c3765f8d41
|
|
| MD5 |
42181e1f53672a44304caedd5fbf5b2a
|
|
| BLAKE2b-256 |
c9d3eaf4695305b145b0f038498ed283defb9413943e7c7202f6985779052c8e
|