This package is used to interface with Hirundo's platform. It provides a simple API to optimize your ML datasets.
Hirundo
This package exposes access to Hirundo APIs for dataset optimization for Machine Learning.
Dataset optimization is currently available for datasets labelled for classification and object detection.
Supported dataset storage integrations include:
- Google Cloud (GCP) Storage
- Amazon Web Services (AWS) S3
- Git LFS (Large File Storage) repositories (e.g. GitHub or HuggingFace)
Optimizing a classification dataset
Currently, hirundo requires a CSV file with the following columns (all columns are required):
- image_path: The location of the image within the dataset root
- label: The label of the image, i.e. the class that was annotated for this image
It outputs a CSV with the same columns and:
- suspect_level: mislabel suspect level
- suggested_label: suggested label
- suggested_label_conf: suggested label confidence
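The input format described above can be produced with Python's standard csv module. The following is a minimal sketch, not the package's own tooling: the file paths and labels are hypothetical, and the presence of a header row naming the columns is an assumption.

```python
import csv
import io

# Hypothetical (image_path, label) pairs; real paths are relative to the dataset root.
rows = [
    ("train/0001.png", "apple"),
    ("train/0002.png", "bicycle"),
]


def write_classification_csv(rows, fileobj):
    """Write the two required columns, image_path and label.

    Assumption: the CSV starts with a header row naming the columns.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["image_path", "label"])
    writer.writerows(rows)


# Write to an in-memory buffer so the sketch has no side effects.
buffer = io.StringIO()
write_classification_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("cifar100.csv", "w", newline="")` in place of the buffer.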
Optimizing an object detection (OD) dataset
Currently, hirundo requires a CSV file with the following columns (all columns are required):
- image_path: The location of the image within the dataset root
- bbox_id: The index of the bounding box within the dataset. Used to indicate label suspects
- label: The label of the image, i.e. the class that was annotated for this image
- x1, y1, x2, y2: The bounding box coordinates of the object within the image
It outputs a CSV with the same columns and:
- suspect_level: object mislabel suspect level
- suggested_label: suggested object label
- suggested_label_conf: suggested object label confidence
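One way to work with the output CSV is to filter rows by suspect_level and review the most suspicious boxes first. The sketch below uses only the column names listed above. The sample rows are made up, and it assumes that suspect_level is a number where higher means more suspect; neither the scale nor the 0.5 threshold comes from the original description.

```python
import csv
import io

# Hypothetical output rows; column names follow the list above, values are invented.
SAMPLE_OUTPUT = """image_path,bbox_id,label,x1,y1,x2,y2,suspect_level,suggested_label,suggested_label_conf
images/a.jpg,0,car,10,20,110,220,0.91,truck,0.84
images/a.jpg,1,person,50,60,90,200,0.05,person,0.99
images/b.jpg,2,bus,0,0,300,180,0.67,truck,0.58
"""


def suspects_above(fileobj, threshold):
    """Return rows whose suspect_level is at least `threshold`, highest first.

    Assumption: suspect_level parses as a float and larger values are more suspect.
    """
    rows = [
        row
        for row in csv.DictReader(fileobj)
        if float(row["suspect_level"]) >= threshold
    ]
    return sorted(rows, key=lambda row: float(row["suspect_level"]), reverse=True)


flagged = suspects_above(io.StringIO(SAMPLE_OUTPUT), threshold=0.5)
for row in flagged:
    print(row["bbox_id"], row["label"], "->", row["suggested_label"])
```

The same function works for the classification output, since it only reads the suspect_level column.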
Note: This Python package must be used alongside a Hirundo server, either the SaaS platform, a custom VPC deployment or an on-premises installation.
Installation
You can install the latest version of this package with a simple pip install hirundo. If you prefer to install from the Git repository and/or need a specific version or branch, you can clone the repository, check out the relevant commit and then run pip install . to install that version. A full list of dependencies can be found in requirements.txt, but these will be installed automatically by either of these commands.
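After installing, it can be useful to confirm which version ended up in the active environment. This is a small sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of hirundo.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version string, or None when hirundo is not installed.
print(installed_version("hirundo"))
```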
Usage
Classification example:
import json
import os

from hirundo.dataset_optimization import OptimizationDataset
from hirundo.enum import LabellingType
# Assumption: StorageGCP is exported from hirundo.storage like the other storage classes.
from hirundo.storage import StorageGCP, StorageIntegration, StorageLink, StorageTypes

# cifar100_classes (the list of CIFAR-100 class names) must be defined before this point.
test_dataset = OptimizationDataset(
    name="TEST-GCP cifar 100 classification dataset",
    labelling_type=LabellingType.SingleLabelClassification,
    dataset_storage=StorageLink(
        storage_integration=StorageIntegration(
            name="cifar100bucket",
            type=StorageTypes.GCP,
            gcp=StorageGCP(
                bucket_name="cifar100bucket",
                project="Hirundo-global",
                # Service account credentials are read from an environment variable.
                credentials_json=json.loads(os.environ["GCP_CREDENTIALS"]),
            ),
        ),
        path="/pytorch-cifar/data",
    ),
    dataset_metadata_path="cifar100.csv",
    classes=cifar100_classes,
)

# Start the optimization run, then fetch and print its results.
test_dataset.run_optimization()
results = test_dataset.check_run()
print(results)
Object detection example:
from hirundo.dataset_optimization import OptimizationDataset
from hirundo.enum import LabellingType
# Assumption: GitRepo lives in hirundo.git; the original example does not show its import.
from hirundo.git import GitRepo
# Assumption: StorageGit is exported from hirundo.storage like the other storage classes.
from hirundo.storage import StorageGit, StorageIntegration, StorageLink, StorageTypes

# unique_id (a string suffix that keeps names unique) must be defined before this point.
test_dataset = OptimizationDataset(
    name=f"TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset{unique_id}",
    labelling_type=LabellingType.ObjectDetection,
    dataset_storage=StorageLink(
        storage_integration=StorageIntegration(
            name=f"BDD-100k-validation-dataset{unique_id}",
            type=StorageTypes.GIT,
            git=StorageGit(
                repo=GitRepo(
                    name=f"BDD-100k-validation-dataset{unique_id}",
                    repository_url="https://git@hf.co/datasets/hirundo-io/bdd100k-validation-only",
                ),
                branch="main",
            ),
        ),
        path="/BDD100K Val from Hirundo.zip/bdd100k",
    ),
    dataset_metadata_path="bdd100k.csv",
)

# Start the optimization run, then fetch and print its results.
test_dataset.run_optimization()
results = test_dataset.check_run()
print(results)
Note: Currently we only support the main CPython releases 3.9, 3.10 and 3.11. PyPy support may be introduced in the future.
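The support note above can be turned into a quick pre-flight check. This is an illustrative sketch only: the helper is_supported and the SUPPORTED_MINORS set are hypothetical names that simply encode the versions listed in the note.

```python
import platform
import sys

# Interpreter versions named in the note above (CPython only).
SUPPORTED_MINORS = {(3, 9), (3, 10), (3, 11)}


def is_supported(implementation: str, version: tuple) -> bool:
    """Return True for CPython 3.9, 3.10 or 3.11, and False otherwise."""
    return implementation == "CPython" and tuple(version[:2]) in SUPPORTED_MINORS


# Check the interpreter that is running this script.
print(is_supported(platform.python_implementation(), sys.version_info))
```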
Further documentation
To learn more about how to use this library, please visit http://docs.hirundo.io/ or see the Google Colab examples.
Download files
File details
Details for the file hirundo-0.1.8.tar.gz.
File metadata
- Download URL: hirundo-0.1.8.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | dcc828dbc18b327f557a4a9975d3016106ceeb48cd65eaae239040c6d1466f0e |
MD5 | 2ac70a5b80fd75b60ea3954d104bbf87 |
BLAKE2b-256 | 09893b1d1a2b290400688a498753402fe8f80d0adf96b6c6e615542e3f04c47f |
File details
Details for the file hirundo-0.1.8-py3-none-any.whl.
File metadata
- Download URL: hirundo-0.1.8-py3-none-any.whl
- Upload date:
- Size: 20.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 1e7b454a41d1888f23c08ae4f46684eda4180e1d255b6f5b6efa46f7220219ed |
MD5 | 7e71afbe290fc6198353899452844664 |
BLAKE2b-256 | fc23fbe6a19bf7d3ffc01e022480c395411732ee6c555dd064c0ed4783b7825f |
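A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. The following is a minimal sketch; the helper names sha256_of and matches_digest are hypothetical, and the demonstration hashes a small temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration on a temporary file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_digest = sha256_of(demo_path)
os.remove(demo_path)
print(demo_digest)
```

For a real check, call `matches_digest` with the path of the downloaded archive and the SHA256 value from the corresponding table.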