
Hirundo

This package exposes access to Hirundo APIs for dataset optimization for Machine Learning.

Dataset optimization is currently available for datasets labelled for classification and object detection.

Supported dataset storage integrations include:

  • Google Cloud (GCP) Storage
  • Amazon Web Services (AWS) S3
  • Git LFS (Large File Storage) repositories (e.g. GitHub or HuggingFace)

Optimizing a classification dataset

Currently hirundo requires a CSV file with the following columns (all columns are required):

  • image_path: The location of the image within the dataset root
  • label: The label of the image, i.e. the class that was annotated for this image

And outputs a CSV with the same columns and:

  • suspect_level: mislabel suspect level
  • suggested_label: suggested label
  • suggested_label_conf: suggested label confidence
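For illustration, the output CSV can be post-processed with the standard library to surface likely mislabels. This is a minimal sketch: the column names come from the description above, but the sample rows and the 0.8 review threshold are hypothetical.

```python
import csv
import io

# Example output rows using the columns described above (values are illustrative)
output_csv = """image_path,label,suspect_level,suggested_label,suggested_label_conf
images/0001.png,cat,0.92,dog,0.88
images/0002.png,dog,0.05,dog,0.99
"""

# Collect rows whose mislabel suspect level exceeds a (hypothetical) threshold
suspects = [
    row
    for row in csv.DictReader(io.StringIO(output_csv))
    if float(row["suspect_level"]) > 0.8
]
for row in suspects:
    print(row["image_path"], row["label"], "->", row["suggested_label"])
```

In practice you would read the CSV produced by the optimization run instead of the inline string.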

Optimizing an object detection (OD) dataset

Currently hirundo requires a CSV file with the following columns (all columns are required):

  • image_path: The location of the image within the dataset root
  • bbox_id: The index of the bounding box within the dataset. Used to indicate label suspects
  • label: The label of the bounding box, i.e. the class that was annotated for this object
  • x1, y1, x2, y2: The bounding box coordinates of the object within the image

And outputs a CSV with the same columns and:

  • suspect_level: object mislabel suspect level
  • suggested_label: suggested object label
  • suggested_label_conf: suggested object label confidence
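Because OD results are per bounding box, it can help to group suspects by image for review. A minimal sketch, assuming the columns described above; the sample rows and the 0.8 threshold are hypothetical:

```python
import csv
import io
from collections import defaultdict

# Example OD output rows using the columns described above (values are illustrative)
output_csv = """image_path,bbox_id,label,x1,y1,x2,y2,suspect_level,suggested_label,suggested_label_conf
frames/0001.jpg,0,car,10,20,110,80,0.91,truck,0.85
frames/0001.jpg,1,person,200,40,230,120,0.10,person,0.97
frames/0002.jpg,0,bus,5,5,300,200,0.88,truck,0.80
"""

# Group suspect bounding boxes by image so each image can be reviewed once
threshold = 0.8  # hypothetical review threshold
suspects_by_image = defaultdict(list)
for row in csv.DictReader(io.StringIO(output_csv)):
    if float(row["suspect_level"]) > threshold:
        suspects_by_image[row["image_path"]].append(int(row["bbox_id"]))

print(dict(suspects_by_image))  # {'frames/0001.jpg': [0], 'frames/0002.jpg': [0]}
```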

Note: This Python package must be used alongside a Hirundo server, either the SaaS platform, a custom VPC deployment or an on-premises installation.

Installation

You can install the codebase with a simple pip install hirundo to install the latest version of this package. If you prefer to install from the Git repository and/or need a specific version or branch, you can simply clone the repository, check out the relevant commit and then run pip install . to install that version. A full list of dependencies can be found in requirements.txt, but these will be installed automatically by either of these commands.
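The two installation paths described above can be sketched as follows (the repository URL is a placeholder, since it is not given here):

```shell
# Install the latest release from PyPI
pip install hirundo

# Or, to install a specific version or branch from source:
git clone <repository-url>
cd hirundo
git checkout <commit-or-branch>
pip install .
```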

Usage

Classification example:

import json
import os

from hirundo.dataset_optimization import OptimizationDataset
from hirundo.enum import LabellingType
from hirundo.storage import StorageGCP, StorageIntegration, StorageLink, StorageTypes

# cifar100_classes: the list of the 100 CIFAR-100 class names, defined elsewhere

test_dataset = OptimizationDataset(
    name="TEST-GCP cifar 100 classification dataset",
    labelling_type=LabellingType.SingleLabelClassification,
    dataset_storage=StorageLink(
        storage_integration=StorageIntegration(
            name="cifar100bucket",
            type=StorageTypes.GCP,
            gcp=StorageGCP(
                bucket_name="cifar100bucket",
                project="Hirundo-global",
                credentials_json=json.loads(os.environ["GCP_CREDENTIALS"]),
            ),
        ),
        path="/pytorch-cifar/data",
    ),
    dataset_metadata_path="cifar100.csv",
    classes=cifar100_classes,
)

test_dataset.run_optimization()
results = test_dataset.check_run()
print(results)

Object detection example:

import uuid

from hirundo.dataset_optimization import OptimizationDataset
from hirundo.enum import LabellingType
# NOTE: the GitRepo and StorageGit import paths below are assumed
from hirundo.git import GitRepo
from hirundo.storage import StorageGit, StorageIntegration, StorageLink, StorageTypes

unique_id = uuid.uuid4().hex  # any unique suffix, to avoid name collisions

test_dataset = OptimizationDataset(
    name=f"TEST-HuggingFace-BDD-100k-validation-OD-validation-dataset{unique_id}",
    labelling_type=LabellingType.ObjectDetection,
    dataset_storage=StorageLink(
        storage_integration=StorageIntegration(
            name=f"BDD-100k-validation-dataset{unique_id}",
            type=StorageTypes.GIT,
            git=StorageGit(
                repo=GitRepo(
                    name=f"BDD-100k-validation-dataset{unique_id}",
                    repository_url="https://git@hf.co/datasets/hirundo-io/bdd100k-validation-only",
                ),
                branch="main",
            ),
        ),
        path="/BDD100K Val from Hirundo.zip/bdd100k",
    ),
    dataset_metadata_path="bdd100k.csv",
)

test_dataset.run_optimization()
results = test_dataset.check_run()
print(results)

Note: Currently we only support the mainline CPython releases 3.9, 3.10, and 3.11. PyPy support may be introduced in the future.
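The supported-interpreter check can be expressed as a small guard. A sketch based on the note above; the function name is hypothetical:

```python
import platform
import sys

def is_supported(implementation: str, version: tuple) -> bool:
    """Return True if the interpreter is a supported one (CPython 3.9-3.11)."""
    return implementation == "CPython" and (3, 9) <= version[:2] <= (3, 11)

# Check the current interpreter before relying on hirundo
print(is_supported(platform.python_implementation(), sys.version_info))
```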

Further documentation

To learn more about how to use this library, please visit http://docs.hirundo.io/ or see the Google Colab examples.

Download files

Source Distribution

  • hirundo-0.1.8.tar.gz (19.1 kB)

Built Distribution

  • hirundo-0.1.8-py3-none-any.whl (20.0 kB)

