
Labelbox Connector for Snowflake


[!WARNING] Starting in July 2024, we will begin archiving all data connector libraries, including the labelspark, labelpandas, labelsnow, and labelbox-bigquery libraries, after which they will no longer be maintained. To import data from remote sources such as Databricks and Snowflake, set up Census integrations directly on the Labelbox platform.


Access the Labelbox Connector for Snowflake to connect an unstructured dataset to Labelbox, programmatically set up an ontology for labeling, and load the labeled dataset into your Snowflake environment.

Labelbox is the enterprise-grade training data solution with fast, AI-enabled labeling tools, labeling automation, a human workforce, data management, a powerful API for integration, and an SDK for extensibility. Visit Labelbox for more information.

This library is currently in beta. It may contain errors or inaccuracies and may not function as well as commercially released software. Please report any issues or bugs via GitHub Issues.

Table of Contents

Requirements

Installation

Install LabelSnow in your Python environment. The installation also adds the Labelbox SDK, which LabelSnow requires to function. LabelSnow is available via PyPI:

pip install labelsnow
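
To confirm that the install succeeded, one option is to ask the standard library for the installed distribution version. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of LabelSnow.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("labelsnow")
    print(found if found else "labelsnow is not installed in this environment")
```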

Documentation

LabelSnow includes several methods to help facilitate your workflow between Snowflake and Labelbox.

  1. Create your dataset in Labelbox from your Unstructured Data stage in Snowflake:
sf_dataframe = labelsnow.get_snowflake_datarows(snowflake_cursor, "name_of_snowflake_stage", 604800)  # 604800 seconds (7 days) is the signed URL expiration time in Snowflake

my_demo_dataset = labelsnow.create_dataset(labelbox_client=lb_client, snowflake_pandas_dataframe=sf_dataframe, dataset_name="SF Test")

Here, "sf_dataframe" is a pandas DataFrame of unstructured data with asset names and asset URLs in two columns, named "external_id" and "row_data" respectively. labelsnow.create_dataset() returns a Labelbox Dataset Python object, assigned above to my_demo_dataset.

external_id row_data
image1.jpg https://url_to_your_asset/image1.jpg
image2.jpg https://url_to_your_asset/image2.jpg
image3.jpg https://url_to_your_asset/image3.jpg
  2. Get your annotations from Labelbox as a pandas DataFrame:
bronze_df = labelsnow.get_annotations(lb_client, "insert_project_id_here")
  3. You can use our flattener to flatten the "Label" JSON column into component columns, or use the silver table method to produce a more queryable table of your labeled assets. Both of these methods take in the bronze table of annotations from above:
flattened_table = labelsnow.flatten_bronze_table(bronze_df)
queryable_silver_DF = labelsnow.silver_table(bronze_df)
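
The two-column layout described in step 1 ("external_id" and "row_data") can be sketched in plain Python. This is only an illustration of the row shape: the helper name `build_datarow_records` and the URL prefix check are assumptions, not part of LabelSnow, and it is assumed that a frame built this way would be accepted wherever the Snowflake-derived frame is.

```python
def build_datarow_records(assets):
    """Turn (asset_name, asset_url) pairs into rows with the two expected columns."""
    records = []
    for name, url in assets:
        # Assumption for this sketch: asset URLs are plain http(s) links.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"row_data for {name!r} does not look like a URL: {url!r}")
        records.append({"external_id": name, "row_data": url})
    return records


records = build_datarow_records([
    ("image1.jpg", "https://url_to_your_asset/image1.jpg"),
    ("image2.jpg", "https://url_to_your_asset/image2.jpg"),
])
# Passing `records` to pandas.DataFrame(...) gives a frame whose columns are
# "external_id" and "row_data".
```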

Depositing your tables into Snowflake

We also include a helper function put_tables_into_snowflake that can help you quickly load Pandas tables into Snowflake. It takes in a dictionary of Pandas tables, creates tables, and loads the data.

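import snowflake.connector

# Keys become the table names; values are the pandas DataFrames built above
my_table_payload = {"BRONZE_TABLE": bronze_df,
                    "FLATTENED_BRONZE_TABLE": flattened_table,
                    "SILVER_TABLE": queryable_silver_DF}

# `credentials` is a placeholder for wherever you keep your Snowflake login details
ctx = snowflake.connector.connect(
        user=credentials.user,
        password=credentials.password,
        account=credentials.account,
        warehouse="name_of_warehouse",
        database="SAMPLE_DB",
        schema="PUBLIC"
    )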

labelsnow.put_tables_into_snowflake(ctx, my_table_payload)
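
Because the dictionary keys in the payload serve as table names, it may help to check them before loading. The sketch below assumes the keys are used as unquoted Snowflake identifiers (a letter or underscore first, then letters, digits, underscores, or dollar signs); whether LabelSnow quotes the names internally is not stated here, and `invalid_table_names` is a hypothetical helper.

```python
import re

# Pattern for an unquoted Snowflake identifier (assumed naming rule for this sketch).
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def invalid_table_names(table_payload):
    """Return the payload keys that would not work as unquoted identifiers."""
    return [
        name for name in table_payload
        if not _UNQUOTED_IDENTIFIER.fullmatch(str(name))
    ]


if __name__ == "__main__":
    payload = {"BRONZE_TABLE": None, "silver table": None}
    print(invalid_table_names(payload))
```

An empty result means every key passed the check; any names returned could be renamed before calling put_tables_into_snowflake.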

How To Get Video Project Annotations

Because Labelbox Video projects can contain multiple videos, you must use the get_videoframe_annotations method, which returns an array of Pandas DataFrames, one for each video in your project. Each DataFrame contains the frame-by-frame annotations for a single video:

video_bronze = labelsnow.get_annotations(lb_client, "insert_video_project_id_here") #sample completed video project
video_dataframe_framesets = labelsnow.get_videoframe_annotations(video_bronze, LB_API_KEY)

You can use standard Python code to iteratively create your flattened bronze tables and silver tables:

import pandas as pd

silver_video_dataframes = {}

video_count = 1
for frameset in video_dataframe_framesets:
    # Build the silver table for this video's frame-by-frame annotations
    silver_table = labelsnow.silver_table(frameset)
    # Join back to the bronze table on "DataRow ID"
    silver_table_with_datarowid = pd.merge(silver_table, video_bronze, how="inner", on=["DataRow ID"])
    video_name = "VIDEO_DEMO_{}".format(video_count)
    silver_video_dataframes[video_name] = silver_table_with_datarowid
    video_count += 1

Then deposit these Pandas dataframes into Snowflake with put_tables_into_snowflake
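
The table-naming step in the loop above (VIDEO_DEMO_1, VIDEO_DEMO_2, and so on) can also be written with enumerate. This is a small sketch of the naming only; `name_video_tables` is an illustrative helper, not a LabelSnow function, and the values can be any per-video tables.

```python
def name_video_tables(tables, prefix="VIDEO_DEMO"):
    """Map each per-video table to a numbered name, counting from 1."""
    return {
        f"{prefix}_{index}": table
        for index, table in enumerate(tables, start=1)
    }


if __name__ == "__main__":
    # Strings stand in for the per-video DataFrames in this demonstration.
    print(list(name_video_tables(["first video table", "second video table"])))
```

The resulting dictionary has the same shape as the payload that put_tables_into_snowflake takes.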

While using LabelSnow, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:

Authentication

Labelbox uses API keys to validate requests. You can create and manage API keys on Labelbox.
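
One common way to keep the API key out of source code is to read it from an environment variable before creating the client. The sketch below assumes a variable named LABELBOX_API_KEY (a convention chosen for this example, not something LabelSnow requires); the helper names are illustrative.

```python
import os


def read_labelbox_api_key(env_var="LABELBOX_API_KEY"):
    """Return the API key stored in the given environment variable."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Labelbox API key"
        )
    return api_key


def make_labelbox_client():
    """Create a Labelbox SDK client from the key in the environment."""
    # Imported lazily so that reading the key works even without the SDK installed.
    import labelbox

    return labelbox.Client(api_key=read_labelbox_api_key())
```

The client returned by `make_labelbox_client()` plays the role of `lb_client` in the examples above.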

Contribution

Please consult CONTRIB.md

Provenance

SLSA 3

To enhance the software supply chain security of Labelbox's users, as of 0.1.3, every release contains a SLSA Level 3 Provenance document.
This document provides detailed information about the build process, including the repository and branch from which the package was generated.

By using the SLSA framework's official verifier, you can verify the provenance document to ensure that the package is from a trusted source. Verifying the provenance helps confirm that the package has not been tampered with and was built in a secure environment.

Example of usage for the 1.0.0 release wheel:

export VERSION=1.0.0
pip download --no-deps labelsnow==${VERSION}

curl --location -O \
  https://github.com/Labelbox/labelsnow/releases/download/${VERSION}/multiple.intoto.jsonl

slsa-verifier verify-artifact --source-branch main --builder-id 'https://github.com/slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/tags/v2.0.0' --source-uri "git+https://github.com/Labelbox/labelsnow" --provenance-path multiple.intoto.jsonl ./labelsnow-${VERSION}-py3-none-any.whl

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

labelsnow-1.1.0.tar.gz (15.2 kB view details)

Uploaded Source

Built Distribution

labelsnow-1.1.0-py3-none-any.whl (15.3 kB view details)

Uploaded Python 3

File details

Details for the file labelsnow-1.1.0.tar.gz.

File metadata

  • Download URL: labelsnow-1.1.0.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for labelsnow-1.1.0.tar.gz
Algorithm Hash digest
SHA256 cc648bbcba4ea6bba7936a9042128d6248ec921b0f858efc151282ebacb00576
MD5 5b2bfd8f89677e41f4a062309e3d49ff
BLAKE2b-256 f94c3d1271a81339de484238c53bc6e63f46cb0435ab91543ff5afebf6776ca4

See more details on using hashes here.

File details

Details for the file labelsnow-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: labelsnow-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for labelsnow-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dc4de75abd5bce8d8e042a5edf0695de95696678f0b1c277724f4d9fc3018288
MD5 7b0255b92539182040cd10322e4ee40e
BLAKE2b-256 b6a0a44f07e10f9da8553f4d5cc9ffda296d43c763df923842a5d9a97ef98e46

See more details on using hashes here.
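
To compare a downloaded file against a SHA256 digest listed above, a short standard-library check is enough. This is a sketch under the assumption that the file has already been downloaded locally; the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha256(path, published_hex):
    """Return True if the file's digest equals the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())
```

For example, `matches_published_sha256("labelsnow-1.1.0.tar.gz", "cc648bbcba4ea6bba7936a9042128d6248ec921b0f858efc151282ebacb00576")` should return True for an unmodified copy of the source distribution.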
