Labelbox Connector for Google BigQuery

Use the Labelbox Connector for Google BigQuery to upload a CSV of text snippets to BigQuery, select columns, and add the resulting dataset to Labelbox for annotation in our text tool. The library targets text use cases specifically, although it may be expanded to support other BigQuery use cases as needed.

The demo code supplied in this GitHub repository is designed to run in Google Colab, but it can be adapted to any notebook environment.

Labelbox is the enterprise-grade training data solution with fast, AI-enabled labeling tools, labeling automation, a human workforce, data management, a powerful API for integration, and an SDK for extensibility. Visit Labelbox for more information.

This library is currently in beta. It may contain errors or inaccuracies and may not function as well as commercially released software. Please report any issues or bugs via GitHub Issues.

Requirements

Installation

Install LabelBigQuery in your Python environment. The installation also adds the Labelbox SDK and the BigQuery SDK.

pip install labelbigquery

Documentation

LabelBigQuery includes several methods to help facilitate your workflow between BigQuery and Labelbox.

  1. Add your CSV contents to BigQuery (only necessary if your data is not in BigQuery yet):
    # Imports used by this snippet. bq_client (a BigQuery client) and args
    # (parsed command-line arguments) are assumed to be defined elsewhere.
    from google.cloud import bigquery
    import labelbigquery

    # Define headers and fields for the BigQuery data load
    SELECTED_HEADERS = {
        'conversation_id',
        'normalized_query'
    }

    SCHEMA_FIELDS = [
        bigquery.SchemaField("conversation_id", "STRING"),
        bigquery.SchemaField("normalized_query", "STRING"),
    ]

    labelbigquery.load_data_to_big_query(bq_client, args.table_name, args.csv_file_name,
                                         SELECTED_HEADERS=SELECTED_HEADERS,
                                         SCHEMA_FIELDS=SCHEMA_FIELDS)

Here, "SELECTED_HEADERS" and "SCHEMA_FIELDS" specify the columns of your CSV that you want to send to BigQuery, along with the type definitions for proper storage in BigQuery.
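
To make the role of "SELECTED_HEADERS" concrete, here is a minimal sketch of column selection that uses only the Python standard library. The helper `select_columns` and the extra `channel` column are hypothetical and are not part of LabelBigQuery; the sketch only illustrates keeping the named columns from a CSV, and the library's own implementation may differ.

```python
import csv
import io

SELECTED_HEADERS = {"conversation_id", "normalized_query"}


def select_columns(csv_text, selected_headers):
    """Return CSV rows as dicts that keep only the selected columns.

    Hypothetical helper for illustration; not part of LabelBigQuery.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = set(selected_headers) - set(fieldnames)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    # Preserve the column order of the CSV while dropping unselected columns.
    kept = [name for name in fieldnames if name in selected_headers]
    return [{name: row[name] for name in kept} for row in reader]


# Sample data with one extra column ("channel") that should be dropped.
sample_csv = (
    "conversation_id,normalized_query,channel\n"
    "sample_1,Some text string here for labeling.,web\n"
    "sample_2,Some text string here for labeling.,mobile\n"
)

rows = select_columns(sample_csv, SELECTED_HEADERS)
print(rows[0])
```

Raising an error for a missing column, rather than silently skipping it, is a design choice of this sketch: it surfaces a header typo before any data is sent to BigQuery.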

LabelBigQuery for text requires two columns of data: a unique identifier (which becomes the "External ID" in our system) and a corresponding text string. Here is a chatbot example table:

conversation_id    normalized_query
sample_1           Some text string here for labeling.
sample_2           Some text string here for labeling.
sample_3           Some text string here for labeling.
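
Before loading, it can help to confirm that every row actually carries both required values. The following is a minimal sketch under stated assumptions: `check_rows` is a hypothetical helper, not part of LabelBigQuery, and the column names are taken from the example table. It allows an identifier to appear on more than one row, since the aggregation query in this README groups rows by identifier.

```python
def check_rows(rows, id_column="conversation_id", text_column="normalized_query"):
    """Check that every row has a non-empty identifier and text string.

    Hypothetical helper for illustration; not part of LabelBigQuery.
    Returns the set of distinct identifiers found.
    """
    identifiers = set()
    for index, row in enumerate(rows):
        identifier = (row.get(id_column) or "").strip()
        text = (row.get(text_column) or "").strip()
        if not identifier:
            raise ValueError(f"row {index}: missing {id_column}")
        if not text:
            raise ValueError(f"row {index}: missing {text_column}")
        identifiers.add(identifier)
    return identifiers


# The rows of the example table, expressed as dictionaries.
rows = [
    {"conversation_id": "sample_1", "normalized_query": "Some text string here for labeling."},
    {"conversation_id": "sample_2", "normalized_query": "Some text string here for labeling."},
    {"conversation_id": "sample_3", "normalized_query": "Some text string here for labeling."},
]

external_ids = check_rows(rows)
print(sorted(external_ids))
```
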

  2. Submit a query to BigQuery for your target columns. This also writes individual text files to a "data" folder. The file names are based on the unique identifier ("conversation_id" in the above example).
    query = fr'SELECT conversation_id, STRING_AGG(normalized_query, "\n") FROM {args.table_name} GROUP BY conversation_id'
    file_names = labelbigquery.fetch_and_write_rows(bq_client, query=query)

  3. Submit your files to Labelbox for annotation in the text editor.
    lb_dataset = labelbigquery.make_dataset_and_data_rows(lb_client, file_names, args.dataset_name)
    print("Dataset unique identifier: " + lb_dataset.uid)
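
The query-and-write step above groups rows by identifier, joins their text with newlines, and writes one text file per identifier. The sketch below reproduces that flow locally, without BigQuery, so the resulting file layout is easy to inspect. The function `write_grouped_rows`, the `.txt` extension, and the UTF-8 encoding are assumptions made for illustration; `fetch_and_write_rows` may behave differently.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def write_grouped_rows(rows, out_dir):
    """Group (identifier, text) pairs and write one text file per identifier.

    Hypothetical stand-in for illustration; not part of LabelBigQuery.
    Texts that share an identifier are joined with newlines, which mirrors
    the STRING_AGG ... GROUP BY query in plain Python.
    """
    grouped = defaultdict(list)
    for identifier, text in rows:
        grouped[identifier].append(text)

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    file_names = []
    for identifier, texts in grouped.items():
        file_path = out_path / f"{identifier}.txt"
        file_path.write_text("\n".join(texts), encoding="utf-8")
        file_names.append(str(file_path))
    return file_names


# Two rows share the identifier "sample_1" and end up in the same file.
rows = [
    ("sample_1", "hello"),
    ("sample_1", "how can I reset my password"),
    ("sample_2", "thanks"),
]

with tempfile.TemporaryDirectory() as tmp:
    file_names = write_grouped_rows(rows, Path(tmp) / "data")
    contents = {
        Path(name).name: Path(name).read_text(encoding="utf-8")
        for name in file_names
    }

print(contents)
```

One difference is worth keeping in mind: this sketch joins texts in input order, whereas BigQuery does not guarantee the order of aggregated values unless the STRING_AGG call includes an ORDER BY clause.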

While using LabelBigQuery, you will likely also use the Labelbox SDK (for example, for programmatic ontology creation), so it is worth getting familiar with the Labelbox Python SDK documentation.

Authentication

Labelbox uses API keys to validate requests. You can create and manage API keys on Labelbox.
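
One common way to keep an API key out of notebooks and source control is to read it from an environment variable. The sketch below assumes a variable named `LABELBOX_API_KEY`; that name and both helper functions are hypothetical choices for illustration, not something LabelBigQuery requires. The Labelbox client is imported lazily so the key handling can be tried without the SDK installed.

```python
import os


def read_api_key(env_var="LABELBOX_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption of this sketch; use whatever
    name fits your environment.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Labelbox API key")
    return key


def make_labelbox_client():
    """Create a Labelbox client from the key in the environment."""
    # Imported here so that read_api_key works without the SDK installed.
    from labelbox import Client

    return Client(api_key=read_api_key())
```

A client created this way could then be passed as `lb_client` in the snippets above.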

Contribution

Please consult CONTRIB.md.
