Labelbox Connector for BigQuery
Labelbox Connector for Google BigQuery
Use the Labelbox Connector for Google BigQuery to upload a CSV of text snippets to BigQuery, select the columns you need, and add the resulting dataset to Labelbox for annotation in the Labelbox text editor. The library currently targets text use cases only, although it may be expanded to support other BigQuery use cases as needed.
The demo code in this GitHub repository is designed to run in Google Colab, but it can be adapted to any notebook environment.
Labelbox is an enterprise-grade training data solution with fast AI-enabled labeling tools, labeling automation, a human workforce, data management, a powerful API for integration, and an SDK for extensibility. Visit Labelbox for more information.
This library is currently in beta. It may contain errors or inaccuracies and may not function as well as commercially released software. Please report any issues or bugs via GitHub Issues.
Requirements
- An authenticated Google Cloud BigQuery client
- The Google Cloud BigQuery Python SDK
- A Labelbox account
- A Labelbox API key
Installation
Install LabelBigQuery into your Python environment. The installation also adds the Labelbox SDK and the BigQuery SDK as dependencies.
```
pip install labelbigquery
```
Documentation
LabelBigQuery includes several methods to help facilitate your workflow between BigQuery and Labelbox.
- Add your CSV contents to BigQuery (only necessary if you don't have your data in BigQuery yet):
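```python
from google.cloud import bigquery
import labelbigquery

bq_client = bigquery.Client()  # authenticated BigQuery client (default credentials)

# CSV columns to send to BigQuery
SELECTED_HEADERS = {
    'conversation_id',
    'normalized_query'
}
# BigQuery type definition for each selected column
SCHEMA_FIELDS = [
    bigquery.SchemaField("conversation_id", "STRING"),
    bigquery.SchemaField("normalized_query", "STRING"),
]

# args.table_name: destination table ID; args.csv_file_name: path to your local CSV
labelbigquery.load_data_to_big_query(bq_client, args.table_name, args.csv_file_name,
                                     SELECTED_HEADERS=SELECTED_HEADERS, SCHEMA_FIELDS=SCHEMA_FIELDS)
```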
Here, "SELECTED_HEADERS" specifies which columns of your CSV to send to BigQuery, and "SCHEMA_FIELDS" gives the type definition BigQuery uses to store each of those columns.
LabelBigQuery for text requires two columns of data: a unique identifier (which becomes the "External ID" in Labelbox) and a corresponding text string. Here is an example table from a chatbot use case:
conversation_id | normalized_query |
---|---|
sample_1 | Some text string here for labeling. |
sample_2 | Some text string here for labeling. |
sample_3 | Some text string here for labeling. |
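As a rough local illustration of the column requirements above, the stdlib-only sketch below reads a CSV, keeps only the selected columns, and rejects rows with an empty identifier. It is not LabelBigQuery's implementation: the helper name `select_columns`, the extra `channel` column in the sample, and the empty-ID check are hypothetical, added only to show the expected two-column shape before loading. It takes a list rather than a set of headers so the output column order is predictable.

```python
import csv
import io

def select_columns(csv_text, selected_headers, id_column):
    """Hypothetical helper: keep only selected_headers from each CSV row.

    Raises ValueError if a selected column is missing from the header row
    or if any row has an empty identifier.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(selected_headers) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        if not (row.get(id_column) or "").strip():
            raise ValueError("Every row needs a non-empty identifier")
        # Drop any column that is not in selected_headers
        rows.append({h: row[h] for h in selected_headers})
    return rows

sample = (
    "conversation_id,normalized_query,channel\n"
    "sample_1,Some text string here for labeling.,web\n"
    "sample_2,Some text string here for labeling.,sms\n"
)
rows = select_columns(sample, ["conversation_id", "normalized_query"], "conversation_id")
print(rows[0])
# {'conversation_id': 'sample_1', 'normalized_query': 'Some text string here for labeling.'}
```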
- Submit a query to BigQuery for your target columns. This also writes individual text files to a "data" folder, with file names based on the unique identifier ("conversation_id" in the example above).
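```python
# Concatenate every normalized_query for a conversation into one newline-separated string
query = fr'SELECT conversation_id, STRING_AGG(normalized_query, "\n") FROM `{args.table_name}` GROUP BY conversation_id'
# Runs the query and writes one text file per conversation_id to the "data" folder
file_names = labelbigquery.fetch_and_write_rows(bq_client, query=query)
```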
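The query above groups rows by "conversation_id" and joins each group's "normalized_query" values with newlines, and "fetch_and_write_rows" then writes one text file per identifier. The stdlib sketch below mimics that step locally on in-memory rows so you can preview what the files might contain. It assumes, without confirmation from this README, that each file is named `<identifier>.txt` inside the data folder; `write_text_files` is a hypothetical name, not part of LabelBigQuery. Note that BigQuery's STRING_AGG does not guarantee an order without an ORDER BY clause, whereas this sketch keeps input order.

```python
import os
import tempfile
from collections import defaultdict

def write_text_files(rows, out_dir="data"):
    """Hypothetical local stand-in for the GROUP BY / STRING_AGG query plus
    fetch_and_write_rows: writes one .txt file per identifier."""
    grouped = defaultdict(list)
    for conversation_id, text in rows:
        grouped[conversation_id].append(text)  # keeps input order within a group
    os.makedirs(out_dir, exist_ok=True)
    file_names = []
    for conversation_id, texts in grouped.items():
        path = os.path.join(out_dir, f"{conversation_id}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(texts))  # same separator as STRING_AGG(..., "\n")
        file_names.append(path)
    return file_names

rows = [
    ("sample_1", "Hi, I need help."),
    ("sample_2", "What are your hours?"),
    ("sample_1", "My order is late."),
]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_text_files(rows, os.path.join(tmp, "data"))
    with open(paths[0], encoding="utf-8") as f:
        print(os.path.basename(paths[0]), repr(f.read()))
# sample_1.txt 'Hi, I need help.\nMy order is late.'
```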
- Submit your files to Labelbox as a new dataset for annotation in the text editor.
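```python
import labelbox
lb_client = labelbox.Client(api_key="<YOUR_LABELBOX_API_KEY>")  # replace with your API key
lb_dataset = labelbigquery.make_dataset_and_data_rows(lb_client, file_names, args.dataset_name)
print("Dataset unique identifier: " + lb_dataset.uid)
```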
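"make_dataset_and_data_rows" presumably creates one Labelbox data row per text file. The sketch below only builds per-file payloads locally, to show how the file names from the previous step could map back to External IDs and to catch duplicate identifiers before uploading. The helper `build_data_row_payloads` and the "file name stem becomes the External ID" rule are assumptions, and the `row_data` / `external_id` keys merely mirror Labelbox data row fields; this is not the library's documented behavior.

```python
from pathlib import Path

def build_data_row_payloads(file_names):
    """Hypothetical helper: pair each text file with an External ID taken
    from its file name stem (e.g. data/sample_1.txt -> sample_1)."""
    payloads = []
    seen = set()
    for name in file_names:
        external_id = Path(name).stem
        if external_id in seen:
            raise ValueError(f"Duplicate External ID: {external_id}")
        seen.add(external_id)
        # Key names mirror Labelbox data row fields; treat them as an assumption
        payloads.append({"row_data": name, "external_id": external_id})
    return payloads

payloads = build_data_row_payloads(["data/sample_1.txt", "data/sample_2.txt"])
print(payloads[0])
# {'row_data': 'data/sample_1.txt', 'external_id': 'sample_1'}
```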
While using LabelBigQuery, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
- Visit our docs to learn how the SDK works
- View our LabelBigQuery demo code for inspiration.
- View our API reference.
Authentication
Labelbox uses API keys to validate requests. You can create and manage API keys on Labelbox.
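To avoid hard-coding the key in a notebook, you can read it from an environment variable. The sketch below assumes a variable named `LABELBOX_API_KEY` (to our understanding, the name the Labelbox Python SDK's `Client` falls back to when no key is passed) and fails fast if it is missing; `get_labelbox_api_key` is a hypothetical helper, not part of LabelBigQuery.

```python
import os

def get_labelbox_api_key(env_var="LABELBOX_API_KEY"):
    """Hypothetical helper: read the Labelbox API key from the environment
    and raise a clear error if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Labelbox API key")
    return key
```

You could then create the client with `labelbox.Client(api_key=get_labelbox_api_key())`.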
Contribution
Please consult CONTRIB.md.
File details
Details for the file labelbigquery-0.1.0.tar.gz
File metadata
- Download URL: labelbigquery-0.1.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 54b83b04ee753531739737a06b9e80009473559d7ec042365fbe8c681efa9b5e |
MD5 | de3481437fb4eaba179b2d0db945c983 |
BLAKE2b-256 | 90bed8b3d2b4f90fde917ba75686802df40d91cf25cfc5295ac7f3024805d08f |
File details
Details for the file labelbigquery-0.1.0-py3-none-any.whl
File metadata
- Download URL: labelbigquery-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0e78b047f420ac7bb9c4b9e5d9e182d55017acf7634e25152691d27846b86805 |
MD5 | b367df59fc8ef17e14f16b9219b41b89 |
BLAKE2b-256 | a865436877434c958bffc75e9efcd5700de40a8639fb21928d2b898f7c34ed65 |