Labelbox Connector for Databricks

The Official Labelbox <> Databricks Python Integration

Labelbox enables teams to maximize the value of their unstructured data with its enterprise-grade training data platform. For ML use cases, Labelbox has tools to deploy labelers to annotate data at massive scale, diagnose model performance to prioritize labeling, and plug in existing ML models to speed up labeling. For non-ML use cases, Labelbox has a powerful catalog with auto-computed similarity scores that users can leverage to label large amounts of data with a couple of clicks.

This library was designed to run in a Databricks environment, although it will function in any Spark environment with some modification.

We strongly encourage collaboration. Please feel free to fork this repo and tweak the code base to work for your own data, and make pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance.

Please report any issues or bugs via GitHub Issues.

Requirements

Setup

Set up LabelSpark with the following lines of code:

%pip install labelspark -q
import labelspark as ls

api_key = "" # Insert your Labelbox API key here
client = ls.Client(api_key)
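
The setup above leaves the API key as an empty string to fill in. Rather than pasting the key into a notebook cell, one option is to read it from the environment. The following is a minimal sketch, not part of LabelSpark: the helper `load_api_key` and the variable name `LABELBOX_API_KEY` are hypothetical names chosen for illustration.

```python
import os

# Hypothetical variable name (an assumption); use whatever your workspace sets.
API_KEY_ENV_VAR = "LABELBOX_API_KEY"


def load_api_key(env=None, var_name=API_KEY_ENV_VAR):
    """Return the key stored under `var_name`, or raise if it is missing or blank."""
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the client")
    return key
```

With this helper in place, the last setup line could become `client = ls.Client(load_api_key())`, which fails early with a clear message if the key has not been configured.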

Once set up, you can run the following core functions:

  • client.create_data_rows_from_table(): Creates Labelbox data rows (and metadata) from a Spark DataFrame

  • client.export_to_table(): Exports labels (and metadata) from a given Labelbox project into a Spark DataFrame

Example Notebooks

Importing Data

Notebooks (available on GitHub):

  • Basics: Data Rows from URLs
  • Data Rows with Metadata
  • Data Rows with Attachments
  • Data Rows with Annotations
  • Putting it all Together

Exporting Data

Notebooks (available on GitHub):

  • Exporting Data to a Spark Table

While using LabelSpark, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
