Labelbox Connector for Databricks
The Official Labelbox <> Databricks Python Integration
Labelbox enables teams to maximize the value of their unstructured data with its enterprise-grade training data platform. For ML use cases, Labelbox has tools to deploy labelers to annotate data at massive scale, diagnose model performance to prioritize labeling, and plug in existing ML models to speed up labeling. For non-ML use cases, Labelbox has a powerful catalog with auto-computed similarity scores that users can leverage to label large amounts of data with a couple clicks.
This library was designed to run in a Databricks environment, although it will function in any Spark environment with some modification.
We strongly encourage collaboration. Please feel free to fork this repo and tweak the code base to work for your own data, and open pull requests if you have suggestions on how to enhance the overall experience, add new features, or improve general performance.
Please report any issues or bugs via GitHub Issues.
Requirements
- Databricks: Runtime 10.4 LTS or Later
- Apache Spark: 3.1.2 or Later
- Labelbox account
- Generate a Labelbox API key
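The version requirements above can be checked programmatically before installing anything. The following is a minimal sketch, not part of LabelSpark: `parse_version` and `meets_minimum` are hypothetical helper names, and the minimums are copied from the list above.

```python
import re

# Minimums taken from the requirements list above.
MIN_SPARK = (3, 1, 2)    # Apache Spark 3.1.2 or later
MIN_RUNTIME = (10, 4)    # Databricks Runtime 10.4 LTS or later


def parse_version(text):
    """Turn a string such as '3.3.0' or '10.4.x-scala2.12' into a tuple of ints.

    Only the leading dotted-number prefix is used; suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least `minimum` (a tuple of ints)."""
    version = parse_version(version_text)
    # Pad both tuples with zeros so that '3.3' compares like '3.3.0'.
    width = max(len(version), len(minimum))
    version = version + (0,) * (width - len(version))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return version >= padded_minimum
```

In a notebook, `meets_minimum(spark.version, MIN_SPARK)` would check the active Spark session. How you obtain the Databricks Runtime version string depends on your workspace, so that part is left to the caller.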
Setup
Set up LabelSpark with the following lines of code:
%pip install labelspark -q
import labelspark as ls
api_key = "" # Insert your Labelbox API key here
client = ls.Client(api_key)
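Rather than pasting the API key into a notebook cell, you may prefer to read it from the environment. This is a minimal sketch under stated assumptions: `load_api_key` is a hypothetical helper, and `LABELBOX_API_KEY` is an environment variable name chosen for illustration, not something LabelSpark reads on its own.

```python
import os


def load_api_key(env_var="LABELBOX_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is missing.

    `environ` defaults to os.environ; passing a dict makes the helper
    easy to exercise without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    api_key = environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set {env_var} to your Labelbox API key before creating the client."
        )
    return api_key


# In a notebook, after `import labelspark as ls`:
# client = ls.Client(load_api_key())
```

On Databricks, a secret scope read with `dbutils.secrets.get(scope=..., key=...)` is another option; the scope and key names are whatever you configured in your workspace.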
Once set up, you can run the following core functions:
- client.create_data_rows_from_table(): Creates Labelbox data rows (and metadata) given a Spark table DataFrame
- client.export_to_table(): Exports labels (and metadata) from a given Labelbox project and creates a Spark DataFrame
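Before creating data rows from a table, it can help to sanity-check the rows you plan to upload. The sketch below is not part of LabelSpark and does not call it. `find_row_problems` is a hypothetical helper, and the field names `row_data` (asset URL) and `global_key` (unique identifier) are assumptions based on Labelbox data row terminology; check the example notebooks for the column names the connector actually expects.

```python
def find_row_problems(rows, url_field="row_data", key_field="global_key"):
    """Return a list of (row_index, message) for rows that look unusable.

    `rows` is an iterable of dicts, one per prospective data row.
    A row is flagged if its URL is empty, its key is empty, or its key
    repeats an earlier row's key.
    """
    problems = []
    seen_keys = set()
    for index, row in enumerate(rows):
        url = (row.get(url_field) or "").strip()
        key = (row.get(key_field) or "").strip()
        if not url:
            problems.append((index, f"missing {url_field}"))
        if not key:
            problems.append((index, f"missing {key_field}"))
        elif key in seen_keys:
            problems.append((index, f"duplicate {key_field}: {key}"))
        else:
            seen_keys.add(key)
    return problems
```

For a small Spark DataFrame `df`, something like `find_row_problems(r.asDict() for r in df.collect())` would run the check on the driver. For large tables you would want to express the same checks as Spark operations instead of collecting rows.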
Example Notebooks
Importing Data
Notebook | GitHub |
---|---|
Basics: Data Rows from URLs | |
Data Rows with Metadata | |
Data Rows with Attachments | |
Data Rows with Annotations | |
Putting it all Together | |
Exporting Data
Notebook | GitHub |
---|---|
Exporting Data to a Spark Table | |
While using LabelSpark, you will likely also use the Labelbox SDK (e.g. for programmatic ontology creation). These resources will help familiarize you with the Labelbox Python SDK:
- Visit our docs to learn how the SDK works
- Check out our notebook examples to follow along with interactive tutorials
- View the Labelbox API reference.