Lightly Purple is a lightweight, fast, and easy-to-use data exploration tool for data scientists and engineers.
Project description
Aloha!
We at Lightly created an open-source tool that supercharges your data curation workflows by enabling you to explore datasets, analyze data quality, and improve your machine learning pipelines more efficiently than ever before. Embark with us on this adventure of building better datasets.
Installation
Please use Python 3.8 or higher with venv. Works on Windows, Linux, and macOS.
# Create virtual environment
# On Linux/macOS:
python3 -m venv venv
source venv/bin/activate
# On Windows:
python -m venv venv
.\venv\Scripts\activate
# Install library
pip install lightly-purple
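To confirm that the environment meets the stated requirement after installing, a small check along these lines may help. It is only a sketch: `check_environment` is a hypothetical helper, not part of Lightly Purple, and it uses nothing beyond the standard library.

```python
import sys
from importlib import metadata


def check_environment(package: str = "lightly-purple") -> dict:
    """Report whether Python is 3.8+ and which version of `package` is installed.

    Hypothetical helper for illustration; not part of Lightly Purple.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # the package is not installed in this environment
    return {
        "python_ok": sys.version_info >= (3, 8),
        "package_version": installed,
    }
```

Calling `check_environment()` inside the activated virtual environment should report a non-empty `package_version` once `pip install lightly-purple` has succeeded.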
Quickstart
Download the dataset and run a quickstart script to load your dataset and launch the app.
Here are a few examples for you to try out:
YOLOv8 dataset:
# Download and extract dataset
export DATASET_PATH=$(pwd)/example-dataset && \
bash <(curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/fetch-dataset.sh) \
https://universe.roboflow.com/ds/nToYP9Q1ix\?key\=pnjUGTjjba \
$DATASET_PATH
# Download example script
curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/example-yolo8.py > example.py
# Run the example script
python example.py
The YOLO dataset should follow this structure:
dataset/
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
├── valid/ (optional)
│   ├── images/
│   │   └── ...
│   └── labels/
│       └── ...
└── data.yaml
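The layout above can be checked before launching the app. The following is a minimal sketch, assuming the folder names shown in the tree; `check_yolo_layout` is a hypothetical helper and not part of Lightly Purple. It also reports images without a matching label file, which may be intentional for background images.

```python
from pathlib import Path


def check_yolo_layout(root):
    """Return a list of problems found in a YOLO dataset folder (empty if none).

    Hypothetical helper; it checks only the layout described in this README.
    """
    root = Path(root)
    problems = []
    if not (root / "data.yaml").is_file():
        problems.append("missing data.yaml")
    # 'train' is required; 'valid' is optional and only checked if present.
    splits = ["train"] + (["valid"] if (root / "valid").is_dir() else [])
    for split in splits:
        images = root / split / "images"
        labels = root / split / "labels"
        for folder in (images, labels):
            if not folder.is_dir():
                problems.append(f"missing {folder.relative_to(root).as_posix()}/")
        if images.is_dir() and labels.is_dir():
            # Each image is expected to pair with a same-named .txt label file.
            for image in sorted(images.iterdir()):
                if image.is_file() and not (labels / f"{image.stem}.txt").is_file():
                    problems.append(f"no label file for {split}/images/{image.name}")
    return problems
```

An empty list means the folder matches the expected structure, for example `check_yolo_layout("example-dataset")`.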
Each label file should contain YOLO format annotations (one per line):
<class> <x_center> <y_center> <width> <height>
Where coordinates are normalized between 0 and 1.
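As an illustration of the annotation format, the sketch below parses one label line and converts the normalized box to pixel corner coordinates. The names `YoloBox`, `parse_yolo_line`, and `to_pixel_corners` are hypothetical and are not part of Lightly Purple; the image size in the usage note is an example value.

```python
from typing import NamedTuple, Tuple


class YoloBox(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse '<class> <x_center> <y_center> <width> <height>' into a YoloBox."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    coords = [float(p) for p in parts[1:]]
    # Coordinates are normalized, so each must lie in [0, 1].
    if not all(0.0 <= c <= 1.0 for c in coords):
        raise ValueError(f"coordinates must be between 0 and 1: {line!r}")
    return YoloBox(class_id, *coords)


def to_pixel_corners(
    box: YoloBox, image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized center/size box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * image_width
    y_min = (box.y_center - box.height / 2) * image_height
    x_max = (box.x_center + box.width / 2) * image_width
    y_max = (box.y_center + box.height / 2) * image_height
    return x_min, y_min, x_max, y_max
```

For a 640x480 image, `to_pixel_corners(parse_yolo_line("0 0.5 0.5 0.25 0.5"), 640, 480)` returns `(240.0, 120.0, 400.0, 360.0)`.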
Let's break down what these commands do:
- Setting up the dataset path:
  export DATASET_PATH=$(pwd)/example-dataset
  This creates an environment variable DATASET_PATH pointing to an 'example-dataset' folder in your current directory.
- Downloading and extracting the dataset:
  bash <(curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/fetch-dataset.sh)
  - Downloads a shell script that handles dataset fetching
  - The script downloads a YOLO-format dataset from Roboflow
  - Automatically extracts the dataset to your specified DATASET_PATH
- Getting the example code:
  curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/example-yolo8.py > example.py
  Downloads a Python script that demonstrates how to:
  - Load the YOLO dataset
  - Process the images and annotations
  - Launch the Lightly Purple UI for exploration
- Running the example:
  python example.py
  Executes the downloaded script, which will:
  - Initialize the dataset processor
  - Load and analyze your data
  - Start a local server
  - Open the UI in your default web browser
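For readers who prefer to stay in Python, the download-and-extract step can be approximated with the standard library. This is a sketch under assumptions, not the contents of fetch-dataset.sh: it assumes the dataset URL returns a zip archive, and `fetch_and_extract` is a hypothetical name.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url: str, dest) -> Path:
    """Download a zip archive from `url` and extract it into `dest`.

    Hypothetical helper; assumes the URL serves a zip file.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "dataset.zip"
        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    return dest
```

A call might look like `fetch_and_extract(dataset_url, os.environ["DATASET_PATH"])`, where `dataset_url` is the Roboflow link from the quickstart and `os` has been imported.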
How It Works
Lightly Purple helps you understand and curate your datasets through several key components:
Core Components
- Dataset Processor: Prepares your data and annotations by:
  - Loading and preprocessing datasets
  - Handling various data formats and annotation types
  - Computing metadata
  - Performing quality analysis
- Data Storage Layer: Manages persistent data storage:
  - Stores raw dataset files and annotations
  - Maintains computed metadata
  - Caches processed results for quick access
  - Provides efficient data retrieval interfaces
- Backend API: Manages processed data and serves as the information hub:
  - Stores dataset metadata and analysis results
  - Handles data queries and filtering
  - Provides endpoints for dataset exploration
  - Manages user interactions with the data
- Modern UI Application: A responsive web interface that:
  - Consumes local API endpoints
  - Visualizes your dataset and analysis results
  - Provides interactive exploration tools
  - Enables dataset curation workflows
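The flow between these components can be pictured with a toy example: a processor computes metadata, a store keeps it, and a query filters it for display. Everything below is hypothetical and purely illustrative. `SampleMetadata`, `MetadataStore`, and `process_labels` are invented names and do not reflect the actual Lightly Purple implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SampleMetadata:
    filename: str
    num_annotations: int


class MetadataStore:
    """In-memory stand-in for a storage layer (hypothetical)."""

    def __init__(self) -> None:
        self._items: Dict[str, SampleMetadata] = {}

    def put(self, item: SampleMetadata) -> None:
        self._items[item.filename] = item

    def query(self, predicate: Callable[[SampleMetadata], bool]) -> List[SampleMetadata]:
        # Filtering step that a backend endpoint could expose to a UI.
        return [item for item in self._items.values() if predicate(item)]


def process_labels(labels: Dict[str, List[str]], store: MetadataStore) -> None:
    """Stand-in for a processor: count non-empty annotation lines per image."""
    for filename, lines in labels.items():
        count = sum(1 for line in lines if line.strip())
        store.put(SampleMetadata(filename, count))
```

For instance, after `process_labels({"a.jpg": ["0 0.5 0.5 0.1 0.1"], "b.jpg": []}, store)`, the call `store.query(lambda m: m.num_annotations == 0)` returns the metadata for `b.jpg` only, which is the kind of filter a curation view might use to surface unlabeled images.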
File details
Details for the file lightly_purple-0.2.12.dev0.tar.gz.
File metadata
- Download URL: lightly_purple-0.2.12.dev0.tar.gz
- Upload date:
- Size: 98.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a37f0b420d84f7d6a6a3b67dff51268f00665002c5d8d798f14acc851378717 |
| MD5 | 319ee184feb65b4b49eec68d4294ba5e |
| BLAKE2b-256 | e412d67b64fce33014fcd8f9bbc373a5f2f3fa747533ab8bf57e31f4e9353ddc |
File details
Details for the file lightly_purple-0.2.12.dev0-py3-none-any.whl.
File metadata
- Download URL: lightly_purple-0.2.12.dev0-py3-none-any.whl
- Upload date:
- Size: 129.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f467ab19462a03d9a5dd8ac70e8410079aefb011bbd7708714f2e43d9ef14999 |
| MD5 | cc1adb4a576a840936ae24be61d495bb |
| BLAKE2b-256 | 48fdbdea243c3ee646e5292e1149cc37fc583d05b175a8b52093f710cd4b1663 |