Lightly Purple is a lightweight, fast, and easy-to-use data exploration tool for data scientists and engineers.

Project description

The open-source tool for curating datasets


🚀 Aloha!

We at Lightly created an open-source tool that supercharges your data curation workflows: it lets you explore datasets, analyze data quality, and improve your machine learning pipelines. Join us on this adventure of building better datasets.

💻 Installation

Please use Python 3.8 or higher with venv. Works on Windows, Linux, and macOS.

# Create virtual environment
# On Linux/macOS:
python3 -m venv venv
source venv/bin/activate

# On Windows:
python -m venv venv
.\venv\Scripts\activate

# Install library
pip install lightly-purple
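
After installing, a quick sanity check can confirm that the interpreter meets the stated Python 3.8 minimum and that the `lightly-purple` distribution is visible in the active virtual environment. This is a minimal sketch that uses only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and not part of Lightly Purple.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple

MINIMUM_PYTHON = (3, 8)  # minimum version stated in the installation notes


def python_is_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if the given (major, minor, ...) version is 3.8 or higher."""
    return tuple(version[:2]) >= MINIMUM_PYTHON


def installed_version(distribution: str = "lightly-purple") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("lightly-purple version:", installed_version() or "not installed")
```

Run it with the virtual environment activated; "not installed" usually means `pip` ran against a different interpreter.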

Quickstart

Download a dataset, then run a quickstart script that loads it and launches the app.

Here are a few examples for you to try out:

YOLO8 dataset:

# Download and extract dataset
export DATASET_PATH=$(pwd)/example-dataset && \
    bash <(curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/fetch-dataset.sh) \
        https://universe.roboflow.com/ds/nToYP9Q1ix\?key\=pnjUGTjjba \
        "$DATASET_PATH"

# Download example script
curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/example-yolo8.py > example.py

# Run the example script
python example.py

The YOLO dataset should follow this structure:

dataset/
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
├── valid/  (optional)
│   ├── images/
│   │   └── ...
│   └── labels/
│       └── ...
└── data.yaml
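
Before loading a dataset, it can help to check that a folder matches this layout. The sketch below is not part of Lightly Purple: the function name `find_layout_problems` and the set of image extensions are assumptions made for illustration. It reports a missing `data.yaml`, missing `images/` or `labels/` directories, and images without a matching `.txt` label file.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: these are the image extensions of interest; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_layout_problems(dataset_root: Path, splits: Iterable[str] = ("train",)) -> List[str]:
    """Return human-readable problems found in a YOLO-style dataset folder."""
    problems: List[str] = []
    if not (dataset_root / "data.yaml").is_file():
        problems.append("missing data.yaml")
    for split in splits:
        images_dir = dataset_root / split / "images"
        labels_dir = dataset_root / split / "labels"
        for directory in (images_dir, labels_dir):
            if not directory.is_dir():
                relative = directory.relative_to(dataset_root).as_posix()
                problems.append(f"missing directory: {relative}")
        if not (images_dir.is_dir() and labels_dir.is_dir()):
            continue
        # Every image is expected to have a label file with the same stem.
        for image in sorted(images_dir.iterdir()):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            if not (labels_dir / f"{image.stem}.txt").is_file():
                problems.append(f"no label file for {split}/images/{image.name}")
    return problems
```

Pass `splits=("train", "valid")` to check the optional validation split as well. A missing label file is not always an error, since some YOLO pipelines treat unlabeled images as background, so treat the output as hints rather than hard failures.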

Each label file should contain YOLO format annotations (one per line):

<class> <x_center> <y_center> <width> <height>

Where coordinates are normalized between 0 and 1.
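
To make the format concrete, here is a minimal sketch that parses one annotation line and converts the normalized box to pixel corners. The names `YoloBox`, `parse_yolo_line`, and `to_pixel_corners` are hypothetical helpers written for this illustration, not Lightly Purple APIs.

```python
from typing import NamedTuple, Tuple


class YoloBox(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one '<class> <x_center> <y_center> <width> <height>' line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    # All four geometry values must be normalized to the [0, 1] range.
    for value in (x_center, y_center, width, height):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value {value} is not normalized to [0, 1]")
    return YoloBox(class_id, x_center, y_center, width, height)


def to_pixel_corners(
    box: YoloBox, image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * image_width
    y_min = (box.y_center - box.height / 2) * image_height
    x_max = (box.x_center + box.width / 2) * image_width
    y_max = (box.y_center + box.height / 2) * image_height
    return (x_min, y_min, x_max, y_max)


box = parse_yolo_line("0 0.5 0.5 0.25 0.5")
print(to_pixel_corners(box, 640, 480))  # (240.0, 120.0, 400.0, 360.0)
```

For a 640×480 image, the line `0 0.5 0.5 0.25 0.5` describes a class-0 box centered in the image that is 160 pixels wide and 240 pixels tall.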

Let's break down what these commands do:

  1. Setting up the dataset path:

    export DATASET_PATH=$(pwd)/example-dataset
    

    This creates an environment variable DATASET_PATH pointing to an 'example-dataset' folder in your current directory.

  2. Downloading and extracting the dataset:

    bash <(curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/fetch-dataset.sh)
    
    • Downloads a shell script that handles dataset fetching
    • The script downloads a YOLO-format dataset from Roboflow
    • Automatically extracts the dataset to your specified DATASET_PATH
  3. Getting the example code:

    curl -sL https://raw.githubusercontent.com/lightly-ai/gists/refs/heads/main/example-yolo8.py > example.py
    

    Downloads a Python script that demonstrates how to:

    • Load the YOLO dataset
    • Process the images and annotations
    • Launch the Lightly Purple UI for exploration
  4. Running the example:

    python example.py
    

    Executes the downloaded script, which will:

    • Initialize the dataset processor
    • Load and analyze your data
    • Start a local server
    • Open the UI in your default web browser

🔍 How It Works

Lightly Purple helps you understand and curate your datasets through several key components:

Core Components

  • Dataset Processor: Prepares your data and annotations by:

    • Loading and preprocessing datasets
    • Handling various data formats and annotation types
    • Computing metadata
    • Performing quality analysis
  • Data Storage Layer: Manages persistent data storage:

    • Stores raw dataset files and annotations
    • Maintains computed metadata
    • Caches processed results for quick access
    • Provides efficient data retrieval interfaces
  • Backend API: Manages processed data and serves as the information hub:

    • Stores dataset metadata and analysis results
    • Handles data queries and filtering
    • Provides endpoints for dataset exploration
    • Manages user interactions with the data
  • Modern UI Application: A responsive web interface that:

    • Consumes local API endpoints
    • Visualizes your dataset and analysis results
    • Provides interactive exploration tools
    • Enables dataset curation workflows

Download files

Source Distribution

lightly_purple-0.2.12.dev0.tar.gz (98.1 kB)

Built Distribution

lightly_purple-0.2.12.dev0-py3-none-any.whl (129.0 kB, Python 3)

File hashes

Hashes for lightly_purple-0.2.12.dev0.tar.gz

Algorithm    Hash digest
SHA256       8a37f0b420d84f7d6a6a3b67dff51268f00665002c5d8d798f14acc851378717
MD5          319ee184feb65b4b49eec68d4294ba5e
BLAKE2b-256  e412d67b64fce33014fcd8f9bbc373a5f2f3fa747533ab8bf57e31f4e9353ddc

Hashes for lightly_purple-0.2.12.dev0-py3-none-any.whl

Algorithm    Hash digest
SHA256       f467ab19462a03d9a5dd8ac70e8410079aefb011bbd7708714f2e43d9ef14999
MD5          cc1adb4a576a840936ae24be61d495bb
BLAKE2b-256  48fdbdea243c3ee646e5292e1149cc37fc583d05b175a8b52093f710cd4b1663
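
If you download a distribution file manually, you can compare its SHA256 digest with the value published above. The following is a minimal sketch that uses only the standard library; the helper names `sha256_of_file` and `matches_published_hash` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(Path("lightly_purple-0.2.12.dev0.tar.gz"), "8a37f0b420d84f7d6a6a3b67dff51268f00665002c5d8d798f14acc851378717")` should return `True` for an unmodified copy of the source distribution, assuming the file sits in the current directory.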
