EmbeddingBuddy

A modular Python Dash web application for interactive exploration and visualization of embedding vectors through dimensionality reduction techniques. Compare documents and prompts in the same embedding space to understand semantic relationships.

[Screenshot: 3D graph and UI for EmbeddingBuddy]

Overview

EmbeddingBuddy provides an intuitive web interface for analyzing high-dimensional embedding vectors by applying various dimensionality reduction algorithms and visualizing the results in interactive 2D and 3D plots. The application features a clean, modular architecture that makes it easy to test, maintain, and extend with new features. It supports dual dataset visualization, allowing you to compare documents and prompts to understand how queries relate to your content.

Features

  • Dual file upload - separate drag-and-drop for documents and prompts
  • Multiple dimensionality reduction methods: PCA, t-SNE, and UMAP
  • Interactive 2D/3D visualizations with toggle between views
  • Color coding options by category, subcategory, or tags
  • Visual distinction: Documents appear as circles, prompts as diamonds with desaturated colors
  • Prompt visibility toggle - show/hide prompts to reduce visual clutter
  • Point inspection - click points to view full content and identify document vs prompt
  • Reset functionality - clear all data to start fresh
  • Sidebar layout with controls on left, large visualization area on right
  • Real-time visualization optimized for small to medium datasets
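
The color coding and document/prompt distinction listed above can be pictured with a small sketch. This is not EmbeddingBuddy's actual styling code: `PALETTE`, `desaturate`, and `style_points` are hypothetical names, and the 0.5 saturation factor is an arbitrary assumption. It only shows one way to give each category a color, draw documents as circles, and draw prompts as diamonds in a desaturated version of the same color.

```python
import colorsys

# Hypothetical palette; the application's real colors may differ.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def desaturate(hex_color, factor=0.5):
    """Scale the saturation of a '#rrggbb' color by `factor`."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    r, g, b = colorsys.hls_to_rgb(h, l, s * factor)
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b)))


def style_points(records, color_by="category", is_prompt=False):
    """Return one {'color', 'symbol'} dict per record."""
    # Sort the distinct values so the color assignment is stable across runs.
    values = sorted({r.get(color_by, "unknown") for r in records})
    color_for = {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(values)}
    styles = []
    for r in records:
        color = color_for[r.get(color_by, "unknown")]
        styles.append({
            "color": desaturate(color) if is_prompt else color,
            "symbol": "diamond" if is_prompt else "circle",
        })
    return styles


docs = [{"category": "news"}, {"category": "review"}, {"category": "news"}]
doc_styles = style_points(docs)
prompt_styles = style_points(docs, is_prompt=True)
```

Coloring by `tags` would need an extra rule, since each record can carry several tags; the sketch covers only single-valued fields such as `category` and `subcategory`.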

Network Dependency

Note: The application loads the Transformers.js library (v3.0.0) from cdn.jsdelivr.net for client-side embedding generation. Generating embeddings in the browser therefore requires an active internet connection and sends requests to a third-party CDN. If you only upload files of pre-computed embeddings, the application works without internet access.

Quick Start

Installation

Option 1: Install with uv (recommended)

# Install as a CLI tool (no need to clone the repo)
uv tool install embeddingbuddy

# Run the application
embeddingbuddy serve

Option 2: Install with pip/pipx

# Install with pipx (isolated environment)
pipx install embeddingbuddy

# Or install with pip
pip install embeddingbuddy

# Run the application
embeddingbuddy serve

Option 3: Run with Docker

# Pull and run the Docker image
docker run -p 8050:8050 ghcr.io/godber/embedding-buddy:latest

The application will be available at http://127.0.0.1:8050

Using the Application

  1. Open your browser to http://127.0.0.1:8050
  2. Upload your data:
    • Drag and drop an NDJSON file containing embeddings (see Data Format below)
    • Optionally upload a second file with prompts to compare against documents
  3. Choose visualization settings:
    • Select dimensionality reduction method (PCA, t-SNE, or UMAP)
    • Choose 2D or 3D visualization
    • Pick color coding (by category, subcategory, or tags)
  4. Explore:
    • Click points to view full content
    • Toggle prompt visibility
    • Rotate and zoom 3D plots

Data Format

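EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files for both documents and prompts. Each line is a single JSON object describing one embedding. In the examples below, `...` stands for the remaining vector components and is not part of the file format: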

Documents:

{"id": "doc_001", "embedding": [0.1, -0.3, 0.7, ...], "text": "Sample text content", "category": "news", "subcategory": "politics", "tags": ["election", "politics"]}
{"id": "doc_002", "embedding": [0.2, -0.1, 0.9, ...], "text": "Another example", "category": "review", "subcategory": "product", "tags": ["tech", "gadget"]}

Prompts:

{"id": "prompt_001", "embedding": [0.15, -0.28, 0.65, ...], "text": "Find articles about machine learning applications", "category": "search", "subcategory": "technology", "tags": ["AI", "research"]}
{"id": "prompt_002", "embedding": [0.72, 0.18, -0.35, ...], "text": "Show me product reviews for smartphones", "category": "search", "subcategory": "product", "tags": ["mobile", "reviews"]}

Required Fields:

  • embedding: Array of floating-point numbers representing the vector (must be same dimensionality for both documents and prompts)
  • text: String content associated with the embedding

Optional Fields:

  • id: Unique identifier (auto-generated if missing)
  • category: Primary classification
  • subcategory: Secondary classification
  • tags: Array of string tags for flexible labeling

Important: Document and prompt embeddings must have the same number of dimensions to be visualized together.
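
Before uploading, it can help to check a file against the rules above. The following is a minimal sketch, not the application's own parser: `load_ndjson` and the `row_N` id scheme are hypothetical, and the ids EmbeddingBuddy auto-generates may look different.

```python
import json


def load_ndjson(lines, id_prefix="row"):
    """Parse NDJSON lines and check the documented required fields."""
    records = []
    expected_dim = None
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        for field in ("embedding", "text"):
            if field not in record:
                raise ValueError(f"line {line_no}: missing required field {field!r}")
        dim = len(record["embedding"])
        if expected_dim is None:
            expected_dim = dim  # the first record fixes the dimensionality
        elif dim != expected_dim:
            raise ValueError(
                f"line {line_no}: expected {expected_dim} dimensions, got {dim}"
            )
        # Hypothetical id scheme; the app's auto-generated ids may differ.
        record.setdefault("id", f"{id_prefix}_{line_no}")
        records.append(record)
    return records


sample = [
    '{"embedding": [0.1, -0.3, 0.7], "text": "Sample text content", "category": "news"}',
    '{"id": "doc_002", "embedding": [0.2, -0.1, 0.9], "text": "Another example"}',
]
records = load_ndjson(sample)
print([r["id"] for r in records])  # ['row_1', 'doc_002']
```

Passing the lines of a document file followed by the lines of a prompt file to the same call is one way to confirm that both share the same number of dimensions.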

Installation & Usage

This project uses uv for dependency management.

  1. Install dependencies:
uv sync
  2. Run the application:
# Production mode (no debug, no auto-reload)
embeddingbuddy serve

# Development mode (debug + auto-reload on code changes)
embeddingbuddy serve --dev

# Debug logging only (no auto-reload)
embeddingbuddy serve --debug

# Custom host/port
embeddingbuddy serve --host 0.0.0.0 --port 8080
  3. Open your browser to http://127.0.0.1:8050

  4. Test with sample data:

    • Upload sample_data.ndjson (documents)
    • Upload sample_prompts.ndjson (prompts) to see dual visualization
    • Use the "Show prompts" toggle to compare how prompts relate to documents

Docker

You can also run EmbeddingBuddy using Docker:

Basic Usage

# Run in the background
docker compose up -d

The application will be available at http://127.0.0.1:8050

With OpenSearch

To run with OpenSearch for enhanced search capabilities:

# Run in the background with OpenSearch
docker compose --profile opensearch up -d

This will start both the EmbeddingBuddy application and an OpenSearch instance. OpenSearch will be available at http://127.0.0.1:9200

Docker Commands

# Stop all services
docker compose down

# Stop and remove volumes
docker compose down -v

# View logs
docker compose logs embeddingbuddy
docker compose logs opensearch

# Rebuild containers
docker compose build

Development

Project Structure

The application follows a modular architecture for improved maintainability and testability:

src/embeddingbuddy/
├── app.py                     # Main application entry point and factory
├── config/                    # Configuration management
│   └── settings.py            # Centralized app settings
├── data/                      # Data parsing and processing
│   ├── parser.py              # NDJSON parsing logic
│   ├── processor.py           # Data transformation utilities
│   └── sources/               # Data source integrations
│       └── opensearch.py      # OpenSearch data source
├── models/                    # Data schemas and algorithms
│   ├── schemas.py             # Pydantic data models
│   ├── reducers.py            # Dimensionality reduction algorithms
│   └── field_mapper.py        # Field mapping utilities
├── visualization/             # Plot creation and styling
│   ├── plots.py               # Plot factory and creation logic
│   └── colors.py              # Color mapping utilities
├── ui/                        # User interface components
│   ├── layout.py              # Main application layout
│   ├── components/            # Reusable UI components
│   │   ├── sidebar.py         # Sidebar component
│   │   ├── upload.py          # Upload components
│   │   ├── textinput.py       # Text input components
│   │   └── datasource.py      # Data source components
│   └── callbacks/             # Organized callback functions
│       ├── data_processing.py # Data upload/processing callbacks
│       ├── visualization.py   # Plot update callbacks
│       └── interactions.py    # User interaction callbacks
└── utils/                     # Utility functions

# CLI entry point
embeddingbuddy serve    # Main CLI command to start the server

Testing

Run the test suite to verify functionality:

# Install test dependencies
uv sync --extra test

# Run all tests
uv run pytest tests/ -v

# Run specific test file
uv run pytest tests/test_data_processing.py -v

# Run with coverage
uv run pytest tests/ --cov=src/embeddingbuddy

Development Tools

Install development dependencies for linting, type checking, and security:

# Install all dev dependencies
uv sync --extra dev

# Or install specific groups
uv sync --extra test        # Testing tools
uv sync --extra lint        # Linting and formatting
uv sync --extra security    # Security scanning tools

# Run linting
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Run type checking
uv run mypy src/embeddingbuddy/

# Run security scans
uv run bandit -r src/
uv run safety check

Adding New Features

The modular architecture makes it easy to extend functionality:

  • New reduction algorithms: Add to models/reducers.py
  • New plot types: Extend visualization/plots.py
  • UI components: Add to ui/components/
  • Configuration options: Update config/settings.py
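
To make the first extension point more concrete, here is one possible shape for a pluggable reducer. Everything in it is an assumption: the `REDUCERS` registry, the `register` decorator, and the function signature are hypothetical and are not taken from `models/reducers.py`. The algorithm is Gaussian random projection, chosen for the sketch because it needs only the standard library; it is not one of the methods the application ships.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]
Reducer = Callable[..., List[Vector]]

# Hypothetical registry; not the actual structure of models/reducers.py.
REDUCERS: Dict[str, Reducer] = {}


def register(name: str) -> Callable[[Reducer], Reducer]:
    """Decorator that stores a reducer function under `name`."""
    def decorator(func: Reducer) -> Reducer:
        REDUCERS[name] = func
        return func
    return decorator


@register("random_projection")
def random_projection(vectors: List[Vector], n_components: int, seed: int = 0) -> List[Vector]:
    """Project each vector onto n_components random Gaussian directions."""
    rng = random.Random(seed)  # seeded so repeated calls give the same layout
    dim = len(vectors[0])
    scale = 1.0 / n_components ** 0.5
    directions = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
        for _ in range(n_components)
    ]
    return [
        [sum(v * d for v, d in zip(vec, direction)) for direction in directions]
        for vec in vectors
    ]


embeddings = [[0.1, -0.3, 0.7, 0.2], [0.2, -0.1, 0.9, 0.4]]
points = REDUCERS["random_projection"](embeddings, 2)
print(len(points), len(points[0]))  # 2 2
```

Looking reducers up by name keeps the UI decoupled from any single algorithm, which is the kind of separation a modular layout is meant to allow.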

Tech Stack

  • Python Dash: Web application framework
  • Plotly: Interactive plotting and visualization
  • scikit-learn: PCA implementation
  • UMAP-learn: UMAP dimensionality reduction
  • openTSNE: Fast t-SNE implementation
  • NumPy/Pandas: Data manipulation and analysis
  • pytest: Testing framework
  • uv: Modern Python package and project manager
