EmbeddingBuddy
A modular Python Dash web application for interactive exploration and visualization of embedding vectors through dimensionality reduction techniques. Compare documents and prompts in the same embedding space to understand semantic relationships.
Overview
EmbeddingBuddy provides an intuitive web interface for analyzing high-dimensional embedding vectors by applying various dimensionality reduction algorithms and visualizing the results in interactive 2D and 3D plots. The application features a clean, modular architecture that makes it easy to test, maintain, and extend with new features. It supports dual dataset visualization, allowing you to compare documents and prompts to understand how queries relate to your content.
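To make the idea of dimensionality reduction concrete, here is a minimal, dependency-free sketch of PCA using power iteration. It is an illustration only and not how EmbeddingBuddy computes its projections (the Tech Stack section lists scikit-learn for PCA). The function name `pca_project`, its parameters, and the `1e-12` tolerance are assumptions made for this example.

```python
import math
import random


def pca_project(vectors, n_components=2, iterations=100, seed=0):
    """Project vectors onto their top principal components via power iteration."""
    n, dim = len(vectors), len(vectors[0])
    # Center the data so components describe variance around the mean.
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]

    rng = random.Random(seed)
    components = []
    for _component in range(n_components):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _step in range(iterations):
            # Multiply w by X^T X without building the covariance matrix.
            scores = [sum(a * b for a, b in zip(row, w)) for row in centered]
            w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(dim)]
            # Remove directions already found so w converges to the next component.
            for c in components:
                overlap = sum(a * b for a, b in zip(w, c))
                w = [a - overlap * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                # No variance left in the remaining directions.
                w = [0.0] * dim
                break
            w = [a / norm for a in w]
        components.append(w)

    # Coordinates of each centered vector along each component.
    return [[sum(a * b for a, b in zip(row, c)) for c in components] for row in centered]


points = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]]
projected = pca_project(points, n_components=2)
```

Each input vector becomes a pair of coordinates that can be drawn on a 2D scatter plot; passing `n_components=3` yields coordinates for a 3D plot. The sign of each component is arbitrary, so mirrored plots are equivalent.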
Features
- Dual file upload - separate drag-and-drop for documents and prompts
- Multiple dimensionality reduction methods: PCA, t-SNE, and UMAP
- Interactive 2D/3D visualizations with toggle between views
- Color coding options by category, subcategory, or tags
- Visual distinction: Documents appear as circles, prompts as diamonds with desaturated colors
- Prompt visibility toggle - show/hide prompts to reduce visual clutter
- Point inspection - click points to view full content and identify document vs prompt
- Reset functionality - clear all data to start fresh
- Sidebar layout with controls on left, large visualization area on right
- Real-time visualization optimized for small to medium datasets
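The color coding and document/prompt distinction listed above could be approximated along these lines. This is a sketch using only the standard library, not the application's own styling code: the palette, the function names, and the 0.5 desaturation factor are assumptions. The `circle` and `diamond` strings match Plotly marker symbol names.

```python
import colorsys

# Hypothetical palette; the application's real colors may differ.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def hex_to_rgb(color):
    """Convert '#rrggbb' to RGB floats in [0, 1]."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) / 255 for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert RGB floats in [0, 1] back to '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def desaturate(color, factor=0.5):
    """Scale the HLS saturation by `factor` (0 = grey, 1 = unchanged)."""
    hue, lightness, saturation = colorsys.rgb_to_hls(*hex_to_rgb(color))
    return rgb_to_hex(colorsys.hls_to_rgb(hue, lightness, saturation * factor))


def color_map(labels):
    """Assign one palette color per distinct label, cycling if needed."""
    distinct = sorted(set(labels))
    return {label: PALETTE[i % len(PALETTE)] for i, label in enumerate(distinct)}


def marker_style(label, is_prompt, colors):
    """Documents: circle, full color. Prompts: diamond, desaturated color."""
    base = colors[label]
    if is_prompt:
        return {"symbol": "diamond", "color": desaturate(base)}
    return {"symbol": "circle", "color": base}


colors = color_map(["news", "review", "news"])
doc_style = marker_style("news", is_prompt=False, colors=colors)
prompt_style = marker_style("news", is_prompt=True, colors=colors)
```

Keeping the hue and lightness while lowering saturation means a prompt still reads as belonging to the same category as its documents, just visually quieter.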
Network Dependency
Note: The application loads the Transformers.js library (v3.0.0) from cdn.jsdelivr.net for client-side embedding generation. This requires an active internet connection and sends requests to a third-party CDN. The application will function without internet if you only use the file upload features for pre-computed embeddings.
Quick Start
Installation
Option 1: Install with uv (recommended)
# Install as a CLI tool (no need to clone the repo)
uv tool install embeddingbuddy
# Run the application
embeddingbuddy serve
Option 2: Install with pip/pipx
# Install with pipx (isolated environment)
pipx install embeddingbuddy
# Or install with pip
pip install embeddingbuddy
# Run the application
embeddingbuddy serve
Option 3: Run with Docker
# Pull and run the Docker image
docker run -p 8050:8050 ghcr.io/godber/embedding-buddy:latest
The application will be available at http://127.0.0.1:8050
Using the Application
- Open your browser to http://127.0.0.1:8050
- Upload your data:
- Drag and drop an NDJSON file containing embeddings (see Data Format below)
- Optionally upload a second file with prompts to compare against documents
- Choose visualization settings:
- Select dimensionality reduction method (PCA, t-SNE, or UMAP)
- Choose 2D or 3D visualization
- Pick color coding (by category, subcategory, or tags)
- Explore:
- Click points to view full content
- Toggle prompt visibility
- Rotate and zoom 3D plots
Data Format
EmbeddingBuddy accepts newline-delimited JSON (NDJSON) files for both documents and prompts. Each line contains an embedding with the following structure:
Documents:
{"id": "doc_001", "embedding": [0.1, -0.3, 0.7, ...], "text": "Sample text content", "category": "news", "subcategory": "politics", "tags": ["election", "politics"]}
{"id": "doc_002", "embedding": [0.2, -0.1, 0.9, ...], "text": "Another example", "category": "review", "subcategory": "product", "tags": ["tech", "gadget"]}
Prompts:
{"id": "prompt_001", "embedding": [0.15, -0.28, 0.65, ...], "text": "Find articles about machine learning applications", "category": "search", "subcategory": "technology", "tags": ["AI", "research"]}
{"id": "prompt_002", "embedding": [0.72, 0.18, -0.35, ...], "text": "Show me product reviews for smartphones", "category": "search", "subcategory": "product", "tags": ["mobile", "reviews"]}
Required Fields:
- embedding: Array of floating-point numbers representing the vector (must be same dimensionality for both documents and prompts)
- text: String content associated with the embedding
Optional Fields:
- id: Unique identifier (auto-generated if missing)
- category: Primary classification
- subcategory: Secondary classification
- tags: Array of string tags for flexible labeling
Important: Document and prompt embeddings must have the same number of dimensions to be visualized together.
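Before uploading, it can help to check your files against the rules above. The following is a minimal validation sketch using only the standard library. It is not the application's parser: the function names are hypothetical, and using a random UUID for a missing `id` is an assumption about what "auto-generated" could look like.

```python
import json
import uuid


def parse_ndjson(text):
    """Parse NDJSON text into records and check the required fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for field in ("embedding", "text"):
            if field not in record:
                raise ValueError(f"line {line_no}: missing required field '{field}'")
        # Assumption: a random UUID stands in for the app's own id generation.
        record.setdefault("id", str(uuid.uuid4()))
        records.append(record)
    return records


def embedding_dim(records):
    """Return the shared embedding length, or raise if records disagree."""
    dims = {len(record["embedding"]) for record in records}
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()


def check_compatible(documents, prompts):
    """Confirm documents and prompts can share one embedding space."""
    doc_dim, prompt_dim = embedding_dim(documents), embedding_dim(prompts)
    if doc_dim != prompt_dim:
        raise ValueError(f"documents have {doc_dim} dimensions, prompts have {prompt_dim}")
    return doc_dim


docs = parse_ndjson(
    '{"id": "doc_001", "embedding": [0.1, -0.3, 0.7], "text": "Sample text content"}\n'
    '{"embedding": [0.2, -0.1, 0.9], "text": "Another example"}\n'
)
prompts = parse_ndjson('{"id": "prompt_001", "embedding": [0.15, -0.28, 0.65], "text": "Find articles"}\n')
shared_dim = check_compatible(docs, prompts)
```

To validate real files, read each one with `open(path, encoding="utf-8").read()` and pass the result to `parse_ndjson`.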
Installation & Usage
This project uses uv for dependency management.
- Install dependencies:
uv sync
- Run the application:
# Production mode (no debug, no auto-reload)
embeddingbuddy serve
# Development mode (debug + auto-reload on code changes)
embeddingbuddy serve --dev
# Debug logging only (no auto-reload)
embeddingbuddy serve --debug
# Custom host/port
embeddingbuddy serve --host 0.0.0.0 --port 8080
- Open your browser to http://127.0.0.1:8050
- Test with sample data:
- Upload sample_data.ndjson (documents)
- Upload sample_prompts.ndjson (prompts) to see dual visualization
- Use the "Show prompts" toggle to compare how prompts relate to documents
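If you want extra test files beyond the bundled samples, a short script can generate synthetic NDJSON in the documented format. This is a sketch with hypothetical helper names and file names. The embeddings are random, so the resulting plots exercise the interface but show no meaningful semantic structure.

```python
import json
import random


def make_records(prefix, count, dim, categories, seed=0):
    """Build `count` records with random embeddings of length `dim`."""
    rng = random.Random(seed)
    return [
        {
            "id": f"{prefix}_{i:03d}",
            "embedding": [round(rng.uniform(-1, 1), 4) for _ in range(dim)],
            "text": f"Synthetic {prefix} {i}",
            "category": rng.choice(categories),
            "tags": [],
        }
        for i in range(1, count + 1)
    ]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def write_ndjson(records, path):
    """Write records to `path` in NDJSON form."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_ndjson(records))


documents = make_records("doc", count=50, dim=8, categories=["news", "review"])
prompts = make_records("prompt", count=5, dim=8, categories=["search"], seed=1)
```

Calling `write_ndjson(documents, "my_docs.ndjson")` and `write_ndjson(prompts, "my_prompts.ndjson")` produces two files with matching dimensionality that can be uploaded together.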
Docker
You can also run EmbeddingBuddy using Docker:
Basic Usage
# Run in the background
docker compose up -d
The application will be available at http://127.0.0.1:8050
With OpenSearch
To run with OpenSearch for enhanced search capabilities:
# Run in the background with OpenSearch
docker compose --profile opensearch up -d
This will start both the EmbeddingBuddy application and an OpenSearch instance. OpenSearch will be available at http://127.0.0.1:9200
Docker Commands
# Stop all services
docker compose down
# Stop and remove volumes
docker compose down -v
# View logs
docker compose logs embeddingbuddy
docker compose logs opensearch
# Rebuild containers
docker compose build
Development
Project Structure
The application follows a modular architecture for improved maintainability and testability:
src/embeddingbuddy/
├── app.py                      # Main application entry point and factory
├── config/                     # Configuration management
│   └── settings.py             # Centralized app settings
├── data/                       # Data parsing and processing
│   ├── parser.py               # NDJSON parsing logic
│   ├── processor.py            # Data transformation utilities
│   └── sources/                # Data source integrations
│       └── opensearch.py       # OpenSearch data source
├── models/                     # Data schemas and algorithms
│   ├── schemas.py              # Pydantic data models
│   ├── reducers.py             # Dimensionality reduction algorithms
│   └── field_mapper.py         # Field mapping utilities
├── visualization/              # Plot creation and styling
│   ├── plots.py                # Plot factory and creation logic
│   └── colors.py               # Color mapping utilities
├── ui/                         # User interface components
│   ├── layout.py               # Main application layout
│   ├── components/             # Reusable UI components
│   │   ├── sidebar.py          # Sidebar component
│   │   ├── upload.py           # Upload components
│   │   ├── textinput.py        # Text input components
│   │   └── datasource.py       # Data source components
│   └── callbacks/              # Organized callback functions
│       ├── data_processing.py  # Data upload/processing callbacks
│       ├── visualization.py    # Plot update callbacks
│       └── interactions.py     # User interaction callbacks
└── utils/                      # Utility functions
# CLI entry point
embeddingbuddy serve            # Main CLI command to start the server
Testing
Run the test suite to verify functionality:
# Install test dependencies
uv sync --extra test
# Run all tests
uv run pytest tests/ -v
# Run specific test file
uv run pytest tests/test_data_processing.py -v
# Run with coverage
uv run pytest tests/ --cov=src/embeddingbuddy
Development Tools
Install development dependencies for linting, type checking, and security:
# Install all dev dependencies
uv sync --extra dev
# Or install specific groups
uv sync --extra test # Testing tools
uv sync --extra lint # Linting and formatting
uv sync --extra security # Security scanning tools
# Run linting
uv run ruff check src/ tests/
uv run ruff format src/ tests/
# Run type checking
uv run mypy src/embeddingbuddy/
# Run security scans
uv run bandit -r src/
uv run safety check
Adding New Features
The modular architecture makes it easy to extend functionality:
- New reduction algorithms: Add to models/reducers.py
- New plot types: Extend visualization/plots.py
- UI components: Add to ui/components/
- Configuration options: Update config/settings.py
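As an illustration of what a pluggable reduction algorithm can look like, here is a generic registry pattern in plain Python. It is a sketch only: the actual interface in models/reducers.py is not documented here and may be organized quite differently. `REDUCERS`, `register`, `reduce_embeddings`, and the toy `top_variance` reducer are all hypothetical names.

```python
from statistics import pvariance

# Hypothetical registry mapping a method name to a reducer function.
REDUCERS = {}


def register(name):
    """Decorator that adds a reducer function to the registry under `name`."""
    def decorator(func):
        REDUCERS[name] = func
        return func
    return decorator


@register("top_variance")
def top_variance(vectors, n_components):
    """Toy reducer: keep the n_components coordinates with the highest variance."""
    dims = range(len(vectors[0]))
    variances = [pvariance([v[d] for v in vectors]) for d in dims]
    keep = sorted(dims, key=lambda d: variances[d], reverse=True)[:n_components]
    return [[v[d] for d in keep] for v in vectors]


def reduce_embeddings(vectors, method, n_components=2):
    """Look up a registered reducer by name and apply it."""
    if method not in REDUCERS:
        raise ValueError(f"unknown method '{method}'; available: {sorted(REDUCERS)}")
    return REDUCERS[method](vectors, n_components)


reduced = reduce_embeddings(
    [[0.0, 0.0, 1.0], [0.0, 5.0, 2.0], [0.0, 10.0, 3.0]],
    method="top_variance",
)
```

With a registry like this, adding a method means writing one decorated function, and the lookup code does not change.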
Tech Stack
- Python Dash: Web application framework
- Plotly: Interactive plotting and visualization
- scikit-learn: PCA implementation
- UMAP-learn: UMAP dimensionality reduction
- openTSNE: Fast t-SNE implementation
- NumPy/Pandas: Data manipulation and analysis
- pytest: Testing framework
- uv: Modern Python package and project manager