A tool for visualizing embeddings
Embedding Atlas
A Python package that provides a command line tool to visualize a dataset with embeddings. It also includes a Python Notebook (e.g., Jupyter) widget and a Streamlit widget.
- Documentation: https://apple.github.io/embedding-atlas
- GitHub: https://github.com/apple/embedding-atlas
Installation
pip install embedding-atlas
Then launch the command line tool:
embedding-atlas [OPTIONS] INPUTS...
Loading Data
You can load your data in two ways: locally or from Hugging Face.
Loading Local Data
To get started with your own data, run:
embedding-atlas path_to_dataset.parquet
Loading Hugging Face Data
You can instead load datasets from Hugging Face:
embedding-atlas huggingface_org/dataset_name
Visualizing Embedding Projections
To visualize embedding projections, pre-compute the X and Y coordinates and specify the column names with --x and --y, for example:
embedding-atlas path_to_dataset.parquet --x projection_x --y projection_y
You may use the SentenceTransformers package to compute high-dimensional embeddings from text data, and then use the UMAP package to compute 2D projections.
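The two steps just described could be sketched roughly as follows, assuming the sentence-transformers and umap-learn packages are installed. The helper name compute_projection, its embed and project hooks, and the model name all-MiniLM-L6-v2 are illustrative assumptions, not something Embedding Atlas prescribes.

```python
from typing import Callable, Optional, Sequence


def compute_projection(
    texts: Sequence[str],
    embed: Optional[Callable] = None,
    project: Optional[Callable] = None,
):
    """Return (xs, ys) coordinate lists for the given texts.

    `embed` maps a list of texts to high-dimensional vectors and `project`
    maps those vectors to 2D points. Both are hypothetical hooks; the
    defaults use SentenceTransformers and UMAP.
    """
    if embed is None:
        # Imported lazily so the heavy dependency is only needed when used.
        from sentence_transformers import SentenceTransformer

        # The model name is an assumption; any SentenceTransformers model works.
        model = SentenceTransformer("all-MiniLM-L6-v2")
        embed = model.encode
    if project is None:
        import umap

        project = umap.UMAP(n_components=2).fit_transform

    vectors = embed(list(texts))
    points = project(vectors)
    xs = [float(p[0]) for p in points]
    ys = [float(p[1]) for p in points]
    return xs, ys


# Example usage (requires pandas with a parquet engine; "text" is an assumed column name):
#   import pandas as pd
#   df = pd.read_parquet("path_to_dataset.parquet")
#   df["projection_x"], df["projection_y"] = compute_projection(df["text"])
#   df.to_parquet("path_to_dataset.parquet")
```

The resulting projection_x and projection_y columns can then be passed with --x and --y. Note that UMAP needs more rows than its neighbor count, so very small datasets may require adjusting its parameters.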
Using Pre-computed Vectors
If you already have pre-computed embedding vectors (but not the 2D projections), you can specify the column containing the vectors with --vector:
embedding-atlas path_to_dataset.parquet --vector embedding_vectors
This applies UMAP dimensionality reduction to your existing vectors without recomputing embeddings. The vectors should be stored as lists or NumPy arrays in your dataset.
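Before pointing --vector at a column, it can help to confirm that every row holds a vector of the same length, since dimensionality reduction needs a rectangular input. The following is a minimal sketch; vector_dimension is a hypothetical helper, not part of embedding-atlas.

```python
from typing import Iterable, Sequence


def vector_dimension(vectors: Iterable[Sequence[float]]) -> int:
    """Return the length shared by all vectors, or raise ValueError."""
    dim = None
    for i, vec in enumerate(vectors):
        n = len(vec)  # works for lists and 1-D NumPy arrays alike
        if n == 0:
            raise ValueError(f"row {i}: empty vector")
        if dim is None:
            dim = n
        elif n != dim:
            raise ValueError(f"row {i}: expected length {dim}, got {n}")
    if dim is None:
        raise ValueError("no vectors given")
    return dim
```

For a pandas DataFrame, this could be called as vector_dimension(df["embedding_vectors"]) before writing the parquet file.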
You may also specify a column for pre-computed nearest neighbors:
embedding-atlas path_to_dataset.parquet --x projection_x --y projection_y --neighbors neighbors
The neighbors column should have values in the following format: {"ids": [id1, id2, ...], "distances": [d1, d2, ...]}.
If this column is specified, you'll be able to see nearest neighbors for a selected point in the tool.
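As an illustration of the format above, the sketch below builds one {"ids": ..., "distances": ...} value per row with a brute-force search. It assumes that ids are zero-based row indices, that distances are Euclidean, and that a point is not listed as its own neighbor; the source does not state these conventions, and nearest_neighbors is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def nearest_neighbors(
    points: Sequence[Sequence[float]], k: int
) -> List[Dict[str, list]]:
    """Brute-force k nearest neighbors, one {"ids", "distances"} dict per row."""
    result = []
    for i, p in enumerate(points):
        # Distance from row i to every other row (the row itself is skipped).
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        nearest = dists[:k]
        result.append(
            {
                "ids": [j for _, j in nearest],
                "distances": [d for d, _ in nearest],
            }
        )
    return result
```

This approach is quadratic in the number of rows, so it only suits small datasets; a dedicated nearest-neighbor library would be a better fit for larger ones.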
Local Development
Launch Embedding Atlas with a wine reviews dataset using ./start.sh, or with the MNIST dataset using ./start_image.sh.
File details
Details for the file embedding_atlas-0.20.0-py3-none-any.whl.
File metadata
- Download URL: embedding_atlas-0.20.0-py3-none-any.whl
- Upload date:
- Size: 26.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7f7ec7c14ef4689443aca0cf790abb2c573abc283111b133cfc27c5356b356a |
| MD5 | a1e48659c7a2bd9aeaf8aa5dc71be94a |
| BLAKE2b-256 | 623310a963b33b2d30b92d7ba5bb683923e90fec097270907dda3ba5c65e32f2 |
Provenance
The following attestation bundles were made for embedding_atlas-0.20.0-py3-none-any.whl:
Publisher: ci.yml on apple/embedding-atlas
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: embedding_atlas-0.20.0-py3-none-any.whl
- Subject digest: e7f7ec7c14ef4689443aca0cf790abb2c573abc283111b133cfc27c5356b356a
- Sigstore transparency entry: 1244770738
- Sigstore integration time:
- Permalink: apple/embedding-atlas@5938e875890b49a432648426db4e97ba977f0b67
- Branch / Tag: refs/tags/v0.20.0
- Owner: https://github.com/apple
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@5938e875890b49a432648426db4e97ba977f0b67
- Trigger Event: release