Tools for interactive visual inspection of semantic embeddings.

embedding-explorer

Tools for interactive visual exploration of semantic embeddings.

New in version 0.6.0

You can now pass a custom Neofuzz process to the explorer if you have specific requirements.

from embedding_explorer import show_network_explorer
from neofuzz import char_ngram_process

# corpus and embeddings are assumed to be defined already (see the examples below)
process = char_ngram_process()
show_network_explorer(corpus=corpus, embeddings=embeddings, fuzzy_search=process)
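
To give a feel for what a character n-gram based fuzzy search does, here is a minimal pure-Python sketch. It is not Neofuzz's implementation: the function names, the padding and the Jaccard scoring are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 means the sets are identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fuzzy_search(query, options, limit=3, n=3):
    """Rank options by n-gram overlap with the query, best match first."""
    query_grams = char_ngrams(query, n)
    scored = [(jaccard(query_grams, char_ngrams(opt, n)), opt) for opt in options]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [opt for _, opt in scored[:limit]]


# A misspelled query still finds the intended term first.
options = ["embedding", "embedded", "bedding", "clustering"]
best = fuzzy_search("embeding", options, limit=2)
```

Because the match is based on overlapping character fragments rather than exact strings, a typo in the search box can still lead to the right entry.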

Installation

Install embedding-explorer from PyPI:

pip install embedding-explorer

Semantic Explorer

embedding-explorer comes with a web application built for exploring semantic relations in a corpus with the help of embeddings. In this section I will show a couple of examples of running the app with different embedding models and corpora.

Static Word Embeddings

Let's say that you would like to explore semantic relations by investigating static word embeddings, such as those produced by Word2Vec or GloVe. You can do this by passing the vocabulary of the model and the embedding matrix to embedding-explorer.

For this example I will use Gensim, which can be installed from PyPI:

pip install gensim

We will download GloVe Twitter 25 from gensim's repositories.

from gensim import downloader
from embedding_explorer import show_network_explorer

model = downloader.load("glove-twitter-25")
vocabulary = model.index_to_key
embeddings = model.vectors
show_network_explorer(corpus=vocabulary, embeddings=embeddings)

This will open a new browser window with the Explorer, where you can enter seed words and set the number of associations that you would like to see on the screen.

Screenshot of the Explorer
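
Conceptually, finding associations for a seed word comes down to a nearest-neighbour lookup in the embedding matrix. The sketch below illustrates this on a toy vocabulary with made-up vectors. That the similarity measure is cosine similarity is an assumption on my part, and `closest_words` is a hypothetical helper, not part of embedding-explorer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_words(seed, vocabulary, embeddings, n_associations=2):
    """Return the n_associations words most similar to the seed word."""
    seed_vector = embeddings[vocabulary.index(seed)]
    scored = [
        (cosine_similarity(seed_vector, vector), word)
        for word, vector in zip(vocabulary, embeddings)
        if word != seed
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:n_associations]]


# Toy data: row i of the matrix is the vector of word i.
toy_vocabulary = ["cat", "dog", "car", "truck"]
toy_embeddings = [
    [0.9, 0.1],
    [0.8, 0.3],
    [0.1, 0.9],
    [0.2, 0.8],
]
associations = closest_words("cat", toy_vocabulary, toy_embeddings, n_associations=2)
```

The vocabulary and the matrix have to be aligned row by row, which is exactly what `model.index_to_key` and `model.vectors` give you in the Gensim example.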

Dynamic Embedding Models

If you want to explore relations in a corpus using, say, a sentence transformer, which creates contextually aware embeddings, you can do so by specifying a scikit-learn compatible vectorizer model instead of passing along an embedding matrix.

One clear advantage here is that you can input arbitrary sequences as seeds instead of a predetermined set of texts.
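
In case it is not clear what "scikit-learn compatible" refers to, the sketch below shows the `fit`/`transform` convention such vectorizers follow, using a toy class written in plain Python. `HashedTrigramVectorizer` is hypothetical and only illustrates the interface. A real vectorizer would normally return a NumPy array, and I have not checked whether embedding-explorer would accept this toy class as is.

```python
import zlib


class HashedTrigramVectorizer:
    """Hypothetical toy vectorizer following the fit/transform convention."""

    def __init__(self, n_dims=16):
        self.n_dims = n_dims

    def fit(self, X, y=None):
        # Nothing is learned here, but by convention fit returns self.
        return self

    def transform(self, X):
        # Produce one fixed-length count vector per input text.
        rows = []
        for text in X:
            vector = [0.0] * self.n_dims
            lowered = text.lower()
            for i in range(len(lowered) - 2):
                trigram = lowered[i:i + 3].encode("utf-8")
                # crc32 is deterministic across runs, unlike the built-in hash().
                bucket = zlib.crc32(trigram) % self.n_dims
                vector[bucket] += 1.0
            rows.append(vector)
        return rows

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)


# Any new text can be embedded on demand, including texts never seen before.
vectors = HashedTrigramVectorizer(n_dims=8).fit_transform(
    ["semantic search", "any new seed text"]
)
```

Since `transform` can be called on any text, a model like this is what makes arbitrary seeds possible: the seed does not have to be part of the corpus.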

We are going to use the package embetter for embedding documents.

pip install embetter[sentence-trf]

I decided to examine four-grams in the 20newsgroups dataset. We will limit the number of four-grams to 4000 so we only see the most frequent ones.

from embetter.text import SentenceEncoder
from embedding_explorer import show_network_explorer
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

corpus = fetch_20newsgroups(
    remove=("headers", "footers", "quotes"),
).data
# We will use CountVectorizer for obtaining the possible n-grams
four_grams = (
    CountVectorizer(
        stop_words="english", ngram_range=(4, 4), max_features=4000
    )
    .fit(corpus)
    .get_feature_names_out()
)

model = SentenceEncoder()
show_network_explorer(corpus=four_grams, vectorizer=model)

Screenshot of the Explorer
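
If you are wondering what the `ngram_range=(4, 4)` and `max_features` settings amount to, here is a simplified pure-Python sketch of the idea: collect every run of four consecutive tokens and keep only the most frequent ones. This is not how CountVectorizer is implemented. The whitespace tokenization is an assumption, stop-word removal is left out, and `top_ngrams` is a hypothetical helper.

```python
from collections import Counter


def top_ngrams(documents, n=4, max_features=3):
    """Return the max_features most frequent word n-grams across documents."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [gram for gram, _ in counts.most_common(max_features)]


docs = [
    "the quick brown fox jumps",
    "a quick brown fox jumps high",
    "quick brown fox jumps again",
]
# Only the four-gram shared by all three documents survives the cut.
top = top_ngrams(docs, n=4, max_features=1)
```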

Projection and Clustering

New in version 0.5.0

In embedding-explorer you can now inspect corpora or embeddings by projecting them into 2D space, and optionally clustering observations.

In this example I'm going to demonstrate how to visualize 20 Newsgroups using various projection and clustering methods in embedding-explorer. We are going to use sentence transformers to encode texts.

from embetter.text import SentenceEncoder
from sklearn.datasets import fetch_20newsgroups

from embedding_explorer import show_clustering

newsgroups = fetch_20newsgroups(
    remove=("headers", "footers", "quotes"),
)
corpus = newsgroups.data

show_clustering(corpus=corpus, vectorizer=SentenceEncoder())

In the app you can choose whether and how you want to reduce embedding dimensionality, how you want to cluster the embeddings, and also how you intend to project them onto the 2D plane.

Screenshot of the Clustering parameters

After this you can investigate the semantic structure of your corpus interactively.

Screenshot of the Clustering

Dashboard

If you have multiple models to examine the same corpus with, or multiple corpora that you want to examine with the same model, then you can create a dashboard containing all of these options. Users can click on an option and are taken to the appropriate explorer page.

For this we will have to assemble these options into a list of Card objects that contain the information about the individual pages.

In the following example I will set up two different sentence transformers with the four-gram corpus from the Dynamic Embedding Models example.

from embetter.text import SentenceEncoder
from embedding_explorer import show_dashboard
from embedding_explorer.cards import NetworkCard, ClusteringCard

cards = [
    NetworkCard("MiniLM", corpus=four_grams, vectorizer=SentenceEncoder("all-MiniLM-L12-v2")),
    NetworkCard("MPNET", corpus=four_grams, vectorizer=SentenceEncoder("all-mpnet-base-v2")),
]
show_dashboard(cards)

Screenshot of the Dashboard
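
When the list of models grows, writing out every card by hand gets repetitive. The sketch below builds the cards from a dictionary instead. `build_cards` is a hypothetical helper, not part of embedding-explorer, and the card and encoder classes are passed in as arguments so the snippet does not depend on any installed package. It only assumes the call pattern from the example above: a name followed by `corpus` and `vectorizer` keyword arguments.

```python
def build_cards(model_names, corpus, make_card, make_encoder):
    """Create one card per model, all sharing the same corpus.

    model_names maps the label shown on the dashboard to the model identifier.
    make_card and make_encoder are the classes (or any callables) to use.
    """
    return [
        make_card(label, corpus=corpus, vectorizer=make_encoder(model_name))
        for label, model_name in model_names.items()
    ]


# Intended use with the classes from the example above (not executed here):
# cards = build_cards(
#     {"MiniLM": "all-MiniLM-L12-v2", "MPNET": "all-mpnet-base-v2"},
#     corpus=four_grams,
#     make_card=NetworkCard,
#     make_encoder=SentenceEncoder,
# )
# show_dashboard(cards)
```

The cards come out in the same order as the dictionary entries.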

Release history

0.6.0 (this version): Sep 6, 2024
0.5.3: Jun 20, 2024
0.5.2: Nov 15, 2023
0.5.1: Nov 13, 2023
0.5.0: Nov 13, 2023
0.4.0: Oct 9, 2023
0.3.0: Oct 9, 2023
0.2.2: May 30, 2023
0.2.1: May 30, 2023
0.2.0: May 30, 2023
0.1.4: Apr 17, 2023
0.1.3: Apr 17, 2023
0.1.2: Apr 11, 2023
0.1.1: Mar 27, 2023
0.1.0: Mar 21, 2023

Download files

Source Distribution

embedding_explorer-0.6.0.tar.gz (39.9 kB)

Uploaded Sep 6, 2024 Source

Built Distribution

embedding_explorer-0.6.0-py3-none-any.whl (43.9 kB)

Uploaded Sep 6, 2024 Python 3

File details

Details for the file embedding_explorer-0.6.0.tar.gz.

File metadata

Download URL: embedding_explorer-0.6.0.tar.gz
Upload date: Sep 6, 2024
Size: 39.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.2 CPython/3.9.18 Linux/5.15.0-119-generic

File hashes

Hashes for embedding_explorer-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`e26d26f7a9a22e09ffeab1ef77f61942d9c94b8ceabce91f988aa5d107170be1`
MD5	`129fe8dc77f2f7ad8f916e0c77e214a3`
BLAKE2b-256	`3aca23c9db154f686277f8d543bcf0a8b81981763e8708f6828cf49f2514d8c5`

File details

Details for the file embedding_explorer-0.6.0-py3-none-any.whl.

File metadata

Download URL: embedding_explorer-0.6.0-py3-none-any.whl
Upload date: Sep 6, 2024
Size: 43.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.2 CPython/3.9.18 Linux/5.15.0-119-generic

File hashes

Hashes for embedding_explorer-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a152bfc593e8464cab2e141a737dd36ac1ddfd03f63ed6a0f3f64730524c1a18`
MD5	`0267c6e6d7e3a0ef60f64f9d7173a358`
BLAKE2b-256	`b8ffa3aecc2fb096aa81e581e4117960b3d963de0390e6459a3769a21cad9475`
