A command line utility to create plots of word embeddings

Embedding Plot Visualization Tool

Description

Word embeddings transform words into high-dimensional vectors. The vectors attempt to capture the semantic meaning and relationships of the words, so that similar or related words have similar vectors. For example, "Cat", "Kitten", "Feline", "Tiger" and "Lion" would have embedding vectors that are similar to varying degrees, but all of them would be very dissimilar to the vector for a word like "Toolbox".
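"Similar vectors" is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional toy vectors (not values from any real model) and a hypothetical helper, cosine_similarity, purely to illustrate the idea:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors, for illustration only.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "toolbox": [0.1, 0.05, 0.95],
}

sim_cat_kitten = cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"])
sim_cat_toolbox = cosine_similarity(toy_vectors["cat"], toy_vectors["toolbox"])
print(f"cat vs kitten:  {sim_cat_kitten:.3f}")
print(f"cat vs toolbox: {sim_cat_toolbox:.3f}")
```

With these toy values the related pair scores close to 1 and the unrelated pair scores much lower, which is the pattern a real embedding model is trained to produce.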

Pretrained Word2Vec models commonly use 300 dimensions to capture the semantic meaning of each word. It's not possible to visualize 300 dimensions directly, but dimensionality reduction techniques can project the vectors into a 2- or 3-dimensional latent space that preserves many of the relationships and is easy to visualize.
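As a rough illustration of such a projection, here is a minimal PCA sketch written directly with NumPy. It is not taken from this tool's source: the helper name pca_project is hypothetical and the input is random data standing in for real word vectors.

```python
import numpy as np

def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal components."""
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on zero-mean data
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Random stand-in for 50 word vectors with 300 dimensions each.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(50, 300))

points_2d = pca_project(fake_embeddings, n_components=2)
print(points_2d.shape)  # (50, 2)
```

Each row of points_2d is an (x, y) position that could be drawn in a scatter plot. t-SNE follows the same "many dimensions in, two out" pattern but optimizes for preserving local neighborhoods rather than overall variance.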

Embedding-plot is a command line utility that visualizes word embeddings in a scatter plot, using dimensionality reduction (PCA or t-SNE) and clustering.

Features

  • Supports Word2vec pretrained embedding models
  • Dimensionality reduction using PCA or t-SNE
  • Specify a number of clusters to identify in the plot
  • Interactive HTML output
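To give a feel for the clustering feature, the following is a minimal k-means (Lloyd's algorithm) sketch in NumPy. It is an illustration under stated assumptions, not this tool's implementation: the function names and the two made-up blobs of 2D points are hypothetical.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the closest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return nearest_centroid(X, centroids), centroids

# Two well-separated, made-up blobs of 2D points.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0, 0), scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.3, size=(20, 2))
labels, centroids = kmeans(np.vstack([blob_a, blob_b]), k=2)
print(labels)
```

The number of clusters is an input to the algorithm rather than something it discovers, which is why it is exposed as a setting.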

Installation

Prerequisites

  • Python 3.9 or higher.

Install via pip

pip install embeddings_plot 

Embedding model

To use this tool, you have to either train your own embedding model or use an existing pretrained model. This tool expects the models to be in word2vec format. Two pretrained models ready to use are:

Download one of these models and unzip it, train your own model, or look for other pretrained word2vec models available on the internet.
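For orientation, the text variant of the word2vec format starts with a header line holding the vocabulary size and the vector dimension, followed by one line per word with its space-separated values. The reader below is a minimal sketch of that layout. The function name read_word2vec_text and the two-word sample are hypothetical, and this is not necessarily how this tool loads models.

```python
import io

def read_word2vec_text(stream):
    """Parse the text word2vec format into a {word: [floats]} dict."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        if not line.strip():
            continue  # ignore blank lines
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError(f"header promised {vocab_size} words, found {len(vectors)}")
    return vectors

# A tiny in-memory sample: 2 words, 3 dimensions each.
sample = io.StringIO("2 3\ncat 0.1 0.2 0.3\ntoolbox -0.4 0.5 0.6\n")
vectors = read_word2vec_text(sample)
print(vectors["cat"])
```

For real models with millions of words, a library loader such as gensim's KeyedVectors.load_word2vec_format is the more practical choice. It also handles the binary variant of the format.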

Usage

After installation, you can use the tool from the command line.

Basic Command

embeddings-plot -m <model_path> -i <input_file> -o <output_file> --labels

Parameters

  • -m, --model: Path to the word embeddings model file
  • -i, --input: Input text file with words to visualize
  • -o, --output: Output HTML file for the visualization
  • -l, --labels: (Optional) Show labels on the plot
  • -c, --clusters: (Optional) Number of clusters for KMeans. Default is 5.
  • -r, --reduction: (Optional) Method for dimensionality reduction (PCA or t-SNE). Default is t-SNE.
  • -t, --title: (Optional) Sets the title of the output HTML page
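The parameter list above maps naturally onto an argparse definition. The sketch below mirrors the documented flags and defaults, but it is not the tool's actual source. Treating -m, -i and -o as required is inferred from the other flags being marked optional, and the accepted spellings for the reduction method ("pca", "tsne") are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="embeddings-plot")
    parser.add_argument("-m", "--model", required=True,
                        help="Path to the word embeddings model file")
    parser.add_argument("-i", "--input", required=True,
                        help="Input text file with words to visualize")
    parser.add_argument("-o", "--output", required=True,
                        help="Output HTML file for the visualization")
    parser.add_argument("-l", "--labels", action="store_true",
                        help="Show labels on the plot")
    parser.add_argument("-c", "--clusters", type=int, default=5,
                        help="Number of clusters for KMeans")
    # Assumption: the exact accepted values are not documented here.
    parser.add_argument("-r", "--reduction", choices=["pca", "tsne"],
                        default="tsne",
                        help="Method for dimensionality reduction")
    parser.add_argument("-t", "--title", default=None,
                        help="Title of the output HTML page")
    return parser

args = build_parser().parse_args([
    "--model", "crawl-300d-2M.vec",
    "--input", "words.txt",
    "--output", "embedding-plot.html",
    "--labels",
    "--clusters", "13",
])
print(args)
```

Flags that are left out fall back to their defaults, so a call with only -m, -i and -o would use 5 clusters and t-SNE.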

Example

embeddings-plot --model crawl-300d-2M.vec --input words.txt --output embedding-plot.html --labels --clusters 13 
