Widget for exploring and sampling words from text data through w2v

Project description

w2widget

Widget for exploring and sampling words from text data through word2vec models in order to construct topic dictionaries.

Package content

The w2widget package contains two modules:

  • doc2vec.py
  • widget.py

Examples

In widget_example.ipynb you can try out the widget on pretrained data from the Reuters dataset.

For an example of the data workflow that generates the necessary input, see workflow_example.ipynb.

Doc2Vec

This module helps calculate and handle document vectors (doc2vec). Each document's vector is a weighted average of the document's word vectors, with weights based on inverse word frequencies.

from w2widget.doc2vec import calculate_inverse_frequency, Doc2Vec

# Calculate word weights from inverse frequency
word_weights = calculate_inverse_frequency(document_tokens)

# Initialize the model
dv_model = Doc2Vec(wv_model, word_weights)

# Add documents and calculate the document vectors
dv_model.add_doc2vec(document_tokens)

# Reduce the dimensions
dv_model.reduce_dimensions()

# Store the embeddings
two_dim_doc_embedding = dv_model.TSNE_embedding_array
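
The weighted averaging described above can be illustrated with a minimal, dependency-free sketch. It is not the package's implementation: the helper names `inverse_frequency_weights` and `weighted_document_vector` are hypothetical, and the weighting (one divided by a token's total count) is an assumption, since the exact formula used by `calculate_inverse_frequency` is not stated here.

```python
from collections import Counter


def inverse_frequency_weights(documents):
    """Map each token to 1 / (its total count across all documents).

    Assumption: this is only one plausible inverse-frequency scheme.
    """
    counts = Counter(token for doc in documents for token in doc)
    return {token: 1.0 / count for token, count in counts.items()}


def weighted_document_vector(tokens, word_vectors, weights):
    """Return the weighted average of the vectors of all known tokens."""
    known = [t for t in tokens if t in word_vectors and t in weights]
    if not known:
        return None  # no token of this document has a word vector
    total_weight = sum(weights[t] for t in known)
    dim = len(word_vectors[known[0]])
    return [
        sum(weights[t] * word_vectors[t][i] for t in known) / total_weight
        for i in range(dim)
    ]


# Toy data standing in for tokenized documents and a word2vec lookup
documents = [["oil", "price", "oil"], ["price", "bank"]]
word_vectors = {"oil": [1.0, 0.0], "price": [0.0, 1.0], "bank": [1.0, 1.0]}

weights = inverse_frequency_weights(documents)
doc_vector = weighted_document_vector(documents[1], word_vectors, weights)
```

In this toy corpus "bank" occurs once and "price" twice, so "bank" pulls the second document's vector twice as strongly as "price" does.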

Widget

The widget module displays the results from:

  • a gensim word2vec model,
  • its 2-dimensional embedding (i.e. t-SNE),
  • the custom doc2vec model,
  • its 2-dimensional embedding (i.e. t-SNE),
  • a list of tokenized documents with whitespace, and
  • optionally, a list of initial search words.

from w2widget.widget import Widget

wv_widget = Widget(
    wv_model,
    two_dim_word_embedding,
    tokens_with_ws,
    dv_model=None,
    two_dim_doc_embedding=None,
    initial_search_words=[],
)

wv_widget.display_widget()
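
The expected format of the tokenized documents with whitespace is not spelled out here. One plausible reading, shown below purely as an assumption, is that each token keeps its trailing whitespace so that joining the tokens restores the original text. The helper `tokenize_with_whitespace` is hypothetical and not part of w2widget; workflow_example.ipynb remains the reference for how the real input is built.

```python
import re


def tokenize_with_whitespace(text):
    """Split text into tokens, each keeping its trailing whitespace.

    Note: whitespace at the very start of the text is not captured.
    """
    return re.findall(r"\S+\s*", text)


raw_documents = ["Oil prices rose.", "The bank cut  rates."]

# One list of whitespace-preserving tokens per document
tokens_with_ws = [tokenize_with_whitespace(doc) for doc in raw_documents]

# Joining the tokens gives back the original document text
reconstructed = ["".join(tokens) for tokens in tokens_with_ws]
```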

You can save the topics to a JSON file from the widget, or access them through the dictionary stored in wv_widget.topics.
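
If you prefer to persist the topics from code rather than through the widget, a minimal sketch with the standard library could look like the following. The `topics` dictionary here is a hypothetical stand-in for wv_widget.topics, and its shape (topic name mapped to a list of words) is an assumption.

```python
import json
import os
import tempfile

# Hypothetical stand-in for wv_widget.topics (assumed shape: name -> words)
topics = {"energy": ["oil", "gas"], "finance": ["bank", "rates"]}

# Write the dictionary to a JSON file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "topics.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(topics, f, indent=2, ensure_ascii=False)

# Read it back, e.g. in a later session
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```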
