Gain a clue by clustering!

Project description

cluestar

This library contains visualisation tools that might help you get started with classification tasks. The idea is that if you can inspect clusters easily, you might gain a clue on what good labels for your dataset might be!

It generates charts that look like this:

Normal plot

There's even a fancy chart that can compare embedding techniques.

Comparing two embeddings

Install

python -m pip install cluestar

Interactive Demo

You can see an interactive demo of the generated widgets here.

You can also toy around with the demo notebook found here.

Usage

The first step is to encode your text data in two dimensions, as shown below.

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

pipe = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))

X = pipe.fit_transform(texts)
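The snippet above assumes a variable `texts` already exists. As a minimal sketch, assuming `texts` is a plain list of strings (the sentences below are made up for illustration), the full step might look like this:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical example data; replace with your own list of strings.
texts = [
    "my voucher code does not work",
    "the voucher expired yesterday",
    "when will you deliver my order",
    "the delivery arrived late",
    "the plastic packaging was damaged",
    "too much plastic in the box",
]

# random_state is set only to make the sketch reproducible.
pipe = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
X = pipe.fit_transform(texts)

# One row per text, two columns that serve as chart coordinates.
print(X.shape)
```

The only requirement that follows from the usage shown here is that `X` has one two-dimensional point per entry in `texts`.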

From here you can make an interactive chart with:

from cluestar import plot_text

plot_text(X, texts)

You will likely get the best results by combining UMAP with a sentence encoder such as the Universal Sentence Encoder.

You can also make the chart easier to read by highlighting points whose text contains certain words.

plot_text(X, texts, color_words=["plastic", "voucher", "deliver"])
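To get a feel for what word-based highlighting amounts to, here is a rough sketch that labels each text with the first highlight word it contains. The helper `first_matching_word` is hypothetical, and the case-insensitive substring match is an assumption; this is not cluestar's implementation.

```python
def first_matching_word(text, color_words):
    """Return the first word from color_words found in text, or "none"."""
    lowered = text.lower()
    for word in color_words:
        # Assumption: a case-insensitive substring match, so "deliver"
        # also matches "delivery".
        if word.lower() in lowered:
            return word
    return "none"


# Hypothetical example data.
texts = [
    "My voucher code does not work",
    "The delivery arrived late",
    "Too much PLASTIC in the box",
    "Thanks for the quick reply",
]
color_words = ["plastic", "voucher", "deliver"]

labels = [first_matching_word(t, color_words) for t in texts]
print(labels)  # ['voucher', 'deliver', 'plastic', 'none']
```

Checking labels like these before plotting is a quick way to see how many points a given word list would actually highlight.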

You can also use a numeric array, such as the predicted probabilities from a model, to control the color.

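# First, get an array of predicted probabilities from some fitted model.
# predict_proba returns one column per class; take the first column.
p_vals = some_model.predict_proba(texts)[:, 0]
# Use these to assign pretty colors.
plot_text(X, texts, color_array=p_vals)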
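The model is left unspecified above. As a sketch under the assumption that it is a scikit-learn text classifier exposing `predict_proba`, a probability array could be produced like this; the texts and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical example data with made-up binary labels.
texts = [
    "my voucher code does not work",
    "the voucher expired yesterday",
    "please send me a new voucher",
    "when will you deliver my order",
    "the delivery arrived late",
    "you did not deliver the parcel",
]
labels = [1, 1, 1, 0, 0, 0]

# Stand-in for `some_model`: any classifier with predict_proba would do.
some_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
some_model.fit(texts, labels)

# Column 0 holds the probability of the first class in some_model.classes_.
p_vals = some_model.predict_proba(texts)[:, 0]
print(p_vals.shape)
```

The resulting `p_vals` has one value between 0 and 1 per text, which is the shape a per-point color array needs.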

You can also compare two embeddings interactively. To do this:

from cluestar import plot_text_comparison

# Pass two different 2D embeddings of the same texts as X1 and X2.
plot_text_comparison(X1=X, X2=X, texts=texts)
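A comparison is more informative when the two embeddings actually differ. The sketch below builds two 2D embeddings of the same texts with scikit-learn, one from TF-IDF features and one from raw counts. The choice of these two pipelines is an assumption made for illustration; any pair of encoders that output two columns per text would do.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical example data; replace with your own list of strings.
texts = [
    "my voucher code does not work",
    "the voucher expired yesterday",
    "when will you deliver my order",
    "the delivery arrived late",
    "the plastic packaging was damaged",
    "too much plastic in the box",
]

# Two different encoders, both reduced to two dimensions.
pipe_tfidf = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
pipe_count = make_pipeline(CountVectorizer(), TruncatedSVD(n_components=2, random_state=0))

X1 = pipe_tfidf.fit_transform(texts)
X2 = pipe_count.fit_transform(texts)

# Both embeddings must describe the same texts, row for row.
print(X1.shape, X2.shape)
```

These arrays can then be passed as `X1` and `X2`, together with `texts`, to `plot_text_comparison`.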
