cluestar

Gain a clue by clustering!

This library contains visualisation tools that might help you get started with classification tasks. The idea is that if you can inspect clusters easily, you might gain a clue on what good labels for your dataset might be!

It generates interactive charts for exploring clusters in text data.

Install

python -m pip install "cluestar @ git+https://github.com/koaning/cluestar.git"

Interactive Demo

You can see an interactive demo of the generated widgets here.

You can also toy around with the demo notebook found here.

Usage

The first step is to encode your text data (here `texts`, a list of strings) in two dimensions, like below.

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

pipe = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))

X = pipe.fit_transform(texts)
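For a quick check of the encoding step, here is a minimal sketch that runs the same pipeline on a tiny corpus. The sample sentences and the `random_state` argument are assumptions added for illustration; they are not part of the original example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical sample data; substitute your own list of strings.
texts = [
    "my voucher code does not work",
    "the voucher expired before I could use it",
    "the delivery arrived two days late",
    "when will you deliver my order",
    "the plastic packaging was damaged",
    "too much plastic in the box",
]

# TF-IDF turns each text into a sparse vector; TruncatedSVD reduces it to 2 dimensions.
# random_state is set only to make the sketch reproducible.
pipe = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
X = pipe.fit_transform(texts)

# X has one row per text and two columns (the x and y coordinates).
print(X.shape)
```

The resulting `X` has the shape that `plot_text` is given in the next step: one two-dimensional point per text.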

From here you can make an interactive chart with:

from cluestar import plot_text

plot_text(X, texts)

You will likely get the best results when you combine UMAP with an embedding model such as the Universal Sentence Encoder.

You can also make the chart easier to read by highlighting points whose text contains certain words.

plot_text(X, texts, color_words=["plastic", "voucher", "deliver"])
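To get a feel for which points a given word list would highlight, you can compute the matches yourself before plotting. The sketch below is not cluestar's implementation: the helper `contains_any` is hypothetical, and case-insensitive substring matching is an assumption about how matching might work.

```python
def contains_any(text, words):
    """Return True if any word occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


# Hypothetical sample data for illustration.
texts = [
    "My voucher did not work",
    "Delivery was late",
    "Great service, thanks",
]
color_words = ["plastic", "voucher", "deliver"]

# One boolean per text: would this point be highlighted under the assumed matching rule?
highlighted = [contains_any(text, color_words) for text in texts]
print(highlighted)
```

Note that under substring matching the word "deliver" also matches "Delivery", which may or may not be what you want.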

You can also use a numeric array, such as one that contains predicted probabilities from a model, to influence the color.

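# First, get an array of predicted probabilities from some model.
# This assumes a scikit-learn style classifier: predict_proba returns
# one column per class, whereas predict returns a 1D array of labels.
p_vals = some_model.predict_proba(texts)[:, 0]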
# Use these to assign pretty colors.
plot_text(X, texts, color_array=p_vals)
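If you do not have a model at hand, the following sketch shows one way such an array could be produced with scikit-learn. The sample texts, the labels, and the choice of `CountVectorizer` with `LogisticRegression` are all assumptions for illustration; any classifier that exposes `predict_proba` should work the same way.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical sample data and labels (1 = complaint, 0 = other).
texts = [
    "the delivery arrived two days late",
    "my voucher code does not work",
    "thanks for the quick response",
    "great service, very happy",
]
labels = [1, 1, 0, 0]

# A simple bag-of-words classifier that accepts raw strings.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# predict_proba returns an array of shape (n_samples, n_classes).
# Column 0 holds the probability of the first class in model.classes_.
p_vals = model.predict_proba(texts)[:, 0]
print(p_vals.shape)
```

An array like `p_vals`, with one value per text, is what gets passed as `color_array` in the call above.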
