SADIRE: Sampling from scatter-plot visualizations

[Figure: SADIRE on MNIST]

Scatter plot-based representations of dimensionality reduction (such as t-SNE and UMAP) can help us to understand various patterns in high-dimensional datasets. However, due to the huge size of datasets in practical applications, these representations often result in cluttered layouts. With SADIRE, you can reduce the size of the dataset while preserving the context and structural relations imposed by dimensionality reduction techniques.

Requirements

SADIRE uses a quadtree to select representative data points. We chose Pyqtree for this purpose.
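
For readers unfamiliar with the data structure, the following is a minimal sketch of a point quadtree in plain Python. It is an illustration only: it is neither Pyqtree's API nor SADIRE's internal code, and the class name `QuadTree`, its methods, and the `capacity` and `max_depth` parameters are hypothetical choices made for this example.

```python
class QuadTree:
    """Minimal point quadtree (illustrative, not Pyqtree): a node stores up
    to `capacity` points, then splits its region into four quadrants."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=8):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []      # (x, y, item) tuples held by this leaf
        self.children = None  # four child nodes once the leaf has split

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x <= x1 and y0 <= y <= y1

    def insert(self, x, y, item):
        """Store `item` at (x, y); return False if the point is out of bounds."""
        if not self._contains(x, y):
            return False
        if self.children is None:
            # The depth limit keeps duplicate points from splitting forever.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, item))
                return True
            self._split()
        # any() stops at the first child that accepts the point.
        return any(child.insert(x, y, item) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args),
            QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args),
            QuadTree(mx, my, x1, y1, *args),
        ]
        # Push the points held by this node down into the new children.
        old, self.points = self.points, []
        for x, y, item in old:
            self.insert(x, y, item)

    def query(self, qx0, qy0, qx1, qy1):
        """Return the items whose points lie inside the query rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return []  # the query rectangle does not overlap this node
        found = [item for x, y, item in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


# Index five 2D points by their position in the list.
tree = QuadTree(0, 0, 10, 10, capacity=2)
points = [(1, 1), (2, 2), (8, 8), (9, 1), (1.5, 1.2)]
for i, (x, y) in enumerate(points):
    tree.insert(x, y, i)

# Indices of the points that fall in the lower-left 3 x 3 region.
nearby = sorted(tree.query(0, 0, 3, 3))
```

A range query only descends into quadrants that overlap the query rectangle, which is what makes looking up the points in a region of a scatter plot cheap compared with scanning every point.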

Installation

pip install sadire

Citation

@article{MarcilioJr2020_SADIRE,
  author = "Marcílio-Jr, W. E. and Eler, D. M.",
  year = "2020",
  title = "SADIRE: a context-preserving sampling technique for dimensionality reduction visualizations",
  journal = "Journal of Visualization",
  pages = "999--1013"
}

Usage

SADIRE samples from a 2D representation of a multidimensional dataset. It is designed to preserve the relationships imposed by a dimensionality reduction technique.

Load a dataset
from sklearn.datasets import load_iris

iris_data = load_iris()

X, y = iris_data.data, iris_data.target
Reduce to 2D
import umap  # provided by the umap-learn package

reducer = umap.UMAP(random_state=0)
embedding = reducer.fit_transform(X)
Use SADIRE
import sadire

"""
SADIRE uses the concept of windows to select samples and remove redundancy.
 * alpha is the size of the window
 * beta is the size of each block (or superpixel) in a window

The larger these parameters, the more scattered the representative data points will be.

Using alpha = 2 or 3 and beta between 4 and 10 works well for the datasets we have tested.
See the paper for more details.

"""

model = sadire.SADIRE(alpha=1, beta=3)


# SADIRE returns the representative indices
samples = model.fit_transform(embedding)
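
Because the result is a set of indices, it can be used to subset both the embedding and the labels. The sketch below shows that pattern on toy data. So that it runs without SADIRE installed, it uses a hypothetical stand-in sampler, `grid_sample`, which keeps the first point found in each square grid cell. This stand-in is not SADIRE's algorithm and its `cell` parameter is not one of SADIRE's parameters; it only illustrates index-based sampling in which a larger cell yields fewer, more scattered representatives.

```python
import math


def grid_sample(points, cell):
    """Hypothetical stand-in sampler (not SADIRE): return the index of the
    first point that falls in each square grid cell of side `cell`."""
    seen = set()
    kept = []
    for i, (x, y) in enumerate(points):
        key = (math.floor(x / cell), math.floor(y / cell))
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept


# Toy 2D embedding: two tight groups and one isolated point.
embedding = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
             (5.1, 5.2), (5.3, 5.4), (9.5, 0.5)]
labels = [0, 0, 0, 1, 1, 2]

fine = grid_sample(embedding, cell=0.25)   # small cells keep every point here
coarse = grid_sample(embedding, cell=1.0)  # larger cells keep one per group

# Use the indices to subset the embedding and the labels consistently.
sampled_points = [embedding[i] for i in coarse]
sampled_labels = [labels[i] for i in coarse]
print(len(fine), len(coarse), sampled_labels)
```

With SADIRE, the `samples` returned by `model.fit_transform(embedding)` would take the place of `coarse`. When the embedding and labels are NumPy arrays, as in the UMAP example, `embedding[samples]` and `y[samples]` select the same rows directly.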

Example

See the "SADIRE on MNIST" figure at the top of this page.

Support

If you have any questions, feel free to contact me at wilson_jr@outlook.com.
