
Estimate the optimal number of components or clusters.

Project description

A module of functions for estimating the optimal number of components or clusters.

PCA

Selects the number of components by comparing eigenvectors between split halves of the data. That is, it does not rely on the shape of the eigenvalue curve; instead, it separates components with high split-half similarity from those with low split-half similarity.
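The split-half idea can be sketched as follows. This is a minimal illustration, not the teg_get_best_n implementation. It assumes that rows of X are observations and columns are variables, that eigenvectors come from each half's correlation matrix, and that "similarity" means the absolute cosine between eigenvectors of the same rank. The function name `split_half_n_components`, the 0.9 threshold, and the number of random splits are hypothetical choices.

```python
import numpy as np


def split_half_n_components(X, n_splits=20, threshold=0.9, seed=0):
    """Hypothetical sketch (not the teg_get_best_n code): estimate the number
    of components from split-half eigenvector similarity.

    Assumes X is an (observations x variables) array.
    Returns (n_components, mean_similarity_per_component).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    sims = np.zeros((n_splits, n_vars))
    for s in range(n_splits):
        # Randomly assign observations (rows) to two halves.
        perm = rng.permutation(n_obs)
        halves = (X[perm[: n_obs // 2]], X[perm[n_obs // 2:]])
        vecs = []
        for half in halves:
            # Eigendecomposition of this half's correlation matrix.
            _, evecs = np.linalg.eigh(np.corrcoef(half, rowvar=False))
            # eigh sorts eigenvalues ascending; reverse to descending order.
            vecs.append(evecs[:, ::-1])
        # Eigenvector signs are arbitrary, so compare by the absolute inner
        # product (cosine of unit vectors) of same-rank components.
        sims[s] = np.abs(np.sum(vecs[0] * vecs[1], axis=0))
    mean_sim = sims.mean(axis=0)
    # Keep leading components up to the first one below the threshold.
    below = np.flatnonzero(mean_sim < threshold)
    n_components = int(below[0]) if below.size else n_vars
    return n_components, mean_sim
```

Calling `n, mean_sim = split_half_n_components(X)` returns the estimated count and the mean similarity per component. Matching components by rank assumes their eigenvalues are well separated: real components with near-equal eigenvalues can swap or rotate between halves and will then show low similarity.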

Usage:

import teg_get_best_n
O = teg_get_best_n.get_n_components(X)

This returns a dictionary with the estimated number of components in O['nComponents'], as well as the eigenvalues (O['eigenvalues']) and eigenvectors (O['eigenvectors']).

The file example.py contains tests with simulated data to check how well the true number of latent variables is recovered.
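A simulation of this kind might look like the sketch below. It is not the contents of example.py. The helper names, the block design (each latent factor drives its own block of unit-variance variables with within-block correlation r), and the stand-in estimator (the Kaiser criterion, counting correlation-matrix eigenvalues above 1) are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_block_data(block_sizes, n_obs=2000, r=0.8, seed=0):
    """Hypothetical helper: each latent factor drives one block of variables.

    Every observed variable has unit variance and correlation r with the
    other variables in its block; blocks are mutually independent.
    """
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n_obs, len(block_sizes)))
    cols = []
    for f, size in enumerate(block_sizes):
        for _ in range(size):
            noise = rng.standard_normal(n_obs)
            cols.append(np.sqrt(r) * factors[:, f] + np.sqrt(1 - r) * noise)
    return np.column_stack(cols)


def kaiser_n_components(X):
    """Stand-in estimator (Kaiser criterion): count correlation-matrix
    eigenvalues greater than 1. Replace with the estimator under test."""
    evals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return int(np.sum(evals > 1.0))


def recovery_rate(estimator, block_sizes, n_reps=10, **sim_kwargs):
    """Fraction of simulated datasets for which `estimator` returns the true
    number of latent factors, len(block_sizes). Seeds are set per replicate,
    so do not pass `seed` in sim_kwargs."""
    hits = 0
    for rep in range(n_reps):
        X = simulate_block_data(block_sizes, seed=rep, **sim_kwargs)
        hits += int(estimator(X) == len(block_sizes))
    return hits / n_reps
```

To test the package itself, something like `recovery_rate(lambda X: teg_get_best_n.get_n_components(X)['nComponents'], [6, 4, 2])` should work, assuming `get_n_components` accepts an observations-by-variables array. Varying `r`, `n_obs`, or the block sizes shows where an estimator starts to miss.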



Download files


Source distribution: teg_get_best_n-0.0.1.tar.gz (3.1 kB)

Built distribution: teg_get_best_n-0.0.1-py3-none-any.whl (3.4 kB, Python 3)
