A package for fitting principal curves in Python
princurve
pip install princurve
Inspired by this R package, princurve brings principal curves to Python.
What princurve does
princurve has local and global algorithms for computing principal curves.
What is a Principal Curve?
A principal curve is a smooth one-dimensional curve in n-dimensional space that passes through the middle of a dataset. Principal curves are a dimensionality reduction tool analogous to a nonlinear principal component. They have uses in GPS data, image recognition, bioinformatics, and other fields.
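"Passing through the middle" can be made precise. In the formulation of Hastie and Stuetzle [3], a curve f is a principal curve of a random vector X if it is self-consistent: each point on the curve is the average of the data that project onto it. Writing \lambda_f(x) for the parameter of the point on f closest to x, the two defining relations are:

```latex
\lambda_f(x) = \sup\left\{ \lambda : \lVert x - f(\lambda) \rVert
               = \inf_{\mu} \lVert x - f(\mu) \rVert \right\},
\qquad
f(\lambda) = \mathbb{E}\left[\, X \mid \lambda_f(X) = \lambda \,\right].
```

The first relation is the projection of a point onto the curve; the second is the self-consistency condition.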
Local Algorithms
Local algorithms work step by step. Starting at one end of the curve, a local algorithm builds segments that meet an acceptable error threshold as it moves toward the other end. Once it can connect the current point to the end point, the algorithm terminates and a curve is interpolated through the segments. princurve currently has two local algorithms:
- CLPC-g (Greedy Constraint Local Principal Curve) [1]
- CLPC-s (One-Dimensional Search Constraint Local Principal Curve) [1]
CLPC-g is faster and works well for simpler curves. CLPC-s can be much more accurate on more difficult curves, at the expense of speed. After fitting a curve, princurve can project points onto it.
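The step-by-step idea can be illustrated with a small standalone sketch. This is not the CLPC-g algorithm from [1] and it is not princurve's code; it is a plain greedy polyline fit that assumes the points are already ordered along the curve (as in a GPS trace). The names `greedy_segments`, `segment_error`, and `point_segment_distance` are hypothetical, and `e_max` here is simply the largest distance a point may lie from its segment.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from 2-D point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def segment_error(points, i, j):
    """Largest distance from points[i..j] to the segment points[i]-points[j]."""
    return max(point_segment_distance(points[k], points[i], points[j])
               for k in range(i, j + 1))

def greedy_segments(points, e_max):
    """Return vertex indices of a polyline whose segments have error <= e_max."""
    last = len(points) - 1
    vertices = [0]
    current = 0
    while current < last:
        # Terminate as soon as the current point can be connected to the end
        if segment_error(points, current, last) <= e_max:
            vertices.append(last)
            break
        # Otherwise extend the segment while it stays within the threshold
        nxt = current + 1
        while nxt + 1 < last and segment_error(points, current, nxt + 1) <= e_max:
            nxt += 1
        vertices.append(nxt)
        current = nxt
    return vertices

# An L-shaped trace needs one vertex at the corner
trace = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(greedy_segments(trace, e_max=0.1))  # prints [0, 2, 4]
```

A smooth curve would then be interpolated through the selected vertices. A larger `e_max` gives fewer, longer segments; a smaller one follows the data more closely.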
Global Algorithms
Global algorithms, unlike local algorithms, are more like minimization problems. Given a dataset, a global algorithm might make an initial guess at a principal curve and adjust it from there.
The sole global algorithm as of now performs nonlinear principal component analysis. Called NLPCA in this package, it is a neural network implementation [2]. It works by training an autoassociative neural network with a "bottleneck" layer, which forces the network to learn the most important features of the data.
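To make the bottleneck idea concrete, here is a minimal NumPy sketch of an autoassociative network. It is not princurve's NLPCA class, whose interface is not shown here. The layer sizes, the tanh activations, the learning rate, and the function names `init_params`, `forward`, and `train` are all assumptions chosen for illustration. The network is trained to reproduce its own input through a single-unit middle layer.

```python
import numpy as np

def init_params(rng, sizes):
    """One (weights, biases) pair per layer, with small random weights."""
    return [(rng.normal(0.0, 0.5, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer for inputs x."""
    (w1, b1), (w2, b2), (w3, b3), (w4, b4) = params
    h1 = np.tanh(x @ w1 + b1)   # mapping layer (nonlinear)
    z = h1 @ w2 + b2            # bottleneck: one value per sample
    h2 = np.tanh(z @ w3 + b3)   # demapping layer (nonlinear)
    out = h2 @ w4 + b4          # reconstruction of the input
    return h1, z, h2, out

def train(x, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    params = init_params(rng, [d, hidden, 1, hidden, d])
    losses = []
    for _ in range(epochs):
        h1, z, h2, out = forward(params, x)
        err = out - x
        losses.append(float(np.mean(err ** 2)))
        (w1, _), (w2, _), (w3, _), (w4, _) = params
        # Backpropagate the error through each layer in turn
        g_out = 2.0 * err / err.size
        g_h2 = (g_out @ w4.T) * (1.0 - h2 ** 2)
        g_z = g_h2 @ w3.T
        g_h1 = (g_z @ w2.T) * (1.0 - h1 ** 2)
        grads = [(x.T @ g_h1, g_h1.sum(axis=0)),
                 (h1.T @ g_z, g_z.sum(axis=0)),
                 (z.T @ g_h2, g_h2.sum(axis=0)),
                 (h2.T @ g_out, g_out.sum(axis=0))]
        params = [(w - lr * gw, b - lr * gb)
                  for (w, b), (gw, gb) in zip(params, grads)]
    return params, losses

# Made-up data: a noisy half circle, centered at the origin
rng = np.random.default_rng(1)
t = rng.uniform(0.0, np.pi, size=200)
data = np.column_stack([np.cos(t), np.sin(t)]) + rng.normal(0.0, 0.05, size=(200, 2))
data -= data.mean(axis=0)

params, losses = train(data)
scores = forward(params, data)[1]  # bottleneck values, shape (200, 1)
```

Each sample is squeezed into one number at the bottleneck, which plays the role of a position along the curve. Feeding a range of bottleneck values through the last two layers traces out the learned curve.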
Which one should I use?
The local algorithms will be better for tightly bunched data, such as digit recognition or GPS data. The global algorithm is better suited for "clouds" of data or sparsely represented data.
Quick-Start
View the quickstart notebook here. Docs will be coming soon!
# Example of local PC fitting
import numpy as np
import scipy.interpolate
# CLPCG is provided by princurve; its import line is not shown in this description.
cl = CLPCG()  # Create solver
# CLPCG.fit() fits the principal curve. It takes x_data, y_data,
# and the max allowed error for each step. e_max is found
# through trial and error, but 1/4 to 1/2 of the data error is what
# the authors recommend.
cl.fit(xdata, ydata, e_max=0.1)
cl.plot()  # plots curve, optional axes can be passed
# Reconstruct curve
tcks = cl.spline_ticks  # get spline ticks
xy = scipy.interpolate.splev(np.linspace(0, 1, 100), tcks)
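Projection onto a fitted curve is mentioned above, but the call for it is not shown. As a rough sketch of what such a projection involves, the code below does it by hand for any parametric spline in SciPy's `tck` form: sample the curve densely and pick the nearest sample for each point. The function `project_to_spline` is hypothetical and not part of princurve, and the spline here is built with `scipy.interpolate.splprep` on made-up data rather than taken from a fitted solver.

```python
import numpy as np
from scipy.interpolate import splev, splprep

def project_to_spline(points, tck, n_samples=1000):
    """Nearest-sample projection of points onto a parametric spline.

    Returns the curve parameter u in [0, 1] and the projected coordinates.
    """
    u = np.linspace(0.0, 1.0, n_samples)
    curve = np.column_stack(splev(u, tck))           # shape (n_samples, dim)
    points = np.atleast_2d(points)
    # Squared distance from every point to every curve sample
    d2 = ((points[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return u[idx], curve[idx]

# Stand-in for a fitted curve: an interpolating spline through a quarter circle
theta = np.linspace(0.0, np.pi / 2, 20)
tck, _ = splprep([np.cos(theta), np.sin(theta)], s=0)

queries = np.array([[1.2, 0.0], [0.0, 0.8], [1.0, 1.0]])
u_proj, xy_proj = project_to_spline(queries, tck)
```

The accuracy is limited by the sampling density `n_samples`; a root-finding refinement around the nearest sample would give a more exact projection at higher cost.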
References
[1] Dewang Chen, Jiateng Yin, Shiying Yang, Lingxi Li, Peter Pudney, Constraint local principal curve: Concept, algorithms and applications, Journal of Computational and Applied Mathematics, Volume 298, 2016, Pages 222-235, ISSN 0377-0427, https://doi.org/10.1016/j.cam.2015.11.041.
[2] Mark Kramer, Nonlinear Principal Component Analysis Using Autoassociative Neural Networks
[3] Hastie, T. and Stuetzle, W., Principal Curves, JASA, Vol. 84, No. 406 (Jun., 1989), pp. 502-516, DOI: 10.2307/2289936