Distance-based Analysis of DAta-manifolds in python
DADApy is a Python package for the characterisation of manifolds in high dimensional spaces.
Homepage
For more details and tutorials, visit the homepage at: https://dadapy.readthedocs.io/
Quick Example
import numpy as np
from dadapy.data import Data
# generate a simple 3D Gaussian dataset
X = np.random.normal(0, 1, (1000, 3))
# initialise the "Data" class with the set of coordinates
data = Data(X)
# compute distances up to the 100th nearest neighbour
data.compute_distances(maxk=100)
# compute the intrinsic dimension using the Two-NN estimator
data.compute_id_2NN()
# compute the density using PAk, a point adaptive kNN estimator
data.compute_density_PAk()
# find the peaks of the density profile through the ADP algorithm
data.compute_clustering_ADP()
Currently implemented algorithms
- Intrinsic dimension estimators
  - Two-NN estimator: Facco et al., Scientific Reports (2017)
  - Gride estimator: Denti et al., Scientific Reports (2022)
- Density estimators
  - kNN estimator
  - k*NN estimator (kNN with adaptive choice of k)
  - PAk estimator: Rodriguez et al., JCTC (2018)
- Density peaks clustering methods
  - Density peaks clustering: Rodriguez and Laio, Science (2014)
  - Advanced density peaks clustering: d’Errico et al., Information Sciences (2021)
  - k-peak clustering: Sormani, Rodriguez and Laio, JCTC (2020)
- Manifold comparison tools
  - Neighbourhood overlap: Doimo et al., NeurIPS (2020)
  - Information imbalance: Glielmo et al., PNAS Nexus (2022)
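To give a feel for what an intrinsic dimension estimator from the list above does, here is a minimal NumPy sketch of the idea behind the Two-NN estimator. It is not DADApy's implementation: it assumes a simple maximum-likelihood variant (the ratio of second to first nearest-neighbour distances is treated as Pareto-distributed with exponent equal to the dimension), uses brute-force pairwise distances, and the function name `two_nn_id` is hypothetical.

```python
import numpy as np


def two_nn_id(X):
    """Rough Two-NN style intrinsic dimension estimate (illustrative sketch only).

    Assumes no duplicate points, so the first-neighbour distance is never zero.
    """
    # Squared Euclidean distances between all pairs of points (brute force).
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Exclude each point from its own neighbour list.
    np.fill_diagonal(d2, np.inf)
    # Two smallest squared distances per row: first and second neighbour.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    # Ratio of second to first neighbour distance for every point.
    mu = np.sqrt(nearest[:, 1] / nearest[:, 0])
    # Maximum-likelihood estimate if mu follows a Pareto law with exponent d.
    return len(X) / np.log(mu).sum()


rng = np.random.default_rng(0)
X = rng.normal(0, 1, (1000, 3))
estimate = two_nn_id(X)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

For the 3D Gaussian data used here the estimate should come out close to 3. On real data the package's own estimators are the better choice, since they use efficient neighbour searches and also report an error on the estimate.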
Installation
The package is compatible with Python >= 3.7 (tested on 3.7, 3.8 and 3.9). We currently only support Unix-based systems, including Linux and macOS. For Windows machines we suggest using the Windows Subsystem for Linux (WSL).
The package requires numpy, scipy and scikit-learn, and matplotlib for the visualisations.
The package contains Cython-generated C extensions that are automatically compiled during install.
The latest release is available through pip:
pip install dadapy
To install the latest development version, clone the source code from GitHub and install it with pip as follows:
git clone https://github.com/sissa-data-science/DADApy.git
cd DADApy
pip install .
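After either install route, one way to confirm that the package is visible to the current interpreter is to query its installed version. The snippet below is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and `importlib.metadata` assumes Python 3.8 or newer (on 3.7 the `importlib_metadata` backport would be needed).

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("dadapy")
if version is None:
    print("dadapy is not installed in this environment")
else:
    print(f"dadapy {version} is installed")
```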
Citing DADApy
A description of the package is available in Patterns: https://doi.org/10.1016/j.patter.2022.100589
Please consider citing it if you found this package useful for your research:
@article{dadapy,
title = {DADApy: Distance-based analysis of data-manifolds in Python},
journal = {Patterns},
pages = {100589},
year = {2022},
issn = {2666-3899},
doi = {10.1016/j.patter.2022.100589},
url = {https://www.sciencedirect.com/science/article/pii/S2666389922002070},
author = {Aldo Glielmo and Iuri Macocco and Diego Doimo and Matteo Carli and Claudio Zeni and Romina Wild and Maria d’Errico and Alex Rodriguez and Alessandro Laio},
}