
A Python package for characterizing unstructured data with the nearest neighbor permutation entropy

Project description


knnpe: A Python package implementing the k-nearest neighbor permutation entropy

The k-nearest neighbor permutation entropy [1] extends to unstructured datasets the fundamental premise introduced by permutation entropy: investigating the relative ordering of time series elements [2] or image pixels [3]. The method builds on nearest neighbor graphs to establish neighborhood relations among data points and uses random walks over these graphs to extract ordinal patterns and their distribution, thereby defining the $k$-nearest neighbor permutation entropy.
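As background for the ordinal premise mentioned above, here is a minimal sketch of the classic Bandt-Pompe permutation entropy of a single time series [2], normalized by $\log d!$ so that it lies in $[0, 1]$. It is an illustration written for this page, not code taken from knnpe, and the function name permutation_entropy is hypothetical.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(series, d=3, tau=1):
    """Normalized Bandt-Pompe permutation entropy of a 1D series (illustrative sketch).

    Each overlapping window of d elements spaced by tau is mapped to the
    permutation that sorts it in ascending order; the Shannon entropy of the
    resulting ordinal distribution is divided by log(d!).
    """
    x = np.asarray(series, dtype=float)
    n_windows = len(x) - (d - 1) * tau
    if n_windows < 1:
        raise ValueError("series is too short for the chosen d and tau")
    # Ordinal pattern of each window: the indices that sort it.
    patterns = [
        tuple(np.argsort(x[i : i + d * tau : tau], kind="stable").tolist())
        for i in range(n_windows)
    ]
    counts = np.array(list(Counter(patterns).values()), dtype=float)
    probs = counts / counts.sum()
    entropy = -np.sum(probs * np.log(probs))
    return float(entropy / math.log(math.factorial(d)))
```

A strictly increasing series produces a single ordinal pattern and therefore zero entropy, while white noise spreads evenly over all $d!$ patterns and approaches one.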

If you have used knnpe in a scientific publication, we would appreciate citations to the following reference:

@article{voltarelli2024characterizing,
 title         = {Characterizing unstructured data with the nearest neighbor permutation entropy},
 author        = {L. G. J. M. Voltarelli and A. A. B. Pessa and L. Zunino and R. S. Zola and E. K. Lenzi and M. Perc and H. V. Ribeiro},
 journal       = {},
 volume        = {},
 number        = {},
 pages         = {},
 year          = {2024},
 eprint        = {2403.13122},
 archivePrefix = {arXiv},
 doi           = {},
}

Installing

knnpe uses OpenMP and the GNU Scientific Library (GSL) to implement a parallelized, numerically efficient random walk function. This function is written in C and is integrated with the Python module via the ctypes library. To use it, OpenMP and GSL must be installed before installing knnpe.

On Ubuntu/Debian, you can install these dependencies via apt:

sudo apt install build-essential
sudo apt install libgsl-dev
sudo apt install libomp-dev

If these dependencies are not available, knnpe falls back to a native Python function for the random walks. This function is also parallelized and may work well for most applications, but it is significantly slower than its C counterpart. For large datasets, we strongly recommend using the C version.
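To check beforehand whether the shared libraries behind the C version are visible on a system, one option is to ask the loader through Python's standard ctypes.util.find_library. The sketch below is a hypothetical helper, not knnpe's own detection logic; it assumes GSL is exposed as libgsl and OpenMP as either libgomp (GCC) or libomp (LLVM, from libomp-dev).

```python
import ctypes.util


def c_backend_dependencies():
    """Report whether GSL and an OpenMP runtime can be found by the loader.

    Hypothetical helper: it only queries ctypes.util.find_library and does not
    reproduce how knnpe decides between its C and Python random walks.
    """
    found = {
        "gsl": ctypes.util.find_library("gsl"),
        # GCC ships OpenMP as libgomp; LLVM's runtime (libomp-dev) is libomp.
        "openmp": ctypes.util.find_library("gomp") or ctypes.util.find_library("omp"),
    }
    return {name: path is not None for name, path in found.items()}


if __name__ == "__main__":
    for name, ok in c_backend_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing entry suggests installing the corresponding apt package listed above before installing knnpe.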

If all dependencies are available, knnpe can be installed via:

pip install git+https://github.com/hvribeiro/knnpe

or

git clone https://github.com/hvribeiro/knnpe.git
cd knnpe
pip install -e .

If these dependencies are not available, you can install the PyPI version via:

pip install knnpe

Basic usage

Implementation of the $k$-nearest neighbor permutation entropy. (A) Illustration of a dataset with irregularly distributed data points $\{z_i\}_{i=1,\dots,N}$ in the $xy$-plane where each coordinate pair $(x_i,y_i)$ is associated with a value $z_i$. (B) Initially, we construct a $k$-nearest neighbor graph using the data coordinates to define neighborhood relationships. In this graph, each data point $z_i$ represents a node, with undirected edges connecting pairs $i\leftrightarrow j$ when $j$ is among the $k$-nearest neighbors of $i$ ($k=3$ in this example). (C) Subsequently, we execute $n$ biased random walks of length $w$ starting from each node, sampling the data points to generate time series ($n=2$ and $w=6$ in this example). We then apply the Bandt-Pompe approach to each of these time series. This involves creating overlapping partitions of length $d$ (embedding dimension) and arranging the partition indices in ascending order of their values to determine the sorting permutations for each partition ($d=3$ in this example). (D) Finally, we evaluate the probability of each of the $d!$ possible permutations (ordinal distribution) and calculate its Shannon entropy, thereby defining the $k$-nearest neighbor permutation entropy.

https://raw.githubusercontent.com/hvribeiro/knnpe/main/examples/figs/figmethod.png
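The steps in the figure caption can be sketched in plain NumPy as follows. This is a simplified illustration under stated assumptions, not knnpe's implementation: it builds the $k$-nearest neighbor graph by brute force ($O(N^2)$ memory), replaces the biased random walks with unbiased ones, and treats the walk length $w$ as the number of sampled points. The function names knn_graph and knn_pe_sketch are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def knn_graph(coords, k):
    """Undirected k-nearest neighbor graph as sorted neighbor lists (brute force)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # Pairwise Euclidean distances between all data coordinates.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in np.argsort(dist[i], kind="stable")[:k]:
            # (B) undirected edge i <-> j whenever j is among the k nearest neighbors of i.
            neighbors[i].add(int(j))
            neighbors[int(j)].add(i)
    return [sorted(s) for s in neighbors]


def knn_pe_sketch(data, k=3, n_walks=2, walk_len=6, d=3, tau=1, seed=0):
    """Steps (B)-(D) of the figure, but with *unbiased* random walks."""
    if walk_len < (d - 1) * tau + 1:
        raise ValueError("walk_len is too short for the chosen d and tau")
    data = np.asarray(data, dtype=float)
    coords, z = data[:, :-1], data[:, -1]  # last column holds the z_i values
    graph = knn_graph(coords, k)
    rng = np.random.default_rng(seed)
    counts = Counter()
    for start in range(len(data)):
        for _ in range(n_walks):
            # (C) sample walk_len points by hopping to a uniformly chosen neighbor.
            walk = [start]
            while len(walk) < walk_len:
                walk.append(int(rng.choice(graph[walk[-1]])))
            series = z[walk]
            # Bandt-Pompe: ordinal pattern of each overlapping window of size d.
            for i in range(len(series) - (d - 1) * tau):
                window = series[i : i + d * tau : tau]
                counts[tuple(np.argsort(window, kind="stable").tolist())] += 1
    # (D) normalized Shannon entropy of the ordinal distribution.
    probs = np.array(list(counts.values()), dtype=float)
    probs /= probs.sum()
    return float(-np.sum(probs * np.log(probs)) / math.log(math.factorial(d)))
```

The defaults mirror the caption's example ($k=3$, $n=2$, $w=6$, $d=3$). For real datasets, knn_permutation_entropy should be preferred, since it uses the biased walks and parallelized code described on this page.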

The function knn_permutation_entropy calculates the $k$-nearest neighbor permutation entropy, as illustrated below for a random dataset with three columns.

import numpy as np
from knnpe import knn_permutation_entropy

data = np.random.normal(size=(100,3))
knn_permutation_entropy(data)

The last column of data holds the $\{z_i\}_{i=1,\dots,N}$ values, while the first two columns are the data coordinates $\vec{r}_i = (x_i,y_i)$. If the data coordinates have more dimensions, they must occupy the first columns of the array; the last column is always taken as the $z_i$ values. The code below illustrates the case of three-dimensional data coordinates:

import numpy as np
from knnpe import knn_permutation_entropy

data = np.random.normal(size=(100,4))
knn_permutation_entropy(data)
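When coordinates and values live in separate arrays, they can be stacked into this layout with np.column_stack. The helper build_input below is hypothetical and only enforces the convention stated above (coordinate columns first, $z_i$ values last); the test function used for the values is an arbitrary choice for illustration.

```python
import numpy as np


def build_input(coords, values):
    """Stack coordinates (N x m) and values (N,) into the layout knnpe expects.

    Hypothetical helper: the coordinate columns come first and the z_i values
    go in the last column.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    if coords.ndim != 2 or values.ndim != 1:
        raise ValueError("coords must be 2D (N x m) and values 1D (N,)")
    if coords.shape[0] != values.shape[0]:
        raise ValueError("coords and values must have the same number of rows")
    return np.column_stack([coords, values])


rng = np.random.default_rng(42)
xyz = rng.uniform(size=(100, 3))   # three coordinates per point
z = np.sin(xyz).sum(axis=1)        # one value per point
data = build_input(xyz, z)
print(data.shape)  # (100, 4)
```

The resulting array can then be passed to knn_permutation_entropy(data) as in the examples above.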

The function knn_permutation_entropy has the following parameters:

data : ndarray

Input array of unstructured data points, where each row has the form [x, y, value], or [x_1, ..., x_m, value] when the data coordinates have m dimensions.

d : int, optional

The embedding dimension for the entropy calculation (default is 3).

tau : int, optional

The embedding delay for the entropy calculation (default is 1).

p : float, optional

Parameter that controls the bias of immediately revisiting a node in the walk (default is 10). It is named $\lambda$ in the article.

q : float, optional

Parameter that controls the bias of moving outside the neighborhood of the previous node (default is 0.001). It is named $\beta$ in the article.

random_walk_steps : int, optional

The number of steps in each random walk (default is 10).

num_walks : int, optional

The number of random walks started from each node (default is 10).

n_neighbors : int or array-like, optional

The number of neighbors used to construct the k-nearest neighbor graph (default is 25).

nthreads : int, optional

The number of parallel threads used in the computation (default is -1, which uses all available cores).

hide_bar : bool, optional

If True, the progress bar is not displayed (default is False).

metrics : bool, optional

If True, also calculates the graph density and the largest-component fraction (default is False).

complexity : bool, optional

If True, also calculates the k-nearest neighbor permutation complexity.
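The descriptions of p and q match the return and in-out parameters of node2vec-style biased random walks. Assuming knnpe weights each step the way node2vec does (an assumption; this page does not spell out the rule), one step of such a walk could be computed as below. The function transition_probabilities is hypothetical and is not part of knnpe's API.

```python
import numpy as np


def transition_probabilities(graph, prev, curr, p=10.0, q=0.001):
    """Second-order (node2vec-style) probabilities of stepping from `curr`.

    Assumption: p and q play the roles of node2vec's return and in-out
    parameters. `graph` maps each node to a collection of its neighbors.
    """
    candidates = sorted(graph[curr])
    weights = []
    for x in candidates:
        if x == prev:
            weights.append(1.0 / p)  # immediately revisiting the previous node
        elif x in graph[prev]:
            weights.append(1.0)      # staying inside the previous node's neighborhood
        else:
            weights.append(1.0 / q)  # moving outside that neighborhood
    weights = np.asarray(weights)
    return dict(zip(candidates, weights / weights.sum()))


# Toy graph: the walk arrived at node 1 coming from node 0.
graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
probs = transition_probabilities(graph, prev=0, curr=1)
```

Under this assumption, the defaults give weight 0.1 to backtracking and 1000 to leaving the previous neighborhood, so in the toy graph the walk moves to node 3 with probability close to one; larger q would keep walks more local.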

We provide a notebook illustrating how to use knnpe; for further information, we refer to the knnpe documentation.

Contributing

Pull requests addressing errors or adding new functionalities are always welcome.

References



Download files


Source Distribution

knnpe-0.1.3.tar.gz (11.4 kB)

Uploaded Source

Built Distribution

knnpe-0.1.3-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file knnpe-0.1.3.tar.gz.

File metadata

  • Download URL: knnpe-0.1.3.tar.gz
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for knnpe-0.1.3.tar.gz
Algorithm Hash digest
SHA256 61c1fefe3788e67a16f746f51fd612bb7e0622f71130337a3b6aa6237259cd93
MD5 eaccf02973df71e0b5c3ac607e2d714a
BLAKE2b-256 6f967d9730b74737b6917beb360173f5f77b6e227ea6613d047f1d22154e51b8


File details

Details for the file knnpe-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: knnpe-0.1.3-py3-none-any.whl
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for knnpe-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 92b5c239ef256cb0bd7034caed455ec38da1b54715ed517ecc57adbc131c7abd
MD5 f83c8b4cdcdef4050c887936b92c358b
BLAKE2b-256 5226b10fc69af682d0d4684280d0242a298c59a846cdc67c2a8570f020bc834b

