
Nunchaku: Optimally partitioning data into piece-wise linear segments

nunchaku is a statistically rigorous, Bayesian algorithm to infer the optimal partitioning of a data set into contiguous piece-wise segments.

Who might find this useful?

Scientists and engineers who wish to detect change points within a dataset, i.e. points at which the dependency of one variable on the other changes.

For example, if y's underlying function is a piece-wise linear function of x, nunchaku will find the points at which the gradient and the intercept change.
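As a concrete illustration of this kind of data (a synthetic sketch, not nunchaku's built-in example), here is a signal that follows two linear pieces of x with a change point at x = 5, where both the gradient and the intercept change:

```python
import numpy as np

# Synthetic piece-wise linear data with one change point at x = 5:
# gradient 2 and intercept 1 on the left, gradient -1 and intercept 16
# on the right (the two pieces meet at y = 11, so the signal is continuous).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)
y_true = np.where(x < 5, 2.0 * x + 1.0, -1.0 * x + 16.0)
y = y_true + rng.normal(scale=0.1, size=x.size)  # add measurement noise
```

Given x and y, a change-point method should recover the boundary near x = 5.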

What does it do?

Given a dataset with two variables (e.g. a 1D time series), it infers the piece-wise function that best approximates the dataset. The function can be a piece-wise constant function, a piece-wise linear function, or a piece-wise function described by linear combinations of arbitrary basis functions (e.g. polynomials, sines).
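To make the model class concrete, here is a minimal sketch (not nunchaku's internals) of what "a linear combination of arbitrary basis functions" means for a single segment: the segment's coefficients can be recovered by ordinary least squares on a design matrix built from the chosen basis.

```python
import numpy as np

def fit_basis(x, y, basis):
    """Fit y as a linear combination of the given basis functions of x."""
    X = np.column_stack([f(x) for f in basis])  # design matrix, one column per basis
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# One segment generated by the basis {1, x, sin x}, noise-free for clarity
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = 1.0 + 0.5 * x + 2.0 * np.sin(x)
basis = [np.ones_like, lambda t: t, np.sin]
coef = fit_basis(x, y, basis)  # recovers the coefficients [1.0, 0.5, 2.0]
```

nunchaku's contribution is not this per-segment fit but the Bayesian inference of where the segment boundaries lie.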

For piece-wise linear functions, it provides statistics for each segment, from which users select the segment(s) of most interest, for example, the one with the largest gradient or the one with the largest $R^2$.
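As a hypothetical illustration of such per-segment statistics (a sketch, not nunchaku's implementation): once boundaries are known, each segment's gradient, intercept, and R^2 follow from a least-squares line fit, and the segment of interest can be picked by comparing them.

```python
import numpy as np

def segment_stats(x, y):
    """Return (gradient, intercept, R^2) of a least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Two noise-free linear segments with a known boundary at x = 5
x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 5, 2.0 * x + 1.0, -1.0 * x + 16.0)
stats = [segment_stats(x[m], y[m]) for m in (x < 5, x >= 5)]
# select the segment with the largest gradient
best = max(range(len(stats)), key=lambda i: stats[i][0])
```

nunchaku reports comparable statistics for each inferred segment via its get_info method (see Quick start below).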

For details about how it works, please refer to our paper, freely available in Bioinformatics.

Installation

To install via PyPI, type in Terminal (for Linux/Mac OS users) or Anaconda Prompt (for Windows users with Anaconda installed):

> pip install nunchaku

For developers, create a virtual environment, install Poetry, and then install nunchaku with it:

> git clone https://git.ecdf.ed.ac.uk/s1856140/nunchaku.git
> cd nunchaku 
> poetry install --with dev 

Quick start

The data x is a list or a 1D NumPy array, sorted in ascending order; the data y is a list, a 1D NumPy array, or a 2D NumPy array with each row being one replicate of the measurement. Below is a script to analyse the built-in example data.

>>> from nunchaku import Nunchaku, get_example_data
>>> x, y = get_example_data()
>>> # load data and set the prior of the gradient
>>> nc = Nunchaku(x, y, prior=[-5, 5]) 
>>> # compare models with 1, 2, 3 and 4 linear segments
>>> numseg, evidences = nc.get_number(num_range=(1, 4))
>>> # get the mean and standard deviation of the boundary points
>>> bds, bds_std = nc.get_iboundaries(numseg)
>>> # get the information of all segments
>>> info_df = nc.get_info(bds)
>>> # plot the data and the segments
>>> nc.plot(info_df)
>>> # get the underlying piece-wise function (for piece-wise linear functions only)
>>> y_prediction = nc.predict(info_df)
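After get_info, users typically select the segment(s) of most interest from the returned DataFrame. The snippet below sketches this on a mock DataFrame; the column names ("gradient", "R2", "start", "end") are hypothetical stand-ins, so check the columns of the info_df returned by your version for the actual names.

```python
import pandas as pd

# Mock of a per-segment summary table; column names are hypothetical
# stand-ins, not necessarily those returned by nunchaku's get_info.
info_df = pd.DataFrame({
    "start": [0, 30, 70],
    "end": [30, 70, 100],
    "gradient": [0.1, 2.4, -0.8],
    "R2": [0.91, 0.97, 0.99],
})
steepest = info_df.loc[info_df["gradient"].idxmax()]  # largest gradient
best_fit = info_df.loc[info_df["R2"].idxmax()]        # largest R^2
```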

More detailed examples are provided in a Jupyter Notebook in our repository.

Documentation

Detailed documentation is available on Readthedocs.

Development history

  • v0.16.1: Replaced lambdas with module-level named functions to ensure compatibility with concurrent.futures.
  • v0.16.0: Added support for Python 3.12 and NumPy 2.0; introduced a quiet option; re-implemented key computations in log space to prevent overflow and improve numerical stability.
  • v0.15.4: Improved handling of bad initial guesses for the EM algorithm. The last release to support Python 3.8 and NumPy 1.x.
  • v0.15.2: Dependency fix release.
  • v0.15.1: Bug fix release.
  • v0.15.0: Added support for piece-wise functions described by a linear combination of arbitrary basis functions; added support for Python 3.11.
  • v0.14.0: Added detection of linear segments.

Similar packages

  • The NOT package written in R.
  • The beast package written in R.

Citation

If you find this useful, please cite our paper:

Huo, Y., Li, H., Wang, X., Du, X., & Swain, P. S. (2023). Nunchaku: Optimally partitioning data into piece-wise linear segments. Bioinformatics. https://doi.org/10.1093/bioinformatics/btad688
