Skip to main content

No project description provided

Project description

persuit

Persuit is an experimental persistent homology (PH) library, which aims to speed up your PH pipeline by reducing the boundary matrix whilst it is being constructed. The library is written in Rust and Python bindings are provided via PyO3.

Warning Persuit is currently in alpha. The implementation of the standard algorithm is far from optimised, the API is not fixed and no tests have been written. Use at your own risk.

Motivation

Here is the rough outline of many PH pipelines:

  1. Read in data.
  2. Build up a basis for the columns of your boundary matrix.
  3. Sort this basis by the filtration time and dimension.
  4. Compute a sparse representation of the boundary matrix.
  5. Pass the sparse matrix to a PH library (e.g. Eirene or PHAT) to reduce the boundary matrix and hence compute pairings.

In many (although not all) pipelines, steps 1-4 takes significantly longer than step 5. Indeed, often step 4 alone can take longer than step 5. However, step 5 often still takes a significant time to complete. In such circumstances, Persuit might offer a speed up.

Consider the "standard algorithm" for persistence computation:

for each column j:
    while ∃i < j with low(i) = low(j):
        add col[i] to col[j]
    endwhile
endfor

Note that in order to reduce column j we only need the previous columns. Therefore, we can start reducing j as soon as it is available! That is, we can compute PH while we compute the sparse boundary matrix and potentially hide some of the cost of the standard algorithm.

Main Idea

The two core functions provided by Persuit are std_persuit and std_persuit_collected. Let's focus on std_persuit.

pub fn std_persuit<C, T>(col_iterator: T) -> StandardAlgo<C, flume::IntoIter<(usize, C)>>
where
    T: Iterator<Item = C> + Send + 'static,
    C: Column + 'static,
{
    let (tx, rx) = flume::unbounded();
    thread::spawn(move || {
        let mut enumerated_cols = col_iterator.enumerate();
        while let Some(enum_col) = enumerated_cols.next() {
            tx.send(enum_col).unwrap();
        }
    });
    StandardAlgo::new(rx.into_iter())
}

Note that the output of this function is an iterator over the pairings computed by the standard algorithm. Despite all of the annotations, the signature of std_persuit is essentially

iterator over columns -> iterator over parings

The way this is achieved is:

  1. Spawn a new thread and setup a channel for communication.
  2. The new thread consumes columns from the input iterator and puts them on the channel.
  3. The main thread consumes columns off the channel and performs the standard algorithm on these columns, unaware they are coming from a different thread.

Since the new thread is constantly consuming the iterator, we construct the sparse matrix at the usual rate. Meanwhile, the main thread is reducing columns of this sparse matrix as soon as it can.

Currently, due to the overhead of inter-thread communication this is usually slower than the serial algorithm.

Usage

The main Python bindings are std_persuit and std_persuit_serial. These functions take one argument: an iterator over sparse columns. Currently, persuit only supports ℤ₂ homology and hence your matrix should also have enties in ℤ₂. The ith iterate of the provided iterator should be an increasing list of indices of the non-zero rows in the ith column of your boundary matrix. See test.py for an illustrative example.

TODO

  • Try out FnvHasMap, IndexMap and BTreeMap for storing low_inverse
  • Try out linked lists and bitvec as Column types
  • Return (Lex-optimal) representatives
  • Python bindings
  • Implement other PH algorithms, e.g. clear and compress?
  • Implement parallel over dimensions?
  • Write a basic README
  • Find example where this actually does speed up PH (need PH computation to be a significant portion of the time)
  • Write tests (unit, integration + property)
  • Benchmark

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

persuit-0.1.0.tar.gz (17.1 MB view hashes)

Uploaded Source

Built Distribution

persuit-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl (230.8 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.34+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page