
Python binding for Ripser++.


Ripser++


Copyright © 2019, 2020, 2021 Simon Zhang, Mengbai Xiao, Hao Wang

Maintainer: Simon Zhang

Contributors: (by order of introduction to the project) Birkan Gokbag, Ryan DeMilt

Ripser++ [3] is built on top of the Ripser [1] software written by Ulrich Bauer and utilizes both GPU and CPU (via separation of parallelisms [4]) to accelerate the computation of Vietoris-Rips persistence barcodes.

Description

Ripser++ exploits the massive parallelism hidden in the computation of Vietoris-Rips persistence barcodes through mathematical and algorithmic opportunities we have identified. Compared to Ripser, it can achieve up to a 30x speedup in total execution time, up to 2.0x better CPU memory efficiency, and up to a 1.58x reduction in the amount of memory used on GPU compared to that used on CPU by Ripser.

After dimension 0 persistence computation, there are two stages of computation in the original Ripser: filtration construction with clearing followed by matrix reduction. Ripser++ massively parallelizes the filtration construction with clearing stage and extracts the hidden parallelism of finding "apparent pairs" from matrix reduction all on GPU, leaving the computation of submatrix reduction on the remaining nonapparent columns on CPU. By our empirical findings, up to 99.9% of the columns in a cleared coboundary matrix are apparent.

Installation Requirements

Dependencies:

  1. a 64 bit Operating System

  2. a. Linux

    OR b. Windows

  3. CMake >=3.10, (e.g. CMake 3.10.2)

  4. CUDA >=10.1, (e.g. CUDA 10.1.243)

  5. a. GCC >=7.5, (e.g. GCC 8.4.0 for Linux)

    OR b. MSVC 192x (e.g. MSVC 1928 for Visual Studio 2019 v16.9.2 for Windows)

Note: for compilation on Windows, it is best if Cygwin is uninstalled

Ripser++ is intended to run on high performance computing systems.

Thus, a GPU with enough device memory is needed to run large datasets (e.g. a Tesla V100 GPU with 32GB device DRAM). If the system's GPU is not compatible, or the system does not have a GPU, error messages will appear.

Any of the GPUs provided by Google Colab should work.

You do not need a supercomputer, however. On my $900 laptop with an NVIDIA GPU with 6GB of device memory, I was able to run the sphere_3_192 dataset up to dimension 3 with a 15x speedup over Ripser.

It is also preferable to have a multicore processor (e.g. >= 28 cores) for high performance, and a large amount of DRAM is required for large datasets. We have tested on a 100 GB DRAM single computing node with 28 cores.

Installing Python Bindings (preferred)

The purpose of the Python Bindings is to allow users to write their own Python scripts to run Ripser++. The user can write Python preprocessing code on the inputs of Ripser++. This can eliminate file I/O and allow for automated calling of Ripser++.

Contributors: Ryan DeMilt, Birkan Gokbag, Simon Zhang

Requirements:

(Requirements from Installation Requirements Section)

Linux, (or Windows), CMake >=3.10, CUDA >=10.1, GCC >=7.5 (Linux) or Microsoft Visual Studio 2019 (Windows)

Python Requirements:

Python 3.x, NumPy, SciPy

(As of January 2020, Python 2.x has been sunset)

Installation

For the version on PyPI:

pip3 install ripserplusplus

For the latest release of ripser++:

pip3 install git+https://github.com/simonzhang00/ripser-plusplus.git

or in the ripser-plusplus/ directory (local installation):

git clone https://github.com/simonzhang00/ripser-plusplus.git
cd ripser-plusplus
pip3 install .
cd ripserplusplus

Note: After a local installation you need to work from a directory other than ripser-plusplus/ because of the path searching done in the __init__.py file.

Note: If the wheel does not install and the package has to be compiled, compilation can take at least 2 minutes on Windows (due to a workaround) and at least 1 minute on Linux, so be patient!

Note: You need all of the software and hardware requirements listed in the Installation Requirements section.

The ripserplusplus Python API

ripserplusplus package API:

  • Function to Access Ripser++:
        run(arguments_list, matrix or file_name)
    
    • First Argument:
      • arguments_list: Contains the command line options to be entered into Ripser++ as a string. e.g. "--format lower-distance --dim 2"
    • Second Argument: Exactly one of the following
      • matrix: Must be a numpy array
        • e.g. [3,2,1] is a lower-distance matrix of 3 points
        • e.g. [[0,3,2],[3,0,1],[2,1,0]] is a distance matrix of 3 points
      • or sparse matrix: A scipy coo format matrix
        • e.g. mtx = sps.coo_matrix([[0, 5, 0, 0, 0, 0],[5, 0, 0, 7, 0, 12],[0, 0, 0, 0, 0, 0],[0, 7, 0, 0, 22, 0],[0, 0, 0, 22, 0, 0],[0, 12, 0 ,0, 0, 0]])
      • or file_name: Must be of type string.
        • e.g. "../../examples/sphere_3_192.distance_matrix.lower_triangular"
    • Output: a Python dictionary of numpy arrays of persistence pairs; the dictionary is indexed by the dimension of the array of persistence pairs.
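
A minimal sketch of consuming the output, relying only on what is stated above: the result is a dictionary keyed by dimension whose values are collections of persistence pairs. The helper name count_pairs and the stand-in dictionary are hypothetical and used purely for illustration; the layout of an individual pair entry is not specified here, so the sketch only uses len().

```python
def count_pairs(barcodes):
    """Return {dimension: number of persistence pairs}, ordered by dimension."""
    return {dim: len(pairs) for dim, pairs in sorted(barcodes.items())}


# Hypothetical stand-in for the dictionary returned by run();
# real values come from Ripser++ and will differ.
placeholder = {1: [], 0: [(0.0, 1.0), (0.0, 2.0)]}

summary = count_pairs(placeholder)
for dim, n in summary.items():
    print(f"dimension {dim}: {n} pair(s)")
```

In a real script, the placeholder would be replaced by the return value of run(arguments_list, matrix or file_name).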

Options of Ripser++ for Python bindings:


Options:

  --help           print this screen
  --format         use the specified file format for the input. Options are:
                     lower-distance (lower triangular distance matrix; default)
                     distance       (full distance matrix)
                     point-cloud    (point cloud in Euclidean space)
                     sparse         (sparse distance matrix in sparse triplet (COO) format)
  --dim <k>        compute persistent homology up to dimension <k>
  --threshold <t>  compute Rips complexes up to diameter <t>
  --sparse         force sparse computation
  --ratio <r>      only show persistence pairs with death/birth ratio > r
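
Because arguments_list is a plain string, it can be convenient to assemble it from Python values. The sketch below is one way to do that; build_arguments is a hypothetical helper, not part of the ripserplusplus package, and it only emits the options listed above. It assumes the order of options in the string does not matter.

```python
def build_arguments(fmt=None, sparse=False, dim=None, threshold=None, ratio=None):
    """Assemble an arguments_list string from the documented options."""
    allowed = {"lower-distance", "distance", "point-cloud", "sparse"}
    parts = []
    if fmt is not None:
        if fmt not in allowed:
            raise ValueError(f"unknown format: {fmt!r}")
        parts += ["--format", fmt]
    if sparse:
        parts.append("--sparse")          # force sparse computation
    if dim is not None:
        parts += ["--dim", str(int(dim))]
    if threshold is not None:
        parts += ["--threshold", str(threshold)]
    if ratio is not None:
        parts += ["--ratio", str(ratio)]
    return " ".join(parts)


args = build_arguments(fmt="point-cloud", sparse=True, dim=2, threshold=1.4)
print(args)
```

The resulting string can be passed as the first argument of run.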

How to use Ripser++ with Python Bindings?

Check out the following gist of Ripser++ running on Google Colab.

After having installed the Python bindings successfully (see the Installation section), first check out the sample code in ripserplusplus/python_examples/ such as examples.py.

To create your own Python script to run Ripser++, create a Python file (e.g. myExample.py) under ripser-plusplus/python/working_directory/. At the top of your Python script:

Import the ripserplusplus package to access Ripser++ computing engine:

import ripserplusplus as rpp_py

Also import numpy, if you want to input a User Matrix:

import numpy as np

In your Python script, call run(arguments_list, matrix or file_name) with the following usages:

Read from File

Python bindings work with file name inputs similar to ripser++ executable. Examples are located under ripser-plusplus/python/working_directory/examples.py.

From the ripser-plusplus/ripserplusplus/ directory, e.g. rpp_py.run("--format point-cloud --sparse --dim 2 --threshold 1.4", "examples/o3_4096.point_cloud")

User Matrix Formats

Note: the default user matrix format in Ripser++ is distance. If your matrix is in a different format, you must specify it with the --format option.

distance matrix:

  • Only supports matrix with the following constraints:
    • Has only 0s at diagonals
    • Symmetric
    • Lower triangular part adheres to the same constraints as a lower-distance matrix

e.g. rpp_py.run("--format distance", np.array([[0,3,2],[3,0,1],[2,1,0]]))

runs Ripser++ on a 3 point finite metric space.
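
The constraints listed above can be checked before calling run. The following is a minimal sketch; check_distance_matrix is a hypothetical helper and not part of ripserplusplus, and the tolerance parameter is an assumption added for floating-point input.

```python
import numpy as np


def check_distance_matrix(matrix, tol=0.0):
    """Raise ValueError unless matrix is square, symmetric, with a zero diagonal."""
    d = np.asarray(matrix, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("distance matrix must be square")
    if np.any(np.abs(np.diag(d)) > tol):
        raise ValueError("diagonal entries must be 0")
    if np.any(np.abs(d - d.T) > tol):
        raise ValueError("matrix must be symmetric")
    return d


# The 3 point example from above passes all checks.
dist = check_distance_matrix([[0, 3, 2], [3, 0, 1], [2, 1, 0]])
```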

lower-distance matrix:

  • Only supports vectors, as either a row or column vector
  • Must be the same size as a square matrix's linearized lower triangular matrix

e.g. rpp_py.run("--format lower-distance",np.array([3,2,1]))

runs Ripser++ on the same data as the distance matrix given above.
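
To illustrate how the two formats relate, the sketch below flattens the strictly lower triangle of a full distance matrix row by row and recovers the number of points from the vector length n(n-1)/2. Both helper names are hypothetical, and the row-by-row ordering is assumed to be the linearization meant above.

```python
import math

import numpy as np


def to_lower_distance(dist):
    """Flatten the strictly lower triangle of a square matrix, row by row."""
    d = np.asarray(dist)
    rows, cols = np.tril_indices(d.shape[0], k=-1)
    return d[rows, cols]


def num_points(lower):
    """Recover n from a vector of length n*(n-1)/2, or raise ValueError."""
    m = np.asarray(lower).size
    n = (1 + math.isqrt(1 + 8 * m)) // 2
    if n * (n - 1) // 2 != m:
        raise ValueError(f"{m} is not a valid lower-triangle length")
    return n


lower = to_lower_distance([[0, 3, 2], [3, 0, 1], [2, 1, 0]])
print(lower.tolist(), num_points(lower))
```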

point-cloud:

  • Supports a 2-d numpy array where the number of rows is the number of points embedded in d-dimensional Euclidean space and the number of columns is d
  • Assumes the Euclidean distance between points

e.g. rpp_py.run("--format point-cloud",np.array([[3,2,1],[1,2,3]]))

runs Ripser++ on a 2 point point cloud in 3 dimensional Euclidean space.
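
For intuition, the pairwise Euclidean distances implied by a point cloud can be computed directly with NumPy. This is only an illustration of the metric, not of how Ripser++ computes distances internally; euclidean_distance_matrix is a hypothetical helper.

```python
import numpy as np


def euclidean_distance_matrix(points):
    """Return the n x n matrix of pairwise Euclidean distances for n points."""
    p = np.asarray(points, dtype=float)
    diff = p[:, None, :] - p[None, :, :]      # shape (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, n)


cloud = np.array([[3, 2, 1], [1, 2, 3]])      # 2 points in 3 dimensions
dist = euclidean_distance_matrix(cloud)
print(dist)
```

The resulting matrix is symmetric with a zero diagonal, so it satisfies the constraints of the distance format.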

sparse (COO):

e.g.

import scipy.sparse as sps
mtx = sps.coo_matrix([[0, 5, 0, 0, 0, 0],[5, 0, 0, 7, 0, 12],[0, 0, 0, 0, 0, 0],[0, 7, 0, 0, 22, 0],[0, 0, 0, 22, 0, 0],[0, 12, 0, 0, 0, 0]])
rpp_py.run("--format sparse", mtx)
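
The same sparse matrix can also be built from (row, column, value) triplets, which avoids materializing the dense array for large inputs. The sketch below is a hedged illustration using the standard scipy.sparse.coo_matrix constructor; the edge list and variable names are illustrative, and both triangles are stored to mirror the example above (whether Ripser++ needs both is not stated here).

```python
import scipy.sparse as sps

# Each undirected edge (i, j, distance) from the 6 point example above.
edges = [(0, 1, 5), (1, 3, 7), (1, 5, 12), (3, 4, 22)]
n = 6

# Store both (i, j) and (j, i) so the matrix is symmetric.
rows = [i for i, j, _ in edges] + [j for i, j, _ in edges]
cols = [j for i, j, _ in edges] + [i for i, j, _ in edges]
vals = [w for _, _, w in edges] * 2

mtx = sps.coo_matrix((vals, (rows, cols)), shape=(n, n))
print(mtx.nnz)
```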

Running Python scripts

To run your Python scripts, run, for example, python3 myExample.py or python3 examples.py in the working_directory. This runs Ripser++ through Python. The run function returns a Python dictionary. Python 2 is no longer supported; use python3 when running your scripts.

For usage, see the file ripserplusplus/python_examples.py

How do the Python Bindings Work?

setup.py will build shared object files with CMake: libpyripser++.so and libphmap.so from ripser++.cu. libpyripser++.so is loaded through the ctypes foreign function library of Python. Ripser++ is accessed with the API of one function called run(-,-) to be called by your own custom Python script.

Citing:

@misc{2003.07989,
Author = {Simon Zhang and Mengbai Xiao and Hao Wang},
Title = {GPU-Accelerated Computation of Vietoris-Rips Persistence Barcodes},
Year = {2020},
Eprint = {arXiv:2003.07989},
}

References:

  1. Bauer, Ulrich. "Ripser: efficient computation of Vietoris-Rips persistence barcodes." arXiv preprint arXiv:1908.02518 (2019).
  2. Otter, Nina, et al. "A roadmap for the computation of persistent homology." EPJ Data Science 6.1 (2017): 17.
  3. Zhang, Simon, et al. "GPU-Accelerated Computation of Vietoris-Rips Persistence Barcodes." Proceedings of the Symposium on Computational Geometry. (SoCG 2020)
  4. Zhang, Simon, et al. "HYPHA: a framework based on separation of parallelisms to accelerate persistent homology matrix reduction." Proceedings of the ACM International Conference on Supercomputing. ACM, 2019.

Download files


Source Distribution

ripserplusplus-1.1.2.tar.gz (5.8 MB)

Uploaded Source

Built Distributions

ripserplusplus-1.1.2-cp37-cp37m-win_amd64.whl (444.4 kB)

Uploaded CPython 3.7m Windows x86-64
