About SubModLib

SubModLib is an easy-to-use, efficient and scalable Python library for submodular optimization with a C++ optimization engine. It finds application in summarization, data subset selection, hyperparameter tuning, efficient training and similar tasks. Its rich API offers a great deal of flexibility in how it can be used.

Please check out our latest arXiv preprint: https://arxiv.org/abs/2202.10680

Salient Features

  • Rich suite of functions for a wide variety of subset selection tasks:
    • regular set (submodular) functions
    • submodular mutual information functions
    • conditional gain functions
    • conditional mutual information functions
  • Supports different types of optimizers
    • naive greedy
    • lazy (accelerated) greedy
    • stochastic (random) greedy
    • lazier than lazy greedy
  • Combines the best of Python's ease of use and C++'s efficiency
  • Rich API which gives a variety of options to the user. See this notebook for an example of different usage patterns
  • Decoupled function and optimizer paradigm makes it suitable for a wide variety of tasks
  • Comprehensive documentation (available here)
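
To illustrate the difference between the naive and lazy greedy optimizers listed above, here is a minimal pure-Python sketch. It is not SubModLib's C++ implementation: the function names (coverage, naive_greedy, lazy_greedy) and the toy set-cover objective are assumptions chosen for illustration. Lazy greedy keeps possibly stale marginal gains in a max-heap and refreshes only the top entry, which is valid because the gains of a submodular function can only shrink as the selection grows.

```python
import heapq


def coverage(sets, chosen):
    """Number of distinct items covered by the chosen sets (a submodular function)."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def naive_greedy(sets, budget):
    """Recompute every candidate's marginal gain in every round."""
    chosen, covered = [], set()
    for _ in range(min(budget, len(sets))):
        best, best_gain = None, -1
        for i, s in enumerate(sets):
            if i in chosen:
                continue
            gain = len(s - covered)  # marginal gain of adding set i
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        covered |= sets[best]
    return chosen


def lazy_greedy(sets, budget):
    """Keep stale gains in a max-heap and refresh only the entry at the top."""
    chosen, covered = [], set()
    # Heap entries: (-gain, index, number of selections made when gain was computed)
    heap = [(-len(s), i, 0) for i, s in enumerate(sets)]
    heapq.heapify(heap)
    while heap and len(chosen) < budget:
        neg_gain, i, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # The gain is up to date; stale gains are upper bounds, so nothing beats it.
            chosen.append(i)
            covered |= sets[i]
        else:
            # The gain is stale: recompute it and push the entry back.
            heapq.heappush(heap, (-len(sets[i] - covered), i, len(chosen)))
    return chosen


example_sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 6, 7}, {8}]
print(naive_greedy(example_sets, 3))  # [0, 2, 3]
print(lazy_greedy(example_sets, 3))   # [0, 2, 3]
```

Both variants select the same sets on this example; the lazy variant simply skips marginal-gain evaluations for candidates whose stale upper bound is already too small to win the round.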

Google Colab Notebooks Demonstrating the power of SubModLib and sample usage

Setup

Alternative 1

  • $ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib

Alternative 2 (if local docs need to be built and test cases need to be run)

  • $ git clone https://github.com/decile-team/submodlib.git
  • $ cd submodlib
  • $ pip install .
  • The latest documentation is available at readthedocs. If you need to build the documentation locally, follow these steps:
    • $ pip install -U sphinx
    • $ pip install sphinxcontrib-bibtex
    • $ pip install sphinx-rtd-theme
    • $ cd docs
    • $ make clean html
  • To run the tests, follow these steps:
    • $ pip install pytest
    • $ pytest # this runs ALL tests
    • $ pytest -m <marker> --verbose --disable-warnings -rA # this runs the tests selected by <marker>. Possible markers are listed in the pyproject.toml file.

Usage

It is very easy to get started with submodlib. Using a submodular function in submodlib essentially boils down to just two steps:

  1. instantiate the corresponding function object
  2. invoke the desired method on the created object

The most frequently used methods are:

  1. f.evaluate() - takes a subset and returns the score of the subset as computed by the function f
  2. f.marginalGain() - takes a subset and an element and returns the marginal gain of adding the element to the subset, as computed by f
  3. f.maximize() - takes a budget and an optimizer and returns the subset selected by maximizing f under that budget

For example,

import numpy as np
from submodlib import FacilityLocationFunction

groundData = np.random.rand(43, 2)  # placeholder ground set: 43 points in 2-D
objFL = FacilityLocationFunction(n=43, data=groundData, mode="dense", metric="euclidean")
greedyList = objFL.maximize(budget=10, optimizer='NaiveGreedy')
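
To make concrete what the three methods compute, the following is a minimal pure-Python sketch of a facility location function, f(S) = sum over all points i of the maximum similarity between i and any element of S. The class ToyFacilityLocation, its method names and the similarity matrix are hypothetical stand-ins for illustration; this is not SubModLib's implementation, and returning (element, gain) pairs from maximize is an assumption of this sketch.

```python
class ToyFacilityLocation:
    """Hypothetical stand-in: f(S) = sum_i max_{j in S} sim[i][j]."""

    def __init__(self, sim):
        self.sim = sim      # square similarity matrix as a list of lists
        self.n = len(sim)

    def evaluate(self, subset):
        """Score of a subset; the empty set scores 0."""
        if not subset:
            return 0.0
        return sum(max(row[j] for j in subset) for row in self.sim)

    def marginal_gain(self, subset, element):
        """Increase in score from adding `element` to `subset`."""
        return self.evaluate(set(subset) | {element}) - self.evaluate(subset)

    def maximize(self, budget):
        """Naive greedy selection; returns (element, gain) pairs in pick order."""
        selected, result = set(), []
        for _ in range(min(budget, self.n)):
            gains = {e: self.marginal_gain(selected, e)
                     for e in range(self.n) if e not in selected}
            best = max(gains, key=gains.get)
            selected.add(best)
            result.append((best, gains[best]))
        return result


sim = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
f = ToyFacilityLocation(sim)
print(f.evaluate({0}))                   # score of the subset {0}
print(f.marginal_gain({1}, 2))           # gain of adding element 2 to {1}
print([e for e, _ in f.maximize(2)])     # [1, 2]
```

Element 1 is picked first because it represents all three points best on its own; element 2 follows because it covers the point that element 1 represents poorly.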

For a more detailed discussion on all possible usage patterns, please see Different Options of Usage

Functions

Modelling Capabilities of Different Functions

We demonstrate the representational power and modeling capabilities of different functions qualitatively in the following Google Colab notebooks:

This notebook contains a quantitative analysis of the performance of different functions and the role of the parameterization in aspects like query-coverage, query-relevance, privacy-irrelevance and diversity for different SMI, CG and CMI functions, as observed on a synthetically generated dataset. This notebook contains a similar analysis on the ImageNette dataset.
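
For reference, the SMI, CG and CMI measures mentioned above are commonly defined in terms of a base submodular function f as follows. The notation is an assumption of this sketch: A is the candidate subset, Q a query set and P a private set.

```latex
\begin{align}
I_f(A; Q) &= f(A) + f(Q) - f(A \cup Q)
  && \text{(submodular mutual information, SMI)} \\
f(A \mid P) &= f(A \cup P) - f(P)
  && \text{(conditional gain, CG)} \\
I_f(A; Q \mid P) &= f(A \cup P) + f(Q \cup P) - f(A \cup Q \cup P) - f(P)
  && \text{(conditional mutual information, CMI)}
\end{align}
```

Informally, maximizing SMI favors subsets relevant to Q, maximizing CG favors subsets dissimilar to P, and CMI combines both.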

Optimizers

Sample Application (Image collection summarization)

  • This notebook contains a demonstration of using submodlib for an image collection summarization application.

Timing Analysis

To gauge the performance of submodlib, selection by Facility Location was performed on a randomly generated dataset of 1024-dimensional points. Specifically, the following code was run with the number of data points ranging from 50 to 10000.

from submodlib import FacilityLocationFunction
from submodlib import helper
# dataArray, num_samples, budget and optimizer are set by the benchmarking loop
K_dense = helper.create_kernel(dataArray, mode="dense", metric='euclidean', method="other")
obj = FacilityLocationFunction(n=num_samples, mode="dense", sijs=K_dense, separate_rep=False, pybind_mode="array")
obj.maximize(budget=budget, optimizer=optimizer, stopIfZeroGain=False, stopIfNegativeGain=False, verbose=False, show_progress=False)

The above code was timed using Python's timeit module, and each measurement was averaged across three executions. We report the following numbers:

Number of data points | Time taken (in seconds)
50                    | 0.00043
100                   | 0.001074
200                   | 0.003024
500                   | 0.016555
1000                  | 0.081773
5000                  | 2.469303
6000                  | 3.563144
7000                  | 4.667065
8000                  | 6.174047
9000                  | 8.010674
10000                 | 9.417298
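
The timing procedure described above (timeit, averaged over three executions) can be sketched as follows. This is an illustration only: average_runtime and dummy_selection are hypothetical names, and the workload is a placeholder rather than the Facility Location benchmark, so the printed time is not comparable to the reported numbers.

```python
import random
import timeit


def average_runtime(func, runs=3):
    """Average wall-clock time of `func` over `runs` independent executions."""
    times = timeit.repeat(func, repeat=runs, number=1)
    return sum(times) / len(times)


def dummy_selection(num_points=200, dim=16, budget=10):
    """Placeholder workload (hypothetical), not the benchmark from the text."""
    rng = random.Random(0)
    points = [[rng.random() for _ in range(dim)] for _ in range(num_points)]
    # Keep the `budget` points with the largest coordinate sum.
    return sorted(points, key=sum, reverse=True)[:budget]


avg = average_runtime(dummy_selection)
print(f"average over 3 runs: {avg:.6f} s")
```

To time the real benchmark, the callable passed to average_runtime would wrap the kernel creation and the maximize call for a given number of data points.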

Citing

If your research makes use of SUBMODLIB, please consider citing:

SUBMODLIB (Submodlib: A Submodular Optimization Library (Kaushal et al., 2022))

@article{kaushal2022submodlib,
  title={Submodlib: A submodular optimization library},
  author={Kaushal, Vishal and Ramakrishnan, Ganesh and Iyer, Rishabh},
  journal={arXiv preprint arXiv:2202.10680},
  year={2022}
}

Contributors

  • Vishal Kaushal, Ganesh Ramakrishnan and Rishabh Iyer. Currently maintained by CARAML Lab

Contact

Should you face any issues or have any feedback or suggestions, please feel free to contact vishal[dot]kaushal[at]gmail.com

Acknowledgements

This work is supported by the Ekal Fellowship (www.ekal.org). This work is also supported by the National Science Foundation (NSF) under Grant Number 2106937, a startup grant from UT Dallas, as well as Google and Adobe awards.
