Tools to generate concise high-quality summaries of a probability distribution

Project description

GoodPoints

A Python package for generating concise, high-quality summaries of a probability distribution

GoodPoints is a collection of tools for compressing a distribution more effectively than independent sampling:

Given an initial summary of n input points, kernel thinning returns s << n output points with comparable integration error across a reproducing kernel Hilbert space
Compress++ reduces the runtime of generic thinning algorithms with minimal loss in accuracy
Compress Then Test accelerates kernel two-sample testing using high-fidelity compression
Stein Kernel Thinning, Stein Recombination, and Stein Cholesky deliver unweighted, simplex-weighted, and constant-preserving coresets that reduce biases due to off-target sampling, tempering, or burn-in

Installation

To install the goodpoints package, use the following pip command:

pip install goodpoints

For Debiased Distribution Compression, JAX and additional dependencies are required. See goodpoints/jax for more details.

Getting started

The primary kernel thinning function is thin in the kt module:

from goodpoints import kt
coreset = kt.thin(X, m, split_kernel, swap_kernel, delta=0.5, seed=None, store_K=True, 
                  meanK=None, unique=False, verbose=False)
    """Returns kernel thinning coreset of size floor(n/2^m) as row indices into X
    
    Args:
      X: Input sequence of sample points with shape (n, d)
      m: Number of halving rounds
      split_kernel: Kernel function used by KT-SPLIT (often a square-root kernel, krt,
        or a target kernel, k); split_kernel(y,X) returns array of kernel evaluations
        between y and each row of X
      swap_kernel: Kernel function used by KT-SWAP (typically the target kernel, k);
        swap_kernel(y,X) returns array of kernel evaluations between y and each row of X
      delta: Run KT-SPLIT with constant failure probabilities delta_i = delta/n
      seed: Random seed to set prior to generation; if None, no seed will be set
      store_K: If False, runs O(nd) space version which does not store kernel
        matrix; if True, stores n x n kernel matrix
      meanK: None or array of length n with meanK[ii] = mean of swap_kernel(X[ii], X);
        used to speed up computation when not None
      unique: If True, constrains the output to never contain the same row index more than once
      verbose: If False, do not print intermediate time taken in thinning rounds, 
        if True print that info
    """

[!TIP] Choosing store_K=True is equivalent to running the faster but more memory-intensive kt.thin_K Cython implementation, while store_K=False uses the slower Python implementation kt.thin_X. See examples/kt/run_kt_experiment.ipynb for example uses.

A more efficient JAX implementation with GPU and CPU support is also available in goodpoints.jax.kt.kt.

The primary function for speeding up kernel thinning with Compress++ is the compresspp_kt function in the compress module:

from goodpoints import compress
coreset = compress.compresspp_kt(X, kernel_type, g=g, mean0=mean0)
    """Returns KT-Compress++(g) coreset of size sqrt(n) as row indices into X

    Args: 
      X: Input sequence of sample points with shape (n, d)
      kernel_type: Byte string name of kernel to use
      g: Oversampling parameter, a nonnegative integer
      mean0: If False, final KT call minimizes MMD to empirical measure over 
        the input points. Otherwise minimizes MMD to the 0 measure; this
        is useful when the kernel has expectation zero under a target measure.
    """

[!TIP] A more efficient JAX implementation with GPU and CPU support is also available in goodpoints.jax.compress.kt_compresspp.

See also Thinformer for a PyTorch implementation of Compress++ applied to kernel halving with GPU and CPU support.

To speed up a generic halving algorithm with Compress++, we also provide the compresspp function in the compress module:

from goodpoints import compress
coreset = compress.compresspp(X, halve, thin, g)
    """Returns Compress++(g) coreset of size sqrt(n) as row indices into X

    Args: 
        X: Input sequence of sample points with shape (n, d)
        halve: Function that takes in an (n', d) numpy array Y and returns 
          floor(n'/2) distinct row indices into Y, identifying a halved coreset
        thin: Function that takes in an (2^g sqrt(n'), d) numpy array Y and returns
          sqrt(n') row indices into Y, identifying a thinned coreset
        g: Oversampling factor
    """

[!TIP] See examples/compress/construct_compresspp_coresets.py for example uses.

A more efficient JAX implementation with GPU and CPU support is also available in goodpoints.jax.compress.compresspp.

The primary Compress Then Test function is ctt in the ctt module:

from goodpoints import ctt
test_results = ctt.ctt(X1, X2, g)
    """Compress Then Test two-sample test with sample sequences X1 and X2
    and auxiliary kernel k' = target kernel k.

    Args:
      X1: 2D array of size (n1,d)
      X2: 2D array of size (n2,d)
      g: compression level; integer >= 0
      B: number of permutations (int)
      s: total number of compression bins will be num_bins = min(2*s, n1+n2)
      lam: positive kernel bandwidth 
      kernel: kernel name; valid options include "gauss" for Gaussian kernel
        exp(-||x-y||^2/lam^2)
      null_seed: seed used to initialize random number generator for
        randomness in simulating null
      statistic_seed: seed used to initialize random number generator for
        randomness in computing test statistic
      alpha: test level
      delta: KT-Compress failure probability
      
    Returns: TestResults object.
    """

[!TIP] See examples/mmd_test/test.py for example uses.

The primary Debiased Distribution Compression functions are in the jax.dtc module, including:

Stein Kernel Thinning (skt) and Low-rank Stein Kernel Thinning (lskt) for equal-weighted coresets
Stein Recombination (sr) and Low-rank Stein Recombination (lsr) for simplex-weighted coresets
Stein Cholesky (sc) and Low-rank Stein Cholesky (lsc) for constant-preserving coresets

Moreover, Debiased Distribution Compression provides JAX implementation of Kernel Thinning, Compress++, and Stein Thinning that can be of independent interest. Please refer to goodpoints/jax/README.md for all available functions.

Examples

Code in the examples directory uses the goodpoints package to recreate the experiments of the following research papers.

Kernel Thinning

@article{dwivedi2024kernel,
  title={Kernel Thinning},
  author={Dwivedi, Raaz and Mackey, Lester},
  journal={Journal of Machine Learning Research},
  volume={25},
  number={152},
  pages={1--77},
  year={2024}
}

The script examples/kt/submit_jobs_run_kt.py reproduces the vignette experiments of Kernel Thinning on a Slurm cluster by executing examples/kt/run_kt_experiment.ipynb with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb, where in the last code block we report the median heuristic based bandwidth parameteters (along with the code to compute it).
After all results have been generated, the notebook examples/kt/plot_results.ipynb can be used to reproduce the figures of Kernel Thinning.

Generalized Kernel Thinning

@inproceedings{dwivedi2022generalized,
  title={Generalized Kernel Thinning},
  author={Raaz Dwivedi and Lester Mackey},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

The script examples/gkt/submit_gkt_jobs.py reproduces the vignette experiments of Generalized Kernel Thinning on a Slurm cluster by executing examples/gkt/run_generalized_kt_experiment.ipynb with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb.
Once the coresets are generated, examples/gkt/compute_test_function_errors.ipynb can be used to generate integration errors for different test functions.
After all results have been generated, the notebook examples/gkt/plot_gkt_results.ipynb can be used to reproduce the figures of Generalized Kernel Thinning.

Distribution Compression in Near-linear Time

@inproceedings{shetty2022distribution,
  title={Distribution Compression in Near-linear Time},
  author={Abhishek Shetty and Raaz Dwivedi and Lester Mackey},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

The notebook examples/compress/script_to_deploy_jobs.ipynb reproduces the experiments of Distribution Compression in Near-linear Time in the following manner:
1. It generates various coresets and computes their mmds by executing examples/compress/construct_{THIN}_coresets.py for THIN in {compresspp, kt, st, herding} with appropriate parameters, where the flag kt stands for kernel thinning, st stands for standard thinning (choosing every t-th point), and herding refers to kernel herding.
2. It computes the runtimes of different algorithms by executing examples/compress/run_time.py.
3. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb.
4. The notebook currently deploys these jobs on a slurm cluster, but setting deploy_slurm = False in examples/compress/script_to_deploy_jobs.ipynb will submit the jobs as independent python calls on terminal.
After all results have been generated, the notebook examples/compress/plot_compress_results.ipynb can be used to reproduce the figures of Distribution Compression in Near-linear Time.

Compress Then Test: Powerful Kernel Testing in Near-linear Time

@inproceedings{domingoenrich2023compress,
  title={Compress Then Test: Powerful Kernel Testing in Near-linear Time},
  author={Carles Domingo-Enrich and Raaz Dwivedi and Lester Mackey},
  booktitle={Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  year={2023},
  series={Proceedings of Machine Learning Research},
  month={25--27 Apr},
  publisher={PMLR}
}

See examples/mmd_test/README.md.

Debiased Distribution Compression

@InProceedings{li2024debiased,
    title = {Debiased Distribution Compression},
    author = {Li, Lingxiao and Dwivedi, Raaz and Mackey, Lester},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    year = {2024},
    volume = {203},
    series = {Proceedings of Machine Learning Research},
    month = {21--27 Jul},
    publisher = {PMLR},
}

See examples/debias/README.md.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Project details

Release history Release notifications | RSS feed

0.4.2

Mar 2, 2026

This version

0.4.1

Jan 20, 2026

0.4.0

Nov 6, 2025

0.3.9

Oct 21, 2025

0.3.8

Oct 17, 2025

0.3.7

Oct 10, 2025

0.3.6

Oct 9, 2025

0.3.5

Sep 4, 2025

0.3.4

Mar 25, 2025

0.3.3

Mar 25, 2025

0.3.2

Dec 9, 2024

0.3.0

May 23, 2024

0.2.5

Oct 19, 2023

0.2.4

Oct 19, 2023

0.2.3

Sep 1, 2023

0.2.2

Apr 12, 2023

0.2.1

Feb 14, 2023

0.2.0

Jan 21, 2023

0.1.1

Apr 5, 2022

0.1.0

Apr 5, 2022

0.0.9

Feb 22, 2022

0.0.8

Dec 2, 2021

0.0.7

Dec 2, 2021

0.0.6

Nov 17, 2021

0.0.5

Nov 17, 2021

0.0.4

Nov 17, 2021

0.0.3

Nov 17, 2021

0.0.2

Nov 13, 2021

0.0.1

Nov 10, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

goodpoints-0.4.1.tar.gz (933.2 kB view details)

Uploaded Jan 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

goodpoints-0.4.1-cp313-cp313-macosx_11_0_arm64.whl (1.4 MB view details)

Uploaded Jan 20, 2026 CPython 3.13macOS 11.0+ ARM64

File details

Details for the file goodpoints-0.4.1.tar.gz.

File metadata

Download URL: goodpoints-0.4.1.tar.gz
Upload date: Jan 20, 2026
Size: 933.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for goodpoints-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`4fd6efba9ed50515ed385dd654eec4b593df1cd5ee97950cef8af3f613df1ad3`
MD5	`86673e509dbcb09b08053f3fd8f35f50`
BLAKE2b-256	`e47f6155bbdb5a073e0c86f15cfe4dd507e9577a2135218b9e5beaf48aa798e8`

See more details on using hashes here.

File details

Details for the file goodpoints-0.4.1-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

Download URL: goodpoints-0.4.1-cp313-cp313-macosx_11_0_arm64.whl
Upload date: Jan 20, 2026
Size: 1.4 MB
Tags: CPython 3.13, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for goodpoints-0.4.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`8325f11bddbae605d636dc550afc4f63f4ea3aaccbaef03de31db114eac6a8c1`
MD5	`69b01287e3c75337d767bd0dd1d7af61`
BLAKE2b-256	`18437abb206e3eaea29a1195dc98817ceff5941451103249f57031e45a67b4ea`

See more details on using hashes here.

goodpoints 0.4.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Project description

GoodPoints

A Python package for generating concise, high-quality summaries of a probability distribution

Installation

Getting started

Examples

Kernel Thinning

Generalized Kernel Thinning

Distribution Compression in Near-linear Time

Compress Then Test: Powerful Kernel Testing in Near-linear Time

Debiased Distribution Compression

Contributing

Trademarks

Project details

Verified details

Maintainers

Unverified details

Project links

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes