Skip to main content

Discrete Optimal Transport in 1D by Linear Programming

Project description

OT1D: Discrete Optimal Transport in 1D by Linear Programming

The OT1D library offers a simple but efficient implementation of an algorithm to compute the Kantorovich-Wasserstein distance between two empirical measures defined in dimension 1, that is, the support points of the measures are in R. We have designed the algorithm by directly exploiting the Complementary slackness conditions of Linear Programming. The implementation focuses more on efficiency than genericity, and we try to be as efficient as possible in several notable cases. We implemented the core algorithm in standard ANSI C++11, and we provide a python3 wrapper, which can be installed with:

pip3 install ot1d

The OT1D library provides an implementation of Optimal Transport in 1D that is faster than:

  1. Scipy: it is at least 6x faster than scipy.stats.wasserstein_distance, but it can be up to 11x faster
  2. POT: it is at least 2x faster than ot.lp.wasserstein_1d, but it can be up to 7x faster

The real speedup will depend on your computer platform (i.e., numebr of cores), your OS, and compiler. For running a performance test on your computer, see below, or run the python script OT1D_test. For some strange reason, the speed ups on Mac laptops are larger than for other architectures.

REMARK: If you find instances where OT1D is slower, please, let us know.

DotLIB

This tiny library is part of dotlib, a large project to develop Optimal Transport algorithms based on efficient Linear Programming implementations.

Basic Usage: Colab Notebook

The simplest way to test this library is to run the following notebook on Colab:

Data Notebook Link
[2021/06/21] Testing and evaluating OT1D Open In Colab

Usage

The main function of the OT1D library is the following:

z = OT1D(x, y, mu=None, nu=None, p=2, sorting=True, threads=8)

The parameters of the function are:

  • x: the support points of the first measure
  • y: the support points of the second measure
  • mu: the weights of the first measure. If equal to None, all the samples have the same mass.
  • nu: the weights of the second measure. If equal to None, all the samples have the same mass.
  • p: the order of the Wasserstein distance (p=1 or p=2)
  • sorting: if equal to True, the function sorts the support points given in input
  • threads: number of threads to use by the parallel sorting algorithm

The first four parameters can be given in input as python lists or numpy arrays.

In addition, we expose the following in-place parallel sorting function:

parasort(x, mu=None, threads=8)

The parameters of the function are:

  • x: the support points of a given measure
  • mu: the weights of the given measure. If equal toNone, only the support points are sorted
  • threads: number of threads to use by the parallel sorting algorithm

Details

Given two empirical distributions, the Kantorovich-Wasserstein distance is the given by optimal solution of a linear program, known as the transportation problem. While this is a general linear program, when the costs are defined among points belonging to the real line, the problem can be solved with an algorithm having worst-case time complexity of O(n log n). This can be shown by writing first the dual linear program, and then the slackness condition.

The key step of the algorithm is sorting of the two arrays of support points x and y. We sort the arrays by using a customized parallel sorting algorithm implemented in C++, which combines the very fast pdqsort with parasort. See the linked webpages for the license type of these two libraries.

Prerequisities for compilation

You want to compile the source code and the python wrapper, you only need the following two standard python libraries:

  • A C++ compiler compliant with the C++11 standard.
  • cython
  • numpy

You might need to install python-dev library, which on Linux can be installed by:

apt install python3-dev  # Ubuntu

Installation

To install OT1D you can run the following command:

pip3 install ot1d

Testing

For testing the library, you can run the following command:

python3 basic_test.py

The basic test snippet is the following:

import numpy as np
from OT1D import OT1D, parasort

np.random.seed(13)

N = 1000000

# Uniform samples
x = np.random.uniform(1, 2, N)
y = np.random.uniform(0, 1, N)

z = OT1D(x, y, p=2, sorting=True, threads=16)

print('Wasserstein distance of order 2, W2(x,y) =', z)

and the output should be similar to:

Wasserstein distance of order 2, W2(x,y) = 1.0002549459050794

Testing for Performance

These results can be reproduced running the following command (you need to have installed scipy and pot):

python3 OT1D_test.py

which output is should be similar to the following (but it depends on your platform):

--------------- TEST 3: Unsorted input (average runtime) --------------------
For OT1D using 8 threads

running test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Testing W1, samples of deltas, n=m
Scipy: average time = 0.214 speedup = 11.0
POT  : average time = 0.122 speedup = 6.3O
OT1D : average time = 0.019 speedup = 1.0

Testing W2, samples of deltas, n=m
POT  : average time = 0.12 speedup = 6.1
OT1D : average time = 0.02 speedup = 1.0

Testing W1, samples with weights
Scipy: average time = 0.225 speedup = 7.7
POT  : average time = 0.121 speedup = 4.2
OT1D : average time = 0.029 speedup = 1.0

Testing W2, samples with weights
POT  : average time = 0.119 speedup = 4.1
OT1D : average time = 0.029 speedup = 1.0


--------------- TEST 4: Sorted input (average runtime) --------------------
For OT1D using 8 threads

running test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Parallel sorting: time = 0.023 sec

Testing W1, samples of deltas, n=m
Scipy: average time = 0.07 speedup = 11.4
POT  : average time = 0.043 speedup = 7.1
OT1D : average time = 0.006 speedup = 1.0

Testing W2, samples of deltas, n=m
POT  : average time = 0.042 speedup = 7.0
OT1D : average time = 0.006 speedup = 1.0

Testing W1, samples with weights
Scipy: average time = 0.078 speedup = 5.9
POT  : average time = 0.042 speedup = 3.1
OT1D : average time = 0.013 speedup = 1.0

Testing W2, samples with weights
POT  : average time = 0.039 speedup = 3.0
OT1D : average time = 0.013 speedup = 1.0```

Please, contact us by email if you encounter any issues.

Author and maintainer

Stefano Gualandi, stefano.gualandi@gmail.com.

Maintainer: Stefano Gualandi stefano.gualandi@gmail.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

OT1D-0.3.6.tar.gz (123.1 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

OT1D-0.3.6-cp39-cp39-win_amd64.whl (82.0 kB view details)

Uploaded CPython 3.9Windows x86-64

OT1D-0.3.6-cp38-cp38-win_amd64.whl (82.0 kB view details)

Uploaded CPython 3.8Windows x86-64

OT1D-0.3.6-cp37-cp37m-win_amd64.whl (81.1 kB view details)

Uploaded CPython 3.7mWindows x86-64

OT1D-0.3.6-cp36-cp36m-win_amd64.whl (81.1 kB view details)

Uploaded CPython 3.6mWindows x86-64

File details

Details for the file OT1D-0.3.6.tar.gz.

File metadata

  • Download URL: OT1D-0.3.6.tar.gz
  • Upload date:
  • Size: 123.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.0

File hashes

Hashes for OT1D-0.3.6.tar.gz
Algorithm Hash digest
SHA256 035734dfafb145422f22a5a2c295c2d4a4488d1fcd251c27a997d645782ecb6f
MD5 b7da24a5618f995c3f1500e03e5a0427
BLAKE2b-256 ba2a1fb9bea161f409a7210a9ae62da61c0b3a1c0db00ff10adbdfbb0b52d3b9

See more details on using hashes here.

File details

Details for the file OT1D-0.3.6-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: OT1D-0.3.6-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 82.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.0

File hashes

Hashes for OT1D-0.3.6-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 745f5c02c7047c11abe81a3da7c361dbae848f066fa13f8a2ef0544192890e24
MD5 191f1f263217f4e2bac72fbc78941203
BLAKE2b-256 97c333942bf10aac4e2a2ffee5d66bd964574a9b604176bbfb877e11b330dabf

See more details on using hashes here.

File details

Details for the file OT1D-0.3.6-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: OT1D-0.3.6-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 82.0 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.0

File hashes

Hashes for OT1D-0.3.6-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e8e1d4c013340a6f2c97c03cde1650d06d752db3d43b34574d7f58fb0492bbea
MD5 9efe3c3bde7db636272135c1b1ec54e9
BLAKE2b-256 0d4b9d2b0e11127d04766324bf051d1d86ecb61a3d3f50c536312240e42d2314

See more details on using hashes here.

File details

Details for the file OT1D-0.3.6-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: OT1D-0.3.6-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 81.1 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.0

File hashes

Hashes for OT1D-0.3.6-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 260d8814a53ad5d13e56579f0281ab97d009347575dd519a0d94432eadd69f1b
MD5 6e126f0848eeaf2d6f2d06fcfc8b734b
BLAKE2b-256 5397ee87f5821e7b760dd7e081015337d6a55ce988c15f977f89bc07e13d4bf4

See more details on using hashes here.

File details

Details for the file OT1D-0.3.6-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: OT1D-0.3.6-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 81.1 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.9.0

File hashes

Hashes for OT1D-0.3.6-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 9444c7aa4410a34de4c91d7e03de6c5af07449227dbfbfab5f607172c8fbf50e
MD5 2ca283d60b4c80e986915d1b5f4aed7f
BLAKE2b-256 9c4e8ef11a76ce52cf5669d90e886822f4a812d1b9e4d4f5c032ad586c2f1be0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page