A Python Package for Density Ratio Estimation
Koji Makiyama (@hoxo-m), Ameya Daigavane (@ameya98), and Krzysztof Mierzejewski (@mierzejk)
1. Overview
Density ratio estimation is the following problem: given two data
samples x1 and x2 drawn from unknown distributions p(x) and q(x),
respectively, estimate w(x) = p(x) / q(x), where the points in x1 and
x2 are d-dimensional real vectors.
The estimated density ratio function w(x) can be used in many
applications such as inlier-based outlier detection [1] and
covariate shift adaptation [2]. Other useful applications for density
ratio estimation were summarized by Sugiyama et al. (2012) in [3].
The package densratio provides densratio() and method-specific
wrappers uLSIF(), RuLSIF(), and KLIEP(). Each estimator returns an
object with compute_density_ratio() for evaluating the learned density
ratio on new samples. The default method is uLSIF, matching the R
package API.
Further, the alpha-relative density ratio
p(x)/(alpha * p(x) + (1 - alpha) * q(x)) (where alpha is in the range
[0, 1]) can also be estimated. When alpha is 0, this reduces to the
ordinary density ratio w(x). The alpha-relative PE-divergence and
KL-divergence between p(x) and q(x) are also computed.
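As a quick illustration of the definition above, the following NumPy-only sketch evaluates the alpha-relative density ratio for two known normal densities. It does not use densratio itself; the helper names `normal_pdf` and `alpha_relative_ratio` are hypothetical and exist only for this example. It checks that alpha = 0 recovers the ordinary ratio, and that for alpha > 0 the relative ratio never exceeds 1 / alpha, because the denominator is at least alpha * p(x).

```python
import numpy as np

def normal_pdf(x, loc, scale):
    """Density of N(loc, scale^2), written out to keep the sketch NumPy-only."""
    z = (x - loc) / scale
    return np.exp(-0.5 * z**2) / (scale * np.sqrt(2.0 * np.pi))

def alpha_relative_ratio(p, q, alpha):
    """p / (alpha * p + (1 - alpha) * q) for arrays of density values p and q."""
    return p / (alpha * p + (1.0 - alpha) * q)

grid = np.linspace(-1.0, 1.0, 201)
p = normal_pdf(grid, 0.0, 1.0 / 8)  # numerator density p(x)
q = normal_pdf(grid, 0.0, 1.0 / 2)  # denominator density q(x)

ordinary = p / q
relative_0 = alpha_relative_ratio(p, q, alpha=0.0)
relative_01 = alpha_relative_ratio(p, q, alpha=0.1)

# alpha = 0 gives back the ordinary density ratio.
print(np.allclose(relative_0, ordinary))
# For alpha > 0 the relative ratio is bounded above by 1 / alpha.
print(relative_01.max() <= 1.0 / 0.1)
```

The upper bound makes the relative ratio easier to estimate than the ordinary ratio when q(x) is close to zero in regions where p(x) is not.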
For example,
import numpy as np
from scipy.stats import norm
from densratio import densratio
np.random.seed(1)
x = norm.rvs(size=500, loc=0, scale=1./8)
y = norm.rvs(size=500, loc=0, scale=1./2)
alpha = 0.1
densratio_obj = densratio(x, y, method="RuLSIF", alpha=alpha)
print(densratio_obj)
gives the following output:
#> Method: RuLSIF
#>
#> Alpha: 0.1
#>
#> Kernel Information:
#> Kernel type: Gaussian
#> Number of kernels: 100
#> Bandwidth(sigma): 0.1
#> Centers: array([[-0.09591373],..
#>
#> Kernel Weights (theta):
#> array([0.04990797, 0.0550548 , 0.04784736, 0.04951904, 0.04840418,..
#>
#> Regularization Parameter (lambda): 0.1
#>
#> Alpha-Relative PE-Divergence: 0.618794133598705
#>
#> Alpha-Relative KL-Divergence: 0.7037648129307483
#>
#> Function to Estimate Density Ratio:
#> compute_density_ratio(x)
#>
In this case, the true density ratio w(x) is known, so we can compare
w(x) with the estimated density ratio w-hat(x). The code below plots
both.
from matplotlib import pyplot as plt
from numpy import linspace
def true_alpha_density_ratio(sample):
    return norm.pdf(sample, 0, 1./8) / (alpha * norm.pdf(sample, 0, 1./8) + (1 - alpha) * norm.pdf(sample, 0, 1./2))
def estimated_alpha_density_ratio(sample):
    return densratio_obj.compute_density_ratio(sample)
sample_points = np.linspace(-1, 3, 400)
plt.plot(sample_points, true_alpha_density_ratio(sample_points), 'b-', label='True Alpha-Relative Density Ratio')
plt.plot(sample_points, estimated_alpha_density_ratio(sample_points), 'r-', label='Estimated Alpha-Relative Density Ratio')
plt.title("Alpha-Relative Density Ratio - Normal Random Variables (alpha={:03.2f})".format(alpha))
plt.legend()
plt.show()
2. Installation
You can install the package from PyPI.
$ pip install densratio
densratio supports Python 3.10 or later.
You can also install the package from GitHub.
$ pip install git+https://github.com/hoxo-m/densratio_py.git
The source code for the densratio package is available on GitHub at https://github.com/hoxo-m/densratio_py.
3. Details
3.1. Basics
The package provides densratio(). The function returns an object with
a method that computes the estimated density ratio.
For data samples x and y,
from scipy.stats import norm
from densratio import densratio
x = norm.rvs(size = 200, loc = 1, scale = 1./8)
y = norm.rvs(size = 200, loc = 1, scale = 1./2)
result = densratio(x, y)
In this case, result.compute_density_ratio() computes the estimated
density ratio.
from matplotlib import pyplot as plt
density_ratio = result.compute_density_ratio(y)
plt.plot(y, density_ratio, "o")
plt.xlabel("x")
plt.ylabel("Density Ratio")
plt.show()
3.2. Methods
The package estimates density ratios with Gaussian-kernel direct density
ratio estimators. Use the method argument of densratio() or call the
method-specific wrappers directly:
- uLSIF(x, y, ...) estimates the ordinary density ratio p(x) / q(x) by unconstrained Least-Squares Importance Fitting. This is the default for densratio(x, y).
- RuLSIF(x, y, alpha=0.1, ...) estimates the alpha-relative density ratio p(x) / (alpha * p(x) + (1 - alpha) * q(x)). It also reports alpha-relative PE-divergence and KL-divergence.
- KLIEP(x, y, fold=5, ...) estimates the ordinary density ratio by Kullback-Leibler Importance Estimation Procedure. It uses cross-validation over sigma when a search range is provided.
For example:
ordinary = densratio(x, y)
relative = densratio(x, y, method="RuLSIF", alpha=0.1)
kliep = densratio(x, y, method="KLIEP", sigma=[0.1, 0.3, 1.0], fold=5)
All methods represent the density ratio with a linear Gaussian RBF kernel model:
w(x) = theta1 * K(x, c1) + theta2 * K(x, c2) + ... + thetab * K(x, cb)
where K(x, c) = exp(- ||x - c||^2 / (2 * sigma ^ 2)) is the Gaussian
RBF kernel.
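To make the model concrete, here is a minimal NumPy sketch that evaluates w(x) for given centers, weights, and bandwidth. The functions `gaussian_kernel` and `kernel_model` and all numeric values are hypothetical, chosen only for illustration; this is not the package's internal code.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """K(x, c) for every row of x (n, d) against every row of centers (b, d)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def kernel_model(x, centers, theta, sigma):
    """w(x) = sum over l of theta_l * K(x, c_l), evaluated at each row of x."""
    return gaussian_kernel(x, centers, sigma) @ theta

# Hypothetical parameters: two centers in one dimension.
centers = np.array([[0.0], [1.0]])
theta = np.array([2.0, 0.5])
sigma = 0.5

points = np.array([[0.0], [0.5], [1.0]])
w = kernel_model(points, centers, theta, sigma)
print(w)
```

Each evaluation point contributes one row of kernel values, so the whole model reduces to a single matrix-vector product.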
densratio() performs the following:
- Decides kernel parameter sigma by cross-validation.
- Optimizes for kernel weights theta.
- For RuLSIF, computes the alpha-relative PE-divergence and KL-divergence from the learned alpha-relative ratio.
Kernel centers are selected at random from x, the numerator sample.
Set numpy.random.seed(...) before fitting when reproducible centers are
needed.
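The fitting steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the package's own code. It assumes the usual uLSIF closed form from the literature: theta = (H + lambda * I)^-1 h, with negative weights clipped to zero, where H is the average outer product of kernel features over the denominator sample y and h is the average kernel feature over the numerator sample x. The name `fit_ulsif_sketch` is hypothetical. The sketch skips cross-validation by taking sigma and lambda as fixed inputs, and it seeds a local `np.random.default_rng` rather than the global `numpy.random.seed`.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Gaussian RBF kernel values between rows of x (n, d) and centers (b, d)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def fit_ulsif_sketch(x, y, sigma, lam, kernel_num=100, seed=0):
    """Fit kernel weights with fixed sigma and lambda (no cross-validation)."""
    rng = np.random.default_rng(seed)
    # Kernel centers: a random subset of the numerator sample x.
    idx = rng.choice(len(x), size=min(kernel_num, len(x)), replace=False)
    centers = x[idx]

    phi_x = gaussian_kernel(x, centers, sigma)  # shape (n_x, b)
    phi_y = gaussian_kernel(y, centers, sigma)  # shape (n_y, b)
    H = phi_y.T @ phi_y / len(y)                # second moment under q
    h = phi_x.mean(axis=0)                      # first moment under p

    # Regularized least-squares solution, then clip negative weights.
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    theta = np.maximum(theta, 0.0)

    def compute_density_ratio(points):
        return gaussian_kernel(points, centers, sigma) @ theta

    return compute_density_ratio

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0 / 8, size=(500, 1))  # numerator sample
y = rng.normal(0.0, 1.0 / 2, size=(500, 1))  # denominator sample

ratio = fit_ulsif_sketch(x, y, sigma=0.1, lam=0.1)
w_hat = ratio(np.array([[0.0], [1.0]]))
print(w_hat)
```

Because the centers come from a seeded generator, fitting twice with the same seed gives the same estimate, which is the same effect the text describes for seeding before calling densratio().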
As a result, you obtain compute_density_ratio(), which computes the
estimated density ratio at the points you pass in.
3.3. Result and Parameter Settings
Printing the object returned by densratio() gives output like the following:
#> Method: uLSIF
#>
#> Alpha: 0
#>
#> Kernel Information:
#> Kernel type: Gaussian
#> Number of kernels: 100
#> Bandwidth(sigma): 0.1
#> Centers: array([[0.92113356],..
#>
#> Kernel Weights (theta):
#> array([0.08848922, 0.03377533, 0.0753727 , 0.06141277, 0.02543963,..
#>
#> Regularization Parameter (lambda): 1.0
#>
#> Alpha-Relative PE-Divergence: 0.9635169300831041
#>
#> Alpha-Relative KL-Divergence: 0.838826626547327
#>
#> Function to Estimate Density Ratio:
#> compute_density_ratio(x)
#>
- Method is uLSIF, RuLSIF, or KLIEP.
- Kernel type is fixed as Gaussian RBF.
- Number of kernels is the number of kernels in the linear model. You can change it by setting the kernel_num parameter. By default, kernel_num = 100.
- Bandwidth(sigma) is the Gaussian kernel bandwidth. By default, sigma = "auto", and the algorithm automatically selects an optimal value by cross-validation. If you set sigma to a number, that value is used. If you set sigma to a numeric array, the algorithm selects the best value among them by cross-validation.
- Centers are the centers of the Gaussian kernels in the linear model. They are selected at random from the data sample x drawn from the numerator distribution p(x). You can find all values in result.kernel_info.centers.
- Kernel weights(theta) are the theta parameters in the linear kernel model. You can find these values in result.theta, or in result.kernel_weights for R-style naming.
- Regularization parameter(lambda) is used by uLSIF and RuLSIF. It is not used by KLIEP.
- Fold is used by KLIEP cross-validation.
- The function to estimate the density ratio is named compute_density_ratio().
3.4. Setting Gaussian kernel calculation engine
When computing Gaussian kernels, the linear algebra can be done with either the numpy or the numba package. The densratio.set_compute_kernel_target function accepts a single str argument that selects the engine globally:
- numpy: optimized with numpy broadcasting. Note that the underlying BLAS library (e.g. Intel's MKL) can take advantage of multithreading.
- cpu: numba generalized universal function, optimized for a single thread.
- parallel: numba generalized universal function, optimized for multiple threads. All threading layer specifics apply.
densratio defaults to cpu when numba is available, or numpy otherwise.
Although numba is not a requirement of densratio_py, numba version 0.45.1 or later is necessary to set the calculation engine to cpu or parallel.
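A minimal sketch of choosing an engine name according to whether numba can be imported. The helper `pick_engine` is hypothetical and not part of densratio, and it does not check the numba version requirement stated above. The call to densratio.set_compute_kernel_target is left as a comment so the snippet also runs where densratio is not installed.

```python
import importlib.util

def pick_engine(prefer_parallel=False):
    """Return one of the engine names listed above, based on numba availability."""
    has_numba = importlib.util.find_spec("numba") is not None
    if not has_numba:
        return "numpy"
    return "parallel" if prefer_parallel else "cpu"

engine = pick_engine()
print(engine)

# With densratio installed, the choice would then be applied globally,
# using the function named in the text above:
# import densratio
# densratio.set_compute_kernel_target(engine)
```

Selecting the engine once, before any fitting, keeps all later kernel computations on the same backend.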
4. Multi Dimensional Data Samples
So far, we have dealt with one-dimensional data samples x and y.
densratio() also accepts multidimensional data samples as
numpy.ndarray or numpy.matrix, as long as their dimensions are the
same.
For example,
import numpy as np
from scipy.stats import multivariate_normal
from densratio import densratio
np.random.seed(1)
x = multivariate_normal.rvs(size=3000, mean=[1, 1], cov=[[1. / 8, 0], [0, 1. / 8]])
y = multivariate_normal.rvs(size=3000, mean=[1, 1], cov=[[1. / 2, 0], [0, 1. / 2]])
alpha = 0
densratio_obj = densratio(x, y, method="RuLSIF", alpha=alpha, sigma_range=[0.1, 0.3, 0.5, 0.7, 1], lambda_range=[0.01, 0.02, 0.03, 0.04, 0.05])
print(densratio_obj)
gives the following output:
#> Method: RuLSIF
#>
#> Alpha: 0
#>
#> Kernel Information:
#> Kernel type: Gaussian
#> Number of kernels: 100
#> Bandwidth(sigma): 0.3
#> Centers: array([[1.01477443, 1.38864061],..
#>
#> Kernel Weights (theta):
#> array([0.06151164, 0.08012094, 0.10467369, 0.13868176, 0.14917063,..
#>
#> Regularization Parameter (lambda): 0.04
#>
#> Alpha-Relative PE-Divergence: 0.653615870855595
#>
#> Alpha-Relative KL-Divergence: 0.6214285743087565
#>
#> Function to Estimate Density Ratio:
#> compute_density_ratio(x)
#>
In this case, as well, we can compare the true density ratio with the estimated density ratio.
from matplotlib import pyplot as plt
from numpy import linspace, dstack, meshgrid, concatenate
def true_alpha_density_ratio(x):
    return multivariate_normal.pdf(x, [1., 1.], [[1. / 8, 0], [0, 1. / 8]]) / \
        (alpha * multivariate_normal.pdf(x, [1., 1.], [[1. / 8, 0], [0, 1. / 8]]) + (1 - alpha) * multivariate_normal.pdf(x, [1., 1.], [[1. / 2, 0], [0, 1. / 2]]))
def estimated_alpha_density_ratio(x):
    return densratio_obj.compute_density_ratio(x)
range_ = np.linspace(0, 2, 200)
grid = np.concatenate(np.dstack(np.meshgrid(range_, range_)))
levels = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4.5]
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.contourf(range_, range_, true_alpha_density_ratio(grid).reshape(200, 200), levels)
#> <matplotlib.contour.QuadContourSet object at 0x0000022E950202E0>
plt.colorbar()
#> <matplotlib.colorbar.Colorbar object at 0x0000022E9500DA80>
plt.title("True Alpha-Relative Density Ratio")
plt.subplot(1, 2, 2)
plt.contourf(range_, range_, estimated_alpha_density_ratio(grid).reshape(200, 200), levels)
#> <matplotlib.contour.QuadContourSet object at 0x0000022E942C8EE0>
plt.colorbar()
#> <matplotlib.colorbar.Colorbar object at 0x0000022E95095150>
plt.title("Estimated Alpha-Relative Density Ratio")
plt.show()
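The grid construction in the plotting code above is compact, so here is a small NumPy-only sketch of what it produces, using a 5-point range instead of 200 points. The function `f` is a hypothetical stand-in for a density ratio function. The sketch shows that the grid is an array of (x, y) points with one row per grid cell, and that reshaping the evaluated values gives an array whose [i, j] entry belongs to x = range_[j], y = range_[i], which is the layout plt.contourf expects.

```python
import numpy as np

range_ = np.linspace(0.0, 2.0, 5)
xx, yy = np.meshgrid(range_, range_)        # each of shape (5, 5)

# Same construction as in the plotting code: one (x, y) point per row.
grid = np.concatenate(np.dstack((xx, yy)))  # shape (25, 2)

# An equivalent and arguably more explicit spelling.
grid_alt = np.dstack((xx, yy)).reshape(-1, 2)

def f(points):
    """Hypothetical stand-in for a density ratio: encodes x and y separately."""
    return points[:, 0] + 10.0 * points[:, 1]

# Reshape back to the mesh: z[i, j] corresponds to x = range_[j], y = range_[i].
z = f(grid).reshape(len(range_), len(range_))

print(grid.shape)
print(np.array_equal(grid, grid_alt))
```

Any function that takes an array of shape (n, 2) and returns n values can be evaluated and plotted on the grid this way.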
5. Used in research
The densratio package has been used in several research papers, including:
- Kato, M., Imaizumi, M., & Minami, K. (2023). Unified Perspective on Probability Divergence via the Density-Ratio Likelihood: Bridging KL-Divergence and Integral Probability Metrics. AISTATS 2023.
- Nagumo, R., & Fujisawa, H. (2024). Density Ratio Estimation with Doubly Strong Robustness. ICML 2024.
- Endo, H., Ikeda, S., Harada, K., Yamagata, H., Matsubara, T., Matsuo, K., Kawahara, Y., & Yamashita, O. (2024). Manifold alteration between major depressive disorder and healthy control subjects using dynamic mode decomposition in resting-state fMRI data. Frontiers in Psychiatry, 2024.
- Wang, M., Huang, W., Gong, M., & Zhang, Z. (2025). Projection Pursuit Density Ratio Estimation. ICML 2025.
6. Related Work
- densratio for R https://github.com/hoxo-m/densratio
- pykliep https://github.com/srome/pykliep
References
[1] Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., & Kanamori, T. Statistical outlier detection using direct density ratio estimation. Knowledge and Information Systems 2011.
[2] Sugiyama, M., Nakajima, S., Kashima, H., von Bunau, P. & Kawanabe, M. Direct importance estimation with model selection and its application to covariate shift adaptation. NIPS 2007.
[3] Sugiyama, M., Suzuki, T. & Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge University Press 2012.
[4] Liu, S., Yamada, M., Collier, N., & Sugiyama, M. Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation. Neural Networks 2013.