Nearest Neighbor Density Estimation
Nearest Neighbor Density Estimation (NNDensity)
The package implements six nearest-neighbor-based density estimation methods and provides efficient tools for density estimation research. See paper/paper.md for details on the methodology and the related literature.
Installation
Since NNDensity is built with Cython, installation requires a C/C++ compiler. Users can check whether one is available by running
gcc -v
g++ -v
to see the installed versions. On Linux, gcc/g++ can be installed with apt. On macOS, install Xcode. On Windows, install the Microsoft C++ Build Tools.
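The same check can be scripted from Python, which is sometimes convenient in setup or CI scripts. The sketch below is only an illustration: the helper names `find_compilers` and `first_version_line` are hypothetical and not part of NNDensity, and it calls `--version` rather than `-v` because that flag prints a short single-line banner on gcc and g++.

```python
import shutil
import subprocess


def find_compilers(names=("gcc", "g++")):
    """Map each compiler name to its full path on PATH, or None if it is absent."""
    return {name: shutil.which(name) for name in names}


def first_version_line(executable):
    """Return the first line printed by `<executable> --version`, or None on failure."""
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # Executable missing, not runnable, or it did not finish in time.
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


# Report what was found; a None path means the compiler is not on PATH.
for name, path in find_compilers().items():
    if path is None:
        print(f"{name}: not found")
    else:
        print(f"{name}: {path} ({first_version_line(path)})")
```

If either compiler is reported as not found, install it as described above before running pip.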
Via PyPI
pip install NNDensity
Via GitHub
pip install git+https://github.com/Karlmyh/NNDensity.git
Manual Install
git clone git@github.com:Karlmyh/NNDensity.git
cd NNDensity
python setup.py install
Basic Usage
Data Generation
The package provides tools for generating data from known densities. The example below uses a mixture distribution.
import numpy as np
from NNDensity import MultivariateNormalDistribution, MixedDistribution, ExponentialDistribution
# setup
dim=2
density1 = ExponentialDistribution(lamda = np.ones(dim)*0.5)
density2 = MultivariateNormalDistribution(mean = np.zeros(dim)-1.5, cov = np.diag(np.ones(dim)*0.3))
density_seq = [density1, density2]
prob_seq = [0.4, 0.6]
densitymix = MixedDistribution(density_seq, prob_seq)
# generate 10 samples and return their pdf
samples, samples_pdf = densitymix.generate(10)
samples
# evaluate pdf at given samples
densitymix.density(samples)
# compare with true pdf
(samples_pdf == densitymix.density(samples)).all()
Out[1]: array([[-2.23087816, -1.08521314],
[-1.03424594, -1.24327987],
[-2.02698363, -1.63201056],
[ 1.43021832, 1.51448518],
[ 1.58820377, 1.8541296 ],
[-0.88802267, -2.398429 ],
[-1.26067249, -2.12988644],
[-1.92476226, -2.0167295 ],
[-2.0035588 , -1.35662414],
[-1.46406062, -1.9693262 ]])
Out[2]: True
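To make the idea of a mixture density concrete, here is a minimal one-dimensional sketch in plain Python. It is not the package's implementation: the function names are hypothetical, and treating 0.5 as the rate of the exponential component is an assumption about the parameterization. It shows the two operations used above: drawing a sample by first picking a component according to the mixing weights, and evaluating the density as the weighted sum of the component densities.

```python
import math
import random


def exponential_pdf(x, rate):
    """Density of an exponential distribution with the given rate (assumed parameterization)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, pdfs, weights):
    """Mixture density: weighted sum of the component densities at x."""
    return sum(w * pdf(x) for pdf, w in zip(pdfs, weights))


def mixture_sample(samplers, weights, rng):
    """Pick one component according to the weights, then draw from it."""
    sampler = rng.choices(samplers, weights=weights, k=1)[0]
    return sampler(rng)


rng = random.Random(0)
std = math.sqrt(0.3)  # variance 0.3, mirroring the covariance used above
pdfs = [lambda x: exponential_pdf(x, 0.5), lambda x: normal_pdf(x, -1.5, std)]
samplers = [lambda r: r.expovariate(0.5), lambda r: r.gauss(-1.5, std)]
weights = [0.4, 0.6]

# Draw 10 samples and evaluate the mixture density at each of them.
samples = [mixture_sample(samplers, weights, rng) for _ in range(10)]
densities = [mixture_pdf(x, pdfs, weights) for x in samples]
print(list(zip(samples, densities)))
```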
Density Estimation
Use the AWNN model to estimate the density.
###### using AWNN to estimate density
from NNDensity import AWNN
# generate samples
X_train, pdf_X_train =densitymix.generate(1000)
X_test, pdf_X_test =densitymix.generate(1000)
# choose parameter C=0.1
model_AWNN=AWNN(C=.1).fit(X_train)
# output is log scaled
est_AWNN=np.exp(model_AWNN.predict(X_test))
# compute the mean absolute error
np.abs(est_AWNN-pdf_X_test).mean()
Out[3]: 0.09148487940943466
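For readers new to nearest neighbor density estimation, the following sketch shows the classical kNN estimator in one dimension, where the estimate at x is k / (n * 2 * R_k(x)) and R_k(x) is the distance from x to its k-th nearest training point. This is a textbook illustration in plain Python, not AWNN and not the package's code; the function name `knn_log_density` is hypothetical. Like the example above, it returns a log-density that is exponentiated before computing the mean absolute error.

```python
import math
import random


def knn_log_density(x, train, k):
    """Log of the classical 1-D kNN estimate k / (n * 2 * R_k(x))."""
    distances = sorted(abs(x - t) for t in train)
    radius = distances[k - 1]  # distance to the k-th nearest training point
    return math.log(k) - math.log(len(train)) - math.log(2.0 * radius)


def normal_pdf(x, mean=0.0, std=1.0):
    """True density used here as the ground truth."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(1000)]
test = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Convert log-densities back to densities, then compare with the truth.
estimates = [math.exp(knn_log_density(x, train, k=50)) for x in test]
mae = sum(abs(e - normal_pdf(x)) for e, x in zip(estimates, test)) / len(test)
print(mae)
```

Working on the log scale avoids underflow where the density is very small, which is one plausible reason for an estimator to return log values.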
Hyperparameters can be selected automatically with scikit-learn's GridSearchCV, which often improves the result.
from sklearn.model_selection import GridSearchCV
from NNDensity import KNN
# generate samples
X_train, pdf_X_train =densitymix.generate(1000)
X_test, pdf_X_test =densitymix.generate(1000)
# select parameter grid
parameters={"k":[int(i*1000) for i in [0.01,0.02,0.05,0.1,0.2,0.5]]}
# use all available cpu, use 10 fold cross validation
cv_model_KNN=GridSearchCV(estimator=KNN(),param_grid=parameters,n_jobs=-1,cv=10)
_=cv_model_KNN.fit(X_train)
model_KNN=cv_model_KNN.best_estimator_
# output is log scaled
est_KNN=np.exp(model_KNN.predict(X_test))
# compute the mean absolute error
np.abs(est_KNN-pdf_X_test).mean()
Out[4]: 0.055937476261628344
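When no `scoring` argument is passed, GridSearchCV ranks candidates with the estimator's own `score` method. The source does not say what that score is for the NNDensity estimators, so the sketch below assumes a held-out mean log-likelihood, which is a common choice for density estimators. It reuses the textbook 1-D kNN estimate in plain Python, and the names `cv_score` and `select_k` are hypothetical.

```python
import math
import random


def knn_log_density(x, train, k):
    """Log of the classical 1-D kNN estimate k / (n * 2 * R_k(x))."""
    radius = sorted(abs(x - t) for t in train)[k - 1]
    return math.log(k / (len(train) * 2.0 * radius))


def cv_score(data, k, n_folds=5):
    """Mean held-out log-likelihood over n_folds interleaved folds (assumed criterion)."""
    total = 0.0
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        train = [x for i, x in enumerate(data) if i % n_folds != fold]
        total += sum(knn_log_density(x, train, k) for x in held_out)
    return total / len(data)


def select_k(data, grid, n_folds=5):
    """Score every k in the grid and return the best one with all scores."""
    scores = {k: cv_score(data, k, n_folds) for k in grid}
    return max(scores, key=scores.get), scores


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
grid = [5, 10, 25, 50, 100, 200]

best_k, scores = select_k(data, grid)
print(best_k, scores)
```

The loop is what GridSearchCV automates: one fit and one score per fold and per grid point, followed by a refit of the best candidate.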
Visualization
The package includes plotting tools that are frequently used in density estimation research.
###### 3d prediction surface using WKNN
from NNDensity import contour3d
# generate samples
dim=2
density1 = MultivariateNormalDistribution(mean = np.zeros(dim)+1.5, cov = np.diag(np.ones(dim)*0.4))
density2 = MultivariateNormalDistribution(mean = np.zeros(dim)-1.5, cov = np.diag(np.ones(dim)*0.7))
density_seq = [density1, density2]
prob_seq = [0.4, 0.6]
densitymix = MixedDistribution(density_seq, prob_seq)
X_train, pdf_X_train =densitymix.generate(1000)
model_plot=contour3d(X_train,method="WKNN",k=100)
model_plot.estimation()
fig=model_plot.make_plot()
###### 2d prediction contour using BKNN
from NNDensity import contour2d
# generate samples
X_train, pdf_X_train =densitymix.generate(1000)
model_plot=contour2d(X_train,method="BKNN",C=10)
model_plot.estimation()
fig=model_plot.make_plot()
###### prediction curve plot
from NNDensity import lineplot
# generate samples
X_train, pdf_X_train =densitymix.generate(1000)
kargs_seq= [{"k":100},{"k":100},{"k":100} ]
model_plot=lineplot(X_train,method_seq=["KNN", "WKNN", "TKNN"],true_density_obj=densitymix,kargs_seq=kargs_seq)
fig=model_plot.plot()
kargs_seq= [{"C":0.9},{"C":1},{"C":1} ]
model_plot=lineplot(X_train,method_seq=["AKNN", "BKNN", "AWNN"],true_density_obj=densitymix,kargs_seq=kargs_seq)
fig=model_plot.plot()
Reference
NNDensity uses tools from numpy, matplotlib, scipy, jupyter notebooks, scikit-learn, cython and numba. In addition, a large part of the KD tree implementation was borrowed from scikit-learn. For specific citations, see papers/paper.md.