Simulate high-dimensional multivariate data with arbitrary marginal distributions
Project description
bigsimr
bigsimr
is a Python3 package for simulating high-dimensional multivariate data with a target correlation and arbitrary marginal distributions via Gaussian copula. It utilizes Bigsimr.jl for its core routines. For full documentation and examples, please see the Bigsimr.jl docs.
Features
- Pearson matching - employs a matching algorithm (Xiao and Zhou 2019) to account for the non-linear transformation in the Normal-to-Anything (NORTA) step
- Spearman and Kendall matching - Use explicit transformations (Lebrun and Dutfoy 2009)
- Nearest Correlation Matrix - Calculate the nearest positive [semi]definite correlation matrix (Qi and Sun 2006)
- Fast Approximate Correlation Matrix - Calculate an approximation to the nearest positive definite correlation matrix
- Random Correlation Matrix - Generate random positive [semi]definite correlation matrices
- Fast Multivariate Normal Generation - Utilize multithreading to generate multivariate normal samples in parallel
Installation and Setup
Install the bigsimr
package from pip using
pip install git+https://github.com/SchisslerGroup/python-bigsimr.git
Or install the development version with
pip install git+https://github.com/SchisslerGroup/python-bigsimr.git@dev
bigsimr
relies on the Julia language to execute code through the python julia
package. Julia can be obtained from julialang.org, or it can be detected/installed automatically using the setup function provided by bigsimr
. The setup()
function will also install the required Julia packages for bigsimr.
from bigsimr import setup
setup(compiled_modules=False)
Note. The compiled_modules=False
argument is necessary for those using Python from a conda environment. There is a known bug where setup fails if compiled_modules
is set to True
(the default for the julia
package).
Using
from julia.api import Julia
jl = Julia(compiled_modules=False) # conda users -> set to False
from julia import Bigsimr as bs
from julia import Distributions as dist
import numpy as np
Examples
Pearson mathcing
target_corr = bs.cor_randPD(3)
margins = [dist.Binomial(20, 0.2), dist.Beta(2, 3), dist.LogNormal(3, 1)]
adjusted_corr = bs.pearson_match(target_corr, margins)
x = bs.rvec(100_000, adjusted_corr, margins)
bs.cor(x, bs.Pearson)
Spearman/Kendall matching
spearman_corr = bs.cor_randPD(3)
adjusted_corr = bs.cor_convert(spearman_corr, bs.Spearman, bs.Pearson)
x = bs.rvec(100_000, adjusted_corr, margins)
bs.cor(x, bs.Spearman)
Nearest correlation matrix
from julia.LinearAlgebra import isposdef
s = bs.cor_randPSD(200)
r = bs.cor_convert(s, bs.Spearman, bs.Pearson)
isposdef(r)
p = bs.cor_nearPD(r)
isposdef(p)
Fast approximate nearest correlation matrix
s = bs.cor_randPSD(2000)
r = bs.cor_convert(s, bs.Spearman, bs.Pearson)
isposdef(r)
p = bs.cor_fastPD(r)
isposdef(p)
References
- Xiao, Q., & Zhou, S. (2019). Matching a correlation coefficient by a Gaussian copula. Communications in Statistics-Theory and Methods, 48(7), 1728-1747.
- Lebrun, R., & Dutfoy, A. (2009). An innovating analysis of the Nataf transformation from the copula viewpoint. Probabilistic Engineering Mechanics, 24(3), 312-320.
- Qi, H., & Sun, D. (2006). A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM journal on matrix analysis and applications, 28(2), 360-385.
- amoeba (https://stats.stackexchange.com/users/28666/amoeba), How to generate a large full-rank random correlation matrix with some strong correlations present?, URL (version: 2017-04-13): https://stats.stackexchange.com/q/125020
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file bigsimr-0.1.0.tar.gz
.
File metadata
- Download URL: bigsimr-0.1.0.tar.gz
- Upload date:
- Size: 3.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b3667b961912f9edcacf6ae951276839020a6d3cd3ac0cb44d500370746d2c6c |
|
MD5 | 919d867b5bc74280080d69ba241774ec |
|
BLAKE2b-256 | a9e8d48c9fe8643b874557109a70f89b60161de834003205ff90f5d5be434ed3 |
File details
Details for the file bigsimr-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: bigsimr-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b6be01d1447dcff9f84109dfd9b5aa5dbed5d0e97a0a9456a891bdc4e31570d9 |
|
MD5 | 8b3cd51c8a89c28f86c571f6f3042fff |
|
BLAKE2b-256 | e95ad0dd6057801584fee8bc7245848c1ebf98d824b73f6ec720feacaa9e576e |