A Python library for building different types of copulas and using them for sampling.
An Open Source Project from the Data to AI Lab, at MIT
- Website: https://sdv.dev
- Documentation: https://sdv.dev/Copulas
- Repository: https://github.com/sdv-dev/Copulas
- License: MIT
- Development Status: Pre-Alpha
Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table containing numerical data, we can use Copulas to learn the distribution and later on generate new synthetic rows following the same statistical properties.
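To make the idea concrete, here is a minimal sketch of the learn-then-sample workflow for two columns, written with the Python standard library only. It is an illustration of the general Gaussian copula technique, not the code Copulas runs: the helper names (`to_normal_scores`, `pearson`, `empirical_quantile`, `fit_and_sample`) and the toy data are assumptions made for this example, and the marginals are modeled with plain empirical quantiles rather than fitted distributions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def to_normal_scores(column):
    """Map a column to standard-normal scores through its empirical CDF."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the pseudo-uniform value strictly inside (0, 1)
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation of two equally sized sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def empirical_quantile(sorted_column, u):
    """Return the observed value at probability level u (0 <= u <= 1)."""
    index = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[index]


def fit_and_sample(xs, ys, num_rows, rng):
    """Learn the dependence between two columns and sample new rows."""
    # "Fit": correlation of the columns after mapping them to normal scores
    rho = pearson(to_normal_scores(xs), to_normal_scores(ys))
    sorted_x, sorted_y = sorted(xs), sorted(ys)
    rows = []
    for _ in range(num_rows):
        # Draw a correlated standard-normal pair
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map each value back to the original scale of its column
        rows.append((
            empirical_quantile(sorted_x, STD_NORMAL.cdf(z1)),
            empirical_quantile(sorted_y, STD_NORMAL.cdf(z2)),
        ))
    return rho, rows


# Toy "real" table: a skewed column and a second column that depends on it
rng = random.Random(0)
real_x = [rng.expovariate(1.0) for _ in range(500)]
real_y = [x + rng.gauss(0.0, 0.3) for x in real_x]

rho, synthetic = fit_and_sample(real_x, real_y, 500, rng)
```

The synthetic rows reuse the observed values of each column but recombine them, so the marginal shapes and the dependence between the columns are approximately preserved. Replacing the empirical quantiles with fitted parametric distributions gives the more general approach the paragraph above describes.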
Some of the features provided by this library include:
- A variety of distributions for modeling univariate data.
- Multiple Archimedean copulas for modeling bivariate data.
- Gaussian and Vine copulas for modeling multivariate data.
- Automatic selection of univariate distributions and bivariate copulas.
- Gaussian KDE
- Student T
- Truncated Gaussian
Archimedean Copulas (Bivariate)
- Gaussian Copula
Copulas is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.
Optionally, Copulas can also be installed as a standalone library using the following commands:
pip install copulas
conda install -c sdv-dev -c conda-forge copulas
For more installation options, please visit the Copulas Installation Guide.
In this short quickstart, we show how to model a multivariate dataset and then generate synthetic data that resembles it.
import warnings
warnings.filterwarnings('ignore')

from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
from copulas.visualization import compare_3d

# Load a dataset with 3 columns that are not independent
real_data = sample_trivariate_xyz()

# Fit a gaussian copula to the data
copula = GaussianMultivariate()
copula.fit(real_data)

# Sample synthetic data
synthetic_data = copula.sample(len(real_data))

# Plot the real and the synthetic data to compare
compare_3d(real_data, synthetic_data)
The output is a figure with two plots: one showing the real data and one showing the synthetic data that you just generated.
For more details about Copulas and all its possibilities and features, please check the documentation site.
There you can also learn how to contribute to Copulas and help us develop new features or cool ideas.
Copulas is an open source project from the Data to AI Lab at MIT which has been built and maintained over the years by the following team:
- Manuel Alvarez email@example.com
- Carles Sala firstname.lastname@example.org
- (Alicia) Yi Sun email@example.com
- José David Pérez firstname.lastname@example.org
- Kevin Alex Zhang email@example.com
- Andrew Montanez firstname.lastname@example.org
- Gabriele Bonomi email@example.com
- Kalyan Veeramachaneni firstname.lastname@example.org
- Iván Ramírez email@example.com
- Felipe Alex Hofmann firstname.lastname@example.org
- paulolimac email@example.com
- nazar-ivantsiv firstname.lastname@example.org
The Synthetic Data Vault
This repository is part of The Synthetic Data Vault Project
v0.5.1 - 2021-08-13
This release improves performance by changing the way scipy stats is used, calling their methods directly without creating intermediate instances.
It also fixes a bug introduced by the scipy 1.7.0 release where some distributions fail to fit because scipy validates the learned parameters.
- Exception: Optimization converged to parameters that are outside the range allowed by the distribution. - Issue #264 by @csala
- Use scipy stats models directly without creating instances - Issue #261 by @csala
v0.5.0 - 2021-01-24
This release introduces conditional sampling for the GaussianMultivariate modeling. The new conditioning feature allows passing a dictionary with the values to use to condition the rest of the columns.
It also fixes a bug that prevented constant distributions from being restored from a dictionary and updates some dependencies.
- Conditional sampling from Gaussian copula - Issue #154 by @csala
- ScipyModel subclasses fail to restore constant values when using from_dict - Issue #212 by @csala
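As a rough illustration of what conditioning means in a Gaussian copula, the sketch below uses the textbook result for a standard bivariate normal pair with correlation rho: given Z1 = z, the other variable follows a normal distribution with mean rho * z and standard deviation sqrt(1 - rho^2). This is a standard-library sketch under that assumption, not the library's implementation; the function name `sample_conditional` and the chosen values are hypothetical.

```python
import math
import random


def sample_conditional(rho, z_given, num_rows, rng):
    """Sample Z2 given Z1 = z_given for a standard bivariate normal pair."""
    cond_mean = rho * z_given              # conditional mean
    cond_std = math.sqrt(1.0 - rho ** 2)   # conditional standard deviation
    return [rng.gauss(cond_mean, cond_std) for _ in range(num_rows)]


rng = random.Random(42)
samples = sample_conditional(rho=0.8, z_given=1.5, num_rows=10000, rng=rng)

# Summary statistics of the conditional draws
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(
    sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
)
```

With rho = 0.8 and a conditioning value of 1.5, the draws should center near 1.2 with a spread near 0.6. In a copula model the same step happens in the transformed normal space, with fixed column values mapped into that space first and the sampled values mapped back through each column's marginal distribution afterwards.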
v0.4.0 - 2021-01-27
This release introduces a few changes to optimize processing speed by re-implementing the Gaussian KDE pdf to use vectorized root finding methods and also adding the option to subsample the data during univariate selection.
gaussian_kde faster - Issue #200 by @k15z and @fealho
- Use sub-sampling in select_univariate - Issue #183 by @csala
v0.3.3 - 2020-09-18
cov in the GaussianMultivariate - Issue #195 by @rollervan
- Add arguments to GaussianKDE - Issue #181 by @rollervan
- Log Laplace Distribution - Issue #188 by @rollervan
v0.3.2 - 2020-08-08
- Add Uniform Univariate - Issue #179 by @rollervan
v0.3.1 - 2020-07-09
- Raise numpy version upper bound to 2 - Issue #178 by @csala
- Add Student T Univariate - Issue #172 by @gbonomib
- Error in Quickstarts : Unknown projection '3d' - Issue #174 by @csala
v0.3.0 - 2020-03-27
Important revamp of the internal implementation of the project, the testing infrastructure and the documentation by Kevin Alex Zhang @k15z, Carles Sala @csala and Kalyan Veeramachaneni @kveerama
- Reimplementation of the existing Univariate distributions.
- Addition of new Beta and Gamma Univariates.
- New Univariate API with automatic selection of the optimal distribution.
- Several improvements and fixes on the Bivariate and Multivariate Copulas implementation.
- New visualization module with simple plotting patterns to visualize probability distributions.
- New datasets module with toy datasets sampling functions.
- New testing infrastructure with end-to-end, numerical and large scale testing.
- Improved tutorials and documentation.
v0.2.5 - 2020-01-17
- Convert import_object to get_instance - Issue #114 by @JDTheRipperPC
v0.2.4 - 2019-12-23
- Allow creating copula classes directly - Issue #117 by @csala
Bivariate - Issue #118 by @csala
- Rename TruncNorm to TruncGaussian and make it non standard - Issue #102 by @csala @JDTheRipperPC
- Error on Frank and Gumble sampling - Issue #112 by @csala
v0.2.3 - 2019-09-17
- Add support to Python 3.7 - Issue #53 by @JDTheRipperPC
- Document RELEASE workflow - Issue #105 by @JDTheRipperPC
- Improve serialization of univariate distributions - Issue #99 by @ManuelAlvarezC and @JDTheRipperPC
- The method 'select_copula' of Bivariate return wrong CopulaType - Issue #101 by @JDTheRipperPC
v0.2.2 - 2019-07-31
truncnorm distribution and a generic wrapper for scipy.rv_continous distributions - Issue #27 by @amontanez, @csala and @ManuelAlvarezC
Independence bivariate copulas - Issue #46 by @aliciasun, @csala and @ManuelAlvarezC
- Option to select seed on random number generator - Issue #63 by @echo66 and @ManuelAlvarezC
- Option on Vine copulas to select number of rows to sample - Issue #77 by @ManuelAlvarezC
- Make copulas accept both scalars and arrays as arguments - Issues #85 and #90 by @ManuelAlvarezC
- Ability to properly handle constant data - Issues #57 and #82 by @csala and @ManuelAlvarezC
- Tests for analytics properties of copulas - Issue #61 by @ManuelAlvarezC
- Improved documentation - Issue #96 by @ManuelAlvarezC
- Fix bug on Vine copulas, that made it crash during the bivariate copula selection - Issue #64 by @echo66 and @ManuelAlvarezC
v0.2.1 - Vine serialization
- Add serialization to Vine copulas.
distribution as argument for the Gaussian Copula.
- Improve Bivariate Copulas code structure to remove code duplication.
- Fix bug in Vine Copulas sampling: 'Edge' object has no attribute 'index'
- Improve code documentation.
- Improve code style and linting tools configuration.
v0.2.0 - Unified API
- New API for stats methods.
- Standardize input and output to
- Increase unittest coverage to 90%.
- Add methods to load/save copulas.
- Improve Gaussian copula sampling accuracy.
v0.1.1 - Minor Improvements
- Different Copula types separated in subclasses
- Extensive Unit Testing
- More pythonic names in the public API.
- Stop using third party elements that will be deprecated soon.
- Add methods to sample new data on bivariate copulas.
- New KDE Univariate copula
- Improved examples with additional demo data.
v0.1.0 - First Release
- First release on PyPI.