Create tabular synthetic data using copula-based modeling.

Project description

This repository is part of The Synthetic Data Vault Project, a project from DataCebo.


Overview

Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table of numerical data, use Copulas to learn the distribution and generate new synthetic data following the same statistical properties.

Key Features:

  • Model multivariate data. Choose from a variety of univariate distributions and copulas – including Archimedean Copulas, Gaussian Copulas and Vine Copulas.

  • Compare real and synthetic data visually after building your model. Visualizations are available as 1D histograms, 2D scatterplots and 3D scatterplots.

  • Access & manipulate learned parameters. With complete access to the internals of the model, set or tune parameters to your choosing.
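To make the "Archimedean" family in the list above a little more concrete, here is a minimal sketch that samples from a bivariate Clayton copula (one member of that family) by conditional inversion, using only the Python standard library. It is an illustration of the idea, not the Copulas library's implementation; the names `sample_clayton` and `kendall_tau` are hypothetical helpers introduced here.

```python
import random


def sample_clayton(theta, num_rows, seed=0):
    """Draw (u, v) pairs from a bivariate Clayton copula with theta > 0."""
    if theta <= 0:
        raise ValueError("this sketch only handles theta > 0")
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_rows):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids raising 0 to a negative power
        w = 1.0 - rng.random()
        # Solve the conditional distribution C(v | u) = w for v.
        inner = (w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0
        pairs.append((u, inner ** (-1.0 / theta)))
    return pairs


def kendall_tau(pairs):
    """Kendall's tau by direct pair counting (O(n^2), fine for small samples)."""
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            sign = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (len(pairs) * (len(pairs) - 1) / 2)


theta = 2.0
pairs = sample_clayton(theta, 1000, seed=1)
print("empirical Kendall tau:", round(kendall_tau(pairs), 3))
print("theoretical Kendall tau:", theta / (theta + 2.0))
```

For a Clayton copula, Kendall's tau equals theta / (theta + 2), so the two printed values should be close for a reasonably large sample.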

Install

Install the Copulas library using pip or conda.

pip install copulas
conda install -c conda-forge copulas
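After installing, a quick way to confirm that the package is visible to your interpreter is to query its installed version. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper name, not part of Copulas.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("copulas")
print("copulas version:", version if version else "not installed")
```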

Usage

Get started using a demo dataset. This dataset contains 3 numerical columns.

from copulas.datasets import sample_trivariate_xyz

real_data = sample_trivariate_xyz()
real_data.head()

Model the data using a copula and use it to create synthetic data. The Copulas library offers many options including Gaussian Copulas, Vine Copulas and Archimedean Copulas.

from copulas.multivariate import GaussianMultivariate

copula = GaussianMultivariate()
copula.fit(real_data)

synthetic_data = copula.sample(len(real_data))
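For intuition about what a Gaussian copula does during fit and sample, here is a rough two-column sketch that uses only the Python standard library. It assumes empirical (rank-based) marginals and a single correlation parameter; it is a conceptual illustration, not how `GaussianMultivariate` is implemented. All helper names (`to_normal_scores`, `pearson`, `empirical_quantile`, `fit_and_sample`) and the toy data are made up for this example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def to_normal_scores(values):
    """Map values to standard-normal scores through their empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def empirical_quantile(sorted_values, u):
    """Return the observed value at probability u (simple index lookup)."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def fit_and_sample(x, y, num_rows, seed=0):
    """Estimate the copula correlation, then draw synthetic (x, y) rows."""
    rng = random.Random(seed)
    rho = pearson(to_normal_scores(x), to_normal_scores(y))
    spread = math.sqrt(max(0.0, 1.0 - rho ** 2))
    xs, ys = sorted(x), sorted(y)
    rows = []
    for _ in range(num_rows):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + spread * rng.gauss(0.0, 1.0)  # correlated with z1
        # Map normal draws to uniforms, then back onto the observed marginals.
        rows.append((empirical_quantile(xs, STD_NORMAL.cdf(z1)),
                     empirical_quantile(ys, STD_NORMAL.cdf(z2))))
    return rho, rows


# Toy data invented for this sketch (not the demo dataset).
data_rng = random.Random(42)
real_x = [data_rng.gauss(50.0, 10.0) for _ in range(500)]
real_y = [0.5 * value + data_rng.gauss(0.0, 2.0) for value in real_x]

rho, synthetic_rows = fit_and_sample(real_x, real_y, num_rows=500, seed=7)
print("estimated correlation of normal scores:", round(rho, 3))
```

Because the sketch maps samples back through empirical quantiles, every synthetic value is one that was observed in the input; fitting parametric marginals instead would allow new values.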

Visualize the real and synthetic data side-by-side. Let's do this in 3D so we can see our full dataset.

from copulas.visualization import compare_3d

compare_3d(real_data, synthetic_data)
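A numeric check can complement the visual comparison. The sketch below compares per-column means and standard deviations of two tables held as plain dicts of lists; it uses only the standard library, and `column_summary` and `summary_gaps` are hypothetical helpers, not Copulas functions. The toy tables are invented for illustration.

```python
from statistics import mean, stdev


def column_summary(table):
    """Return {column: (mean, standard deviation)} for a dict of numeric lists."""
    return {name: (mean(values), stdev(values)) for name, values in table.items()}


def summary_gaps(real, synthetic):
    """Absolute gap in mean and standard deviation for each shared column."""
    real_stats = column_summary(real)
    synth_stats = column_summary(synthetic)
    return {
        name: (abs(real_stats[name][0] - synth_stats[name][0]),
               abs(real_stats[name][1] - synth_stats[name][1]))
        for name in real_stats
        if name in synth_stats
    }


# Toy tables made up for illustration; they are not the demo dataset.
real = {"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]}
synthetic = {"x": [2.0, 3.0, 4.0, 5.0], "y": [10.0, 20.0, 30.0, 40.0]}

gaps = summary_gaps(real, synthetic)
for name, (mean_gap, std_gap) in gaps.items():
    print(name, "mean gap:", mean_gap, "std gap:", std_gap)
```

If your data is in pandas DataFrames, `DataFrame.to_dict("list")` produces the dict-of-lists shape this sketch expects.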

Tutorials

Click below to run the code yourself on a Colab Notebook and discover new features.

Tutorial Notebook

Community & Support

Learn more about the Copulas library from our documentation site.

Questions or issues? Join our Slack channel to discuss Copulas and synthetic data. If you find a bug or have a feature request, you can also open an issue on our GitHub.

Interested in contributing to Copulas? Read our Contribution Guide to get started.

Credits

The Copulas open source project first started at the Data to AI Lab at MIT in 2018. Thank you to our team of contributors who have built and maintained the library over the years!

View Contributors




The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:

  • 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
  • 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, multi-table and time series data.
  • 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data generation models.

Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

copulas-0.12.2.tar.gz (45.0 kB view details)

Uploaded Source

Built Distribution

copulas-0.12.2-py3-none-any.whl (52.5 kB view details)

Uploaded Python 3

File details

Details for the file copulas-0.12.2.tar.gz.

File metadata

  • Download URL: copulas-0.12.2.tar.gz
  • Upload date:
  • Size: 45.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.3 requests-toolbelt/1.0.0 urllib3/1.26.20 tqdm/4.67.1 importlib-metadata/8.6.1 keyring/25.6.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.9.20

File hashes

Hashes for copulas-0.12.2.tar.gz

Algorithm    Hash digest
SHA256       c13bcb6343bca4f17e68ec3be3ab3b497734e18821b60bf8af7f68ba24784411
MD5          823453b66f942f818800381035ab3e33
BLAKE2b-256  911b6e5911c06242b1ca89b520db99d4b939acf41184b1251ba6b4e52c7416af

See more details on using hashes here.
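One way to use these digests is to recompute the SHA256 of a downloaded file and compare it with the listed value. The following is a minimal standard-library sketch; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the commented usage assumes the archive was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Example usage (assumes the file exists locally; the digest is the SHA256 listed above):
# matches_expected(
#     "copulas-0.12.2.tar.gz",
#     "c13bcb6343bca4f17e68ec3be3ab3b497734e18821b60bf8af7f68ba24784411",
# )
```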

File details

Details for the file copulas-0.12.2-py3-none-any.whl.

File metadata

  • Download URL: copulas-0.12.2-py3-none-any.whl
  • Upload date:
  • Size: 52.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.3 requests-toolbelt/1.0.0 urllib3/1.26.20 tqdm/4.67.1 importlib-metadata/8.6.1 keyring/25.6.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.9.20

File hashes

Hashes for copulas-0.12.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       3ac207c778fff34f6e38314518d4f1672a3a16ddb3b3e6a64b8d9ade629a89e7
MD5          2dd454588513b3505f2523621c977ecb
BLAKE2b-256  4c313ddc47aa7d4c6c4f0adc6fe5a393a408a729f3053f1c6daf8367ad858a2b

See more details on using hashes here.
