
Mixed-type multivariate time series modeling with generative adversarial networks.


An Open Source Project from the Data to AI Lab, at MIT


Overview

DeepEcho is a Synthetic Data Generation Python library for mixed-type, multivariate time series. It provides:

  1. Multiple models based both on classical statistical modeling of time series and the latest in Deep Learning techniques.
  2. A robust benchmarking framework for evaluating these methods on multiple datasets and with multiple metrics.
  3. The ability for Machine Learning researchers to submit new methods that follow our model and sample API and have them evaluated.

Try it out now!

If you want to quickly discover DeepEcho, simply open the project on Binder and follow the tutorials!

Join our Slack Workspace

If you want to be part of the SDV community to receive announcements of the latest releases, ask questions, suggest new features or participate in the development meetings, please join our Slack Workspace!


Install

DeepEcho is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.

Optionally, DeepEcho can also be installed as a standalone library using the following commands:

Using pip:

pip install deepecho

Using conda:

conda install -c sdv-dev -c pytorch -c conda-forge deepecho

For more installation options, please visit the DeepEcho Installation Guide.
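
After installing, it can be useful to confirm which DeepEcho version is visible to the current interpreter. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and is not part of DeepEcho.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# `installed_version` is an illustrative helper, not a DeepEcho function.
version = installed_version("deepecho")
if version is None:
    print("deepecho is not installed in this environment")
else:
    print(f"deepecho {version} is installed")
```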

Quickstart

DeepEcho is included as part of SDV to model and sample synthetic time series. In most cases, usage through SDV is recommended, since it provides additional functionality that is not available here. For more details about how to use DeepEcho within SDV, please visit the corresponding SDV User Guide.

Standalone usage

DeepEcho can also be used as a standalone library.

In this short quickstart, we show how to learn a mixed-type multivariate time series dataset and then generate synthetic data that resembles it.

We will start by loading the data and preparing the instance of our model.

from deepecho import PARModel
from deepecho.demo import load_demo

# Load demo data
data = load_demo()

# Define data types for all the columns
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

model = PARModel(cuda=False)
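
In the snippet above, `data_types` lists every column except `store_id` and `date`, which are later passed to `fit` as the entity column and the sequence index. A quick sanity check along the following lines can catch a mistyped column name before training. This is a sketch under assumptions: the helpers `untyped_columns` and `unknown_columns` are hypothetical, the column list is written out by hand (the demo table is assumed to contain these six columns), and whether DeepEcho needs every remaining column typed explicitly is not stated here.

```python
def untyped_columns(columns, data_types, entity_columns, sequence_index):
    """List columns that have neither a declared type nor a structural role."""
    structural = set(entity_columns) | {sequence_index}
    return [c for c in columns if c not in structural and c not in data_types]


def unknown_columns(columns, data_types):
    """List keys of `data_types` that do not match any column name."""
    return [name for name in data_types if name not in columns]


# Assumed column names from the quickstart; with real data use list(data.columns).
columns = ['store_id', 'date', 'region', 'day_of_week', 'total_sales', 'nb_customers']
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

print(untyped_columns(columns, data_types, ['store_id'], 'date'))  # []
print(unknown_columns(columns, data_types))                        # []
```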

If we want to use different settings for our model, like increasing the number of epochs or enabling CUDA, we can pass the arguments when creating the model:

model = PARModel(epochs=1024, cuda=True)

Notice that for smaller datasets like the one used in this demo, CUDA introduces more overhead than it gains from parallelization, so training is faster without CUDA, even if it is available.
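
One way to act on this trade-off is to enable CUDA only when a GPU is present and the dataset is large enough to justify the overhead. The sketch below is illustrative only: `should_use_cuda` and its `min_rows` threshold are hypothetical, and the default of 100,000 rows is an arbitrary placeholder rather than a measured break-even point.

```python
def should_use_cuda(num_rows, cuda_available, min_rows=100_000):
    """Enable CUDA only if a GPU is available and the table is large.

    `min_rows` is an arbitrary placeholder; measure on your own data.
    """
    return bool(cuda_available) and num_rows >= min_rows


# With PyTorch installed, `cuda_available` could come from torch.cuda.is_available().
print(should_use_cuda(num_rows=500, cuda_available=True))         # False
print(should_use_cuda(num_rows=2_000_000, cuda_available=True))   # True
print(should_use_cuda(num_rows=2_000_000, cuda_available=False))  # False
```

The result could then be passed to the model, for example `PARModel(cuda=should_use_cuda(len(data), cuda_available))`.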

Once we have created our instance, we are ready to learn the data and generate new synthetic data that resembles it:

# Learn a model from the data
model.fit(
    data=data,
    entity_columns=['store_id'],
    context_columns=['region'],
    data_types=data_types,
    sequence_index='date'
)

# Sample new data
model.sample(num_entities=5)

The output will be a table of synthetic time series data with properties similar to those of the demo data that we used as input.
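
The quickstart steps can be gathered into a single function, as sketched below. The sketch assumes DeepEcho is installed and imports it lazily, so the definitions themselves load without it. The names `quickstart_fit_kwargs` and `run_quickstart` are hypothetical; the `fit` and `sample` arguments are the ones shown above, plus the optional `sequence_length` argument of `sample` mentioned in the 0.1.1 release notes.

```python
def quickstart_fit_kwargs(data, data_types):
    """Collect the `fit` arguments used in the quickstart."""
    return {
        'data': data,
        'entity_columns': ['store_id'],
        'context_columns': ['region'],
        'data_types': data_types,
        'sequence_index': 'date',
    }


def run_quickstart(num_entities=5, sequence_length=None, epochs=None):
    """Fit a PARModel on the demo data and return sampled sequences."""
    # Imported lazily so this sketch can be loaded without DeepEcho installed.
    from deepecho import PARModel
    from deepecho.demo import load_demo

    data_types = {
        'region': 'categorical',
        'day_of_week': 'categorical',
        'total_sales': 'continuous',
        'nb_customers': 'count',
    }

    # Only pass `epochs` when the caller overrides the library default.
    model_kwargs = {'cuda': False}
    if epochs is not None:
        model_kwargs['epochs'] = epochs

    model = PARModel(**model_kwargs)
    model.fit(**quickstart_fit_kwargs(load_demo(), data_types))

    # `sequence_length` is optional; omit it to let the model choose.
    sample_kwargs = {'num_entities': num_entities}
    if sequence_length is not None:
        sample_kwargs['sequence_length'] = sequence_length
    return model.sample(**sample_kwargs)
```

Calling `run_quickstart(num_entities=3, sequence_length=10)` would request three synthetic sequences of ten rows each, assuming `sequence_length` behaves as the release notes describe.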

What's next?

For more details about DeepEcho and all its possibilities and features, please check and run the tutorials.

If you want to see how we evaluate the performance and quality of our models, please have a look at the SDGym Benchmarking framework.

Also, please feel welcome to visit our contributing guide to help us develop new features or cool ideas!

The Synthetic Data Vault

This repository is part of The Synthetic Data Vault Project.

History

0.3.0 - 2021-11-15

This release adds support for Python 3.9 and updates dependencies to ensure compatibility with the rest of the SDV ecosystem.

  • Add support for Python 3.9 - Issue #41 by @fealho
  • Add pip check to CI workflows (internal improvements) - Issue #39 by @pvk-developer
  • Add support for pylint>2.7.2 (housekeeping) - Issue #33 by @fealho
  • Add support for torch>=1.8 (housekeeping) - Issue #32 by @fealho

0.2.1 - 2021-10-12

This release fixes a bug with how DeepEcho handles NaN values.

  • Handling NaN's bug - Issue #35 by @fealho

0.2.0 - 2021-02-24

Maintenance release to update dependencies and ensure compatibility with the rest of the SDV ecosystem libraries.

0.1.4 - 2020-10-16

Minor maintenance version to update dependencies and documentation, and also make the demo data loading function parse dates properly.

0.1.3 - 2020-10-16

This version includes several minor improvements to the PAR model and the way the sequences are generated:

  • Sequences can now be generated without dropping the sequence index.
  • The PAR model learns the min and max length of the sequence from the input data.
  • NaN values are properly supported for both categorical and numerical columns.
  • NaN values are generated for numerical columns only if there were NaNs in the input data.
  • Constant columns can now be modeled.

0.1.2 - 2020-09-15

Add BasicGAN Model and additional benchmarking results.

0.1.1 - 2020-08-15

This release includes a few new features to make DeepEcho work on more types of datasets, as well as to make it easier to add new datasets to the benchmarking framework.

  • Add segment_size and sequence_index arguments to fit method.
  • Add sequence_length as an optional argument to sample and sample_sequence methods.
  • Update the Dataset storage format to add sequence_index and versioning.
  • Separate the sequence assembling process in its own deepecho.sequences module.
  • Add function make_dataset to create a dataset from a dataframe and just a few column names.
  • Add notebook tutorial to show how to create datasets and use them.

0.1.0 - 2020-08-11

First release.

Included Features:

  • PARModel
  • Demo dataset and tutorials
  • Benchmarking Framework
  • Support and instructions for benchmarking on a Kubernetes cluster.

