Time series diffusion in the frequency domain.
Project description
Time Series Diffusion in the Frequency Domain
This repository implements time series diffusion in the frequency domain. For more details, please read our paper: Time Series Diffusion in the Frequency Domain.
1. Install
From repository:
- Clone the repository.
- Create and activate a new environment with conda (with
Python 3.10
or newer).
conda env create -n fdiff python=3.10
conda activate fdiff
- Install the requirement.
pip install -e .
- If you intend to train models, make sure that wandb is correctly configured on your machine by following this guide.
- Some of the datasets are automatically downloaded by our scripts via kaggle API. Make sure to create a kaggle token as explained here.
When the packages are installed, you are ready to train diffusion models!
2. Use
2.1 Train
In order to train models, you can simply run the following command:
python cmd/train.py
By default, this command will train a score model in the time domain with the ecg
dataset. In order to modify this behaviour, you can use hydra override syntax. The following hyperparameters can be modified to retrain all the models appearing in the paper:
Hyperparameter | Description | Values |
---|---|---|
fourier_transform | Whether or not to train a diffusion model in the frequency domain. | true, false |
datamodule | Name of the dataset to use. | ecg, mimiciii, nasa, nasdaq, usdroughts |
datamodule.subdataset | For the NASA dataset only. Selects between the charge and discharge subsets. | charge, discharge |
datamodule.smoother_width | For the ECG dataset only. Width of the Gaussian kernel smoother applied in the frequency domain. | $\mathbb{R}^+$ |
score_model | The backbone to use for the score model. | default, lstm |
At the end of training, your model is stored in the lightning_logs
directory, in a folder named after the current run_id
. You can find the run_id
in the logs of the training and in the wandb dashboard if you have correctly configured wandb.
2.2 Sample
In order to sample from a trained model, you can simply run the following command:
python cmd/sample.py model_id=XYZ
where XYZ
is the run_id
of the model you want to sample from. At the end of sampling, the samples are stored in the lightning_logs
directory, in a folder named after the current run_id
.
One can then reproduce the plots in the paper by including the run_id
to the run_list
list appearing in this notebook and running all cells.
3. Contribute
If you wish to contribute, please make sure that your code is compliant with our tests and coding conventions. To do so, you should install the required testing packages with:
pip install -e .[test]
Then, you can run the tests with:
pytest
Before any commit, please make sure that your staged code is compliant with our coding conventions by running:
pre-commit
4. Cite us
If you use this code, please acknowledge our work by citing
@misc{crabbé2024time,
title={Time Series Diffusion in the Frequency Domain},
author={Jonathan Crabbé and Nicolas Huynh and Jan Stanczuk and Mihaela van der Schaar},
year={2024},
eprint={2402.05933},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.