Synthetic data generation pipeline leveraging a Differentially Private Variational Autoencoder, assessed using a variety of metrics
NHS Synth
About the Project
The project currently consists of a Python package alongside research and investigative materials covering the effectiveness of the package, and of synthetic data more generally, when applied to NHS use cases.
Project Description - Synthetic Data Exploration: Variational Autoencoders
The codebase builds on previous NHSX Analytics Unit PhD internships contextualising and investigating the potential use of Variational Autoencoders (VAEs) for synthetic data generation. These were undertaken by Dominic Danks and David Brind.
Note: no data, public or private, are shared in this repository.
Getting Started
Project Structure
- The main package and codebase are found in src/nhssynth (see Usage below for more information)
- Accompanying materials are available in the docs folder:
  - A report summarising the previous iteration of this project
  - A model card providing more information about the VAE with Differential Privacy
- Numerous exemplar configurations are found in config
- Empty data and experiments folders are provided; these are the default locations for inputs and outputs when running the project using the provided cli module
- Pre-processing notebooks for specific datasets used to assess the approach, and other non-core code, can be found in auxiliary
Installation
As it stands, we recommend the following steps to reproduce our experiments and fully work with this project:
- Clone the repo
- Ensure one of the required versions of Python is installed
- Install poetry
- Instantiate a virtual environment, e.g. via python -m venv nhssynth
- Activate the virtual environment, e.g. via source nhssynth/bin/activate
- Install project dependencies with poetry install (optionally install jupyter and notebook to work with some of the preprocessing files in auxiliary)
- Interact with the package in one of two ways:
  - Via the cli module using poetry run cli
  - By building the package with poetry build and using it in an existing project (import nhssynth). However, if you intend to do the latter, it may be preferable to follow the second, simpler setup below.
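The virtual-environment steps above (python -m venv nhssynth, activating it, then poetry install) can also be scripted from Python with the standard-library venv and subprocess modules. This is a minimal sketch, assuming poetry is on your PATH and that it installs into the environment named by VIRTUAL_ENV, as it does when you activate a venv in a shell. The helpers create_env, activated_env and poetry_install are hypothetical names for illustration, not part of nhssynth.

```python
import os
import subprocess
import sys
import venv
from pathlib import Path
from typing import Dict, Union


def _bin_dir(env_dir: Path) -> Path:
    # venv puts executables in Scripts/ on Windows and bin/ elsewhere
    return env_dir / ("Scripts" if sys.platform == "win32" else "bin")


def create_env(env_dir: Union[str, Path], with_pip: bool = True) -> Path:
    """Equivalent of `python -m venv <env_dir>`; returns the env's interpreter path."""
    env_dir = Path(env_dir).resolve()
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    exe = "python.exe" if sys.platform == "win32" else "python"
    return _bin_dir(env_dir) / exe


def activated_env(env_dir: Union[str, Path]) -> Dict[str, str]:
    """Environment variables approximating `source <env_dir>/bin/activate`,
    for passing to a subprocess (shell prompt changes are omitted)."""
    env_dir = Path(env_dir).resolve()
    env = os.environ.copy()
    env["VIRTUAL_ENV"] = str(env_dir)
    env["PATH"] = str(_bin_dir(env_dir)) + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)  # the activate script also unsets this
    return env


def poetry_install(project_dir: Union[str, Path], env_dir: Union[str, Path]) -> None:
    """Run `poetry install` from the repo root with the venv active (needs poetry on PATH)."""
    subprocess.run(["poetry", "install"], cwd=project_dir, env=activated_env(env_dir), check=True)
```

For example, calling create_env("nhssynth") and then poetry_install(".", "nhssynth") from the repository root would mirror the last few steps of the list.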
For more standard usage of the package:
- Run pip install nhssynth within a supported Python installation
- Use the modules exported by the package as you would any other. Note that in this setup you will have to work more closely with the configuration and code to ensure you handle inputs and outputs for each module appropriately. The cli handles much of this complexity, and interacting with the modules directly is considered advanced usage.
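Because this README does not list the modules the package exports, one way to see them after pip install nhssynth is to ask the installed package itself. The sketch below uses only importlib and pkgutil from the standard library and assumes nothing about nhssynth's API; list_submodules is a hypothetical helper name.

```python
import importlib
import pkgutil
from typing import List


def list_submodules(package_name: str) -> List[str]:
    """Names of the modules and subpackages directly inside an installed package.

    Only the package itself is imported; pkgutil finds the submodules on disk
    without importing them.
    """
    package = importlib.import_module(package_name)
    if not hasattr(package, "__path__"):
        raise ValueError(f"{package_name!r} is a plain module, not a package")
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))
```

Calling print(list_submodules("nhssynth")) then shows what is available to import; each module's own docstrings and the cli help describe how to use it.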
Usage
This package comprises a pipeline that is runnable via poetry run cli pipeline <args> or poetry run cli config <config filepath>. You can run the modules that make up this pipeline independently via poetry run cli <module name>. To see the available modules, along with their arguments and what they do, run poetry run cli --help or poetry run cli <module name> --help.
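If you want to run several of the exemplar configurations in config one after another, the poetry run cli config <config filepath> form can be driven from a short Python script. This is a sketch that builds and runs those commands with subprocess; it assumes every regular file in the folder is a pipeline configuration (filter by extension if yours also holds other files), and cli_command, config_commands and run_all are hypothetical helper names, not part of nhssynth.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence, Union


def cli_command(*args: str) -> List[str]:
    """Argument list for `poetry run cli <args>`, e.g. cli_command("--help")."""
    return ["poetry", "run", "cli", *args]


def config_commands(config_dir: Union[str, Path]) -> List[List[str]]:
    """One `poetry run cli config <config filepath>` command per file in config_dir.

    Assumes every regular file in the folder is a pipeline configuration.
    """
    files = sorted(p for p in Path(config_dir).iterdir() if p.is_file())
    return [cli_command("config", str(path)) for path in files]


def run_all(commands: Sequence[Sequence[str]], project_dir: Union[str, Path] = ".") -> None:
    """Run each command from the project root, stopping at the first failure."""
    for command in commands:
        subprocess.run(list(command), cwd=project_dir, check=True)
```

From the repository root, run_all(config_commands("config")) would run each configuration in turn; because the config paths are relative, pass an absolute config directory if you set project_dir to somewhere else.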
The figure below shows the structure and workflow of the package and its modules.
Roadmap
See the open issues for a list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the project
- Create your branch (git checkout -b <yourusername>/<featurename>)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin <yourusername>/<featurename>)
- Open a PR and we will try to get it merged!
See CONTRIBUTING.md for detailed guidance.
Thanks to everyone who has contributed so far!
License
Distributed under the MIT License. See LICENSE for more information.
Contact
This project is under active development by @HarrisonWilde; for any questions or security concerns, contact him or raise an issue. Alternatively, contact NHS England TDAU.
To find out more about the Analytics Unit, visit our project website or get in touch at england.tdau@nhs.net.