Synthetic data generation pipeline leveraging a Differentially Private Variational Auto Encoder assessed using a variety of metrics

NHS Synth

About the Project

The project currently consists of a Python package alongside research and investigative materials covering the effectiveness of the package and synthetic data more generally when applied to NHS use cases.

Project Description - Synthetic Data Exploration: Variational Autoencoders

The codebase builds on previous NHSX Analytics Unit PhD internships contextualising and investigating the potential use of Variational Auto Encoders (VAEs) for synthetic data generation. These were undertaken by Dominic Danks and David Brind.

Note: No data, public or private, is shared in this repository.

Getting Started

Project Structure

  • The main package and codebase are found in src/nhssynth (see Usage below for more information)
  • Accompanying materials are available in the docs folder:
    • A report summarising the previous iteration of this project
    • A model card providing more information about the VAE with Differential Privacy
  • Numerous exemplar configurations are found in config
  • Empty data and experiments folders are provided; these are the default locations for inputs and outputs when running the project using the provided cli module
  • Pre-processing notebooks for specific datasets used to assess the approach and other non-core code can be found in auxiliary

Installation

As it stands, we recommend the following steps to reproduce our experiments and fully work with this project:

  1. Clone the repo
  2. Ensure one of the required versions of Python is installed
  3. Install poetry
  4. Instantiate a virtual environment, e.g. via python -m venv nhssynth
  5. Activate the virtual environment, e.g. via source nhssynth/bin/activate
  6. Install project dependencies with poetry install (optionally install jupyter and notebook to work with some of the preprocessing files in auxiliary)
  7. Interact with the package in one of two ways:
    • Via the cli module using poetry run cli
    • By building the package with poetry build and using it in an existing project (import nhssynth). If you intend to do the latter, however, the second, simpler setup below may be preferable.

For more standard usage of the package:

  1. Run pip install nhssynth within a supported Python installation
  2. Use the modules exported by the package as you would any other. Note that in this setup you will have to work more closely with the configuration and code to ensure you are handling inputs and outputs for each module appropriately. The cli handles a lot of this complexity, and interacting with the modules directly is considered advanced usage.
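After step 1, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical and not part of nhssynth.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "nhssynth" is the distribution name used in `pip install nhssynth` above.
    found = installed_version("nhssynth")
    if found is None:
        print("nhssynth is not installed in this environment; try: pip install nhssynth")
    else:
        print(f"nhssynth {found} is installed")
```

This only checks that the distribution is installed; it does not exercise any of the package's modules.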

Usage

This package comprises a pipeline that is runnable via poetry run cli pipeline <args> or poetry run cli config <config filepath>. You can also run the modules that make up this pipeline independently via poetry run cli <module name>. To list the available modules, run poetry run cli --help; to see the arguments and purpose of a given module, run poetry run cli <module name> --help.
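If you want to run several configurations in sequence, the commands above can be scripted. The sketch below only assembles and runs the poetry run cli invocations already described; the helper names (cli_command, config_commands, run_all) are hypothetical, and the default "*.yaml" pattern is an assumption about how the files in config are named, so adjust it to match your checkout.

```python
import subprocess
from pathlib import Path


def cli_command(*args: str) -> list[str]:
    """Build the argument list for `poetry run cli <args>`."""
    return ["poetry", "run", "cli", *args]


def config_commands(config_dir: str, pattern: str = "*.yaml") -> list[list[str]]:
    """Build one `poetry run cli config <path>` command per matching file.

    The glob pattern is an assumption; files are ordered by path.
    """
    paths = sorted(Path(config_dir).glob(pattern))
    return [cli_command("config", str(path)) for path in paths]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in the current working directory, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, calling run_all(config_commands("config")) from the repository root, inside the activated environment, would run each matching configuration in turn. Building the commands separately from running them makes it easy to print and inspect them first.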

The figure below shows the structure and workflow of the package and its modules.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the project
  2. Create your branch (git checkout -b <yourusername>/<featurename>)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin <yourusername>/<featurename>)
  5. Open a PR and we will try to get it merged!

See CONTRIBUTING.md for detailed guidance.

Thanks to everyone who has contributed so far!

License

Distributed under the MIT License. See LICENSE for more information.

Contact

This project is under active development by @HarrisonWilde. For any questions or security concerns, contact him or raise an issue. Alternatively, contact NHS England TDAU.

To find out more about the Analytics Unit, visit our project website or get in touch at england.tdau@nhs.net.

Download files

Source Distribution

nhssynth-0.2.0.tar.gz (35.3 kB)

Uploaded Source

Built Distribution

nhssynth-0.2.0-py3-none-any.whl (42.3 kB)

Uploaded Python 3