Synthetic Data Metrics is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation.

Project description

PackageName

About

Packagename is the result of a project funded by the Alan Turing Institute. It is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation. Synthetic data is a crucial part of many machine learning, data science and other applications. The performance of these applications relies on the ‘quality’ and diversity of the synthetic data used. No clear, agreed definition of ‘quality’ exists for synthetic data; here, quality is taken to be a measure of how indistinguishable the synthetic data is from the real data. This package provides an expanding list of metrics that are open source and community driven.

The following approaches to evaluation are currently provided:

  • Inception Score
  • Frechet Inception Distance
  • Deep Discriminator
  • t-SNE
  • PCA followed by
    • IoU
    • DICE
graph LR
A[shallow] -->B[Dimensionality Reduction]
    B --> D[PCA]
    B --> E[t-SNE]
        D & E --> L(Similarity Score)
A --> C[TBC]
F[deep] --> G[Feature Extraction]
    G --> J[SNN]
    G --> K[VAE]
        J & K --> M(Distance)
F --> H[Discriminator]
F --> I[Model Inference]
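
As an illustration of the first metric in the list, the sketch below computes an Inception Score from class probabilities that are already available. It is not the package's implementation: the function name, the `probs` input and the `eps` smoothing term are assumptions made for this example, and in practice the probabilities would come from a pretrained Inception classifier applied to the synthetic images.

```python
import numpy as np


def inception_score(probs, n_splits=10, eps=1e-12):
    """Return (mean, std) of the Inception Score over n_splits chunks.

    probs: array of shape [n_samples, n_classes]; each row is a softmax
    output p(y|x). Hypothetical input, not the package API.
    """
    probs = np.asarray(probs, dtype=np.float64)
    scores = []
    for chunk in np.array_split(probs, n_splits):
        p_y = chunk.mean(axis=0, keepdims=True)           # marginal p(y)
        kl = chunk * (np.log(chunk + eps) - np.log(p_y + eps))
        scores.append(np.exp(kl.sum(axis=1).mean()))      # exp(E_x[KL(p(y|x) || p(y))])
    return float(np.mean(scores)), float(np.std(scores))


# Confident predictions spread evenly over the classes score close to
# n_classes; identical uniform predictions score close to 1.
sharp = np.eye(4)[np.arange(400) % 4]
flat = np.full((400, 4), 0.25)
print(inception_score(sharp, n_splits=4))
print(inception_score(flat, n_splits=4))
```

A higher score indicates predictions that are individually confident and collectively diverse.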

Using Packagename

Getting Started

Prerequisites

Python 3.8 - 3.10

*We highly recommend using a Python version between 3.8 and 3.10. With these versions, the dependencies install properly.
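
A quick way to confirm that the active interpreter falls in this range is sketched below. The helper name `is_supported` and the two constants are made up for this example and are not part of the package.

```python
import sys

# Range taken from the prerequisite above (inclusive on both ends).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def is_supported(version_info=None):
    """Return True if (major, minor) lies within the recommended range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported():
    print("Warning: Python %d.%d is outside the recommended 3.8-3.10 range"
          % tuple(sys.version_info[:2]))
```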

Installation

After cloning the repo into a new directory, create a virtual environment called 'venv', activate it, and install the dependencies using pip, e.g.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note: the requirements file contains all the dependencies needed, and it works well when the Python version is in the range given above.

Once finished, deactivate the virtual environment:

deactivate

Extracting Metrics

Packagename can be used to analyse synthetic image, time-series, tabular and language data in 4 simple steps:

  1. Load the Packagename evaluator object
  2. Load in your data
  3. Create the corresponding datatype evaluator: <datatype>_evaluator(data)
  4. Call the desired metric: evaluator.<metric>()

We handle the rest.

Synthetic Image Evaluator Example

Load in the evaluator object:

from synthetic_data_metrics import evaluator

Load in your data as one or two NumPy arrays of shape [n_samples, H, W, C]:

images, _, _, _ = load_cifar10()

Initialise the image evaluator and feed in your data:

img_evaluator = evaluator.Image_Evaluator(synth, real)

Call your desired metric(s):

img_evaluator.inception_score(n_splits=20)
img_evaluator.dim_reduced_iou_score()
img_evaluator.dim_reduced_dice_score()
img_evaluator.plot_2PC_compare()
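
How the package computes the dimension-reduced IoU and Dice scores is not described here. One plausible reading of "PCA followed by IoU/DICE" is sketched below under stated assumptions: both image sets are projected onto the first two principal components of the real data, the 2D plane is split into a grid, and the sets of occupied grid cells are compared. The function names and the `bins` parameter are made up for this example.

```python
import numpy as np


def pca_2d(fit_on, *others):
    """Project arrays onto the first two principal components of fit_on."""
    mean = fit_on.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(fit_on - mean, full_matrices=False)
    comps = vt[:2].T
    return [(x - mean) @ comps for x in (fit_on,) + others]


def occupancy(points, edges_x, edges_y):
    """Boolean grid marking which 2D histogram cells contain any point."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                bins=[edges_x, edges_y])
    return hist > 0


def iou_and_dice(real, synth, bins=20):
    """Return (IoU, Dice) of the grid cells occupied by each data set."""
    # Flatten images [n_samples, H, W, C] to feature vectors.
    real = real.reshape(len(real), -1).astype(np.float64)
    synth = synth.reshape(len(synth), -1).astype(np.float64)
    real_2d, synth_2d = pca_2d(real, synth)
    both = np.vstack([real_2d, synth_2d])
    edges_x = np.linspace(both[:, 0].min(), both[:, 0].max(), bins + 1)
    edges_y = np.linspace(both[:, 1].min(), both[:, 1].max(), bins + 1)
    a = occupancy(real_2d, edges_x, edges_y)
    b = occupancy(synth_2d, edges_x, edges_y)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union), float(2 * inter / (a.sum() + b.sum()))


rng = np.random.default_rng(0)
real = rng.random((50, 4, 4, 3))
synth = rng.random((50, 4, 4, 3))
print(iou_and_dice(real, synth))
```

Both scores lie between 0 and 1, and Dice is never smaller than IoU; identical data sets give 1 for both.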

Synthetic Time-series Evaluator Example

Load in the evaluator object:

from synthetic_data_metrics import evaluator

Load in your data as two pandas DataFrames, each with a target column:

real,synth = load_wisdm()

Initialise the time-series evaluator, feed in your data and supply the target column:

ts_evaluator = evaluator.TS_Evaluator(real,synth,'ACTIVITY')

Call your desired metric(s):

ts_evaluator.discriminative_score()
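
The model behind the discriminative score is not documented here. The general idea of such a score is to train a classifier to tell real samples from synthetic ones and report how far its held-out accuracy is from chance. The sketch below illustrates that idea with a plain logistic regression in NumPy as a stand-in; the function signature, the 80/20 split and the training settings are assumptions made for this example, not the package's behaviour.

```python
import numpy as np


def discriminative_score(real, synth, epochs=300, lr=0.1, seed=0):
    """Return |held-out accuracy - 0.5| of a real-vs-synthetic classifier.

    real, synth: arrays of shape [n_samples, n_features].
    0 means indistinguishable; 0.5 means perfectly separable.
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([real, synth]).astype(np.float64)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    split = int(0.8 * len(x))                 # 80% train, 20% test
    # Standardise using training statistics only.
    mu = x[:split].mean(axis=0)
    sd = x[:split].std(axis=0) + 1e-8
    x = (x - mu) / sd
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):                   # batch gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-(x[:split] @ w + b)))
        grad = p - y[:split]
        w -= lr * x[:split].T @ grad / split
        b -= lr * grad.mean()
    pred = (x[split:] @ w + b) > 0
    accuracy = (pred == (y[split:] == 1)).mean()
    return float(abs(accuracy - 0.5))


rng = np.random.default_rng(1)
real_features = rng.normal(0.0, 1.0, size=(200, 3))
shifted = rng.normal(5.0, 1.0, size=(200, 3))     # easy to tell apart
similar = rng.normal(0.0, 1.0, size=(200, 3))     # same distribution as real
print(discriminative_score(real_features, shifted))
print(discriminative_score(real_features, similar))
```

With pandas DataFrames, the feature columns could be passed in via `DataFrame.to_numpy()` after dropping the target column.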

Contributing:

Packagename is an open source codebase.

Linting

This library uses flake8 for linting. Please check your code for formatting errors before pushing by running the following:

flake8 path/to/file/to/test

Acknowledgements

This work was funded by The Alan Turing Institute.


Source Distribution

synthetic_data_metrics-0.0.2.tar.gz (13.9 kB view details)

Uploaded Source

Built Distribution

synthetic_data_metrics-0.0.2-py3-none-any.whl (14.2 kB view details)

Uploaded Python 3

File details

Details for the file synthetic_data_metrics-0.0.2.tar.gz.

File metadata

  • Download URL: synthetic_data_metrics-0.0.2.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.10

File hashes

Hashes for synthetic_data_metrics-0.0.2.tar.gz
Algorithm Hash digest
SHA256 bbd7e75e9c6e267c702344f43ce0324e399c2a0d70ddbe2fb350e1757070848a
MD5 d7082af66ba93d6359a3bb64c6bd3a7b
BLAKE2b-256 ae7210f71561a5359193e17e172a906e8093c13f18d69063b8760a87ffce8644

File details

Details for the file synthetic_data_metrics-0.0.2-py3-none-any.whl.

File hashes

Hashes for synthetic_data_metrics-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 288be241cd8487430e8587cb068659e060eef572e117ba34f587d21a04e3f823
MD5 0cc396f9bd572d7acce92438243df591
BLAKE2b-256 4a9b9c615175c5814f57f0cdb0529dd45d03f7b8dd3f79334c04ba172762bd99
