
Synthetic Data Metrics is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation.

Project description

Synthetic Data Metrics

About

Synthetic Data Metrics is the result of a project funded by the Alan Turing Institute. It is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation. Synthetic data is a crucial part of many machine learning, data science and other applications, and the performance of these applications relies on the ‘quality’ and diversity of the synthetic data used. There is no clear definition of ‘quality’ for synthetic data; here, quality is a measure of how indistinguishable the synthetic data is from the real data. This package provides an expanding list of metrics that is open source and community-driven.

The following approaches to evaluation are currently provided:

  • Inception Score
  • Frechet Inception Distance
  • Deep Discriminator
  • t-SNE
  • PCA, followed by:
    • IoU
    • Dice
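The Fréchet Inception Distance compares Gaussians fitted to feature embeddings of real and synthetic images: d² = ||μ_r − μ_s||² + Tr(Σ_r + Σ_s − 2(Σ_r Σ_s)^(1/2)). The sketch below shows only that distance, assuming the features have already been extracted (for FID, normally from an Inception-v3 network, which is not included here). frechet_distance is a hypothetical helper, not this package's implementation.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_synth: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two [n_samples, n_features] sets.

    Hypothetical helper: feature extraction (e.g. Inception-v3) is assumed done.
    """
    mu_r, mu_s = feats_real.mean(axis=0), feats_synth.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_s = np.atleast_2d(np.cov(feats_synth, rowvar=False))
    # Tr((S_r S_s)^(1/2)) equals Tr((S_r^(1/2) S_s S_r^(1/2))^(1/2)); the inner
    # matrix is symmetric PSD, so its eigenvalues can be taken with eigvalsh
    root_r = _psd_sqrt(cov_r)
    inner = root_r @ cov_s @ root_r
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))
print(frechet_distance(feats, feats))        # approximately 0.0
print(frechet_distance(feats, feats + 3.0))  # approximately 72.0 (= 8 dims * 3**2)
```

Lower values mean the two feature distributions are closer; a pure mean shift only affects the first term, because covariances are unchanged by shifting.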
Figure (taxonomy of evaluation approaches): shallow approaches apply dimensionality reduction (PCA or t-SNE) and then compute a similarity score, with further shallow approaches to be confirmed; deep approaches use feature extraction (SNN or VAE) followed by a distance, a discriminator, or model inference.
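The README does not spell out how IoU and Dice are computed after PCA. One plausible reading, sketched here purely as an assumption: project real and synthetic data onto the first two principal components of the pooled data, rasterise each projection onto a shared 2-D grid, and compare the sets of occupied cells. pca_project and grid_overlap_scores are hypothetical names, not the package's dim_reduced_iou_score / dim_reduced_dice_score.

```python
import numpy as np


def pca_project(x: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of x onto its top principal components (via SVD)."""
    centred = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)  # rows of vt: PC directions
    return centred @ vt[:n_components].T


def grid_overlap_scores(real: np.ndarray, synth: np.ndarray, bins: int = 20):
    """IoU and Dice of the 2-D grid cells occupied by PCA-projected data.

    Assumption (not from the package): overlap is measured on a shared
    bins x bins histogram grid over the pooled projection.
    """
    proj = pca_project(np.vstack([real, synth]))  # fit PCA on the pooled data
    proj_r, proj_s = proj[: len(real)], proj[len(real):]
    # Shared bin edges so both occupancy maps index the same cells
    edges = [np.linspace(proj[:, i].min(), proj[:, i].max(), bins + 1) for i in range(2)]
    occ_r = np.histogram2d(proj_r[:, 0], proj_r[:, 1], bins=edges)[0] > 0
    occ_s = np.histogram2d(proj_s[:, 0], proj_s[:, 1], bins=edges)[0] > 0
    inter = np.logical_and(occ_r, occ_s).sum()
    union = np.logical_or(occ_r, occ_s).sum()
    iou = inter / union
    dice = 2 * inter / (occ_r.sum() + occ_s.sum())
    return float(iou), float(dice)


rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 16))
print(grid_overlap_scores(real, real))        # (1.0, 1.0): identical sets
print(grid_overlap_scores(real, real + 5.0))  # far lower overlap after a large shift
```

Image arrays of shape [n_samples, H, W, C] would first be flattened, e.g. real.reshape(len(real), -1). Dice and IoU carry the same information (Dice = 2·IoU / (1 + IoU)); Dice is simply less harsh on partial overlap.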

Using Synthetic Data Metrics

Getting Started

Prerequisites

Python 3.8 - 3.10

We strongly recommend using a Python version between 3.8 and 3.10; with these versions, the dependencies install correctly.
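If you want a script or notebook to warn early when it runs outside the recommended range, a check along these lines can help. This is a minimal sketch; python_version_supported is a hypothetical helper, not part of the package.

```python
import sys

# Recommended range from this README: Python 3.8 to 3.10 inclusive
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version lies in the recommended range."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    if not python_version_supported():
        print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} is "
              "outside the recommended 3.8-3.10 range; dependencies may fail to install.")
```

Comparing (major, minor) tuples ignores the patch level, so 3.10.12 counts as supported while 3.11.0 does not.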

Installation

After cloning the repository into a new directory, create a virtual environment called 'venv', activate it, and install the dependencies with pip, e.g.:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note: requirements.txt lists all required dependencies, and it installs cleanly when your Python version is in the range above.

Once finished, deactivate the virtual environment:

deactivate

Extracting Metrics

Synthetic Data Metrics can be used to analyse synthetic image, time-series, tabular and language data in four simple steps:

  1. Import the evaluator from synthetic_data_metrics
  2. Load in your data
  3. Create the evaluator for your data type (e.g. evaluator.Image_Evaluator or evaluator.TS_Evaluator) and pass in your data
  4. Call the desired metric on it, <your_evaluator>.<metric>()

We handle the rest.

Synthetic Image Evaluator Example

Import the evaluator:

from synthetic_data_metrics import evaluator

Load your data as one or two NumPy arrays of shape [n_samples, H, W, C]:

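real, _, _, _ = load_cifar10()  # your own data loader (not shown); these are the real images
# synth: your generated images, also shaped [n_samples, H, W, C]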
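Many loaders return images channels-first ([N, C, H, W]) or without a channel axis for greyscale. A minimal sketch for bringing such arrays into the [n_samples, H, W, C] layout described above; to_nhwc is a hypothetical helper, and the pixel value range the package expects is not stated here, so no rescaling is done.

```python
import numpy as np


def to_nhwc(images, channels_first: bool = False) -> np.ndarray:
    """Return images as a float32 array of shape [n_samples, H, W, C].

    Hypothetical helper: accepts [N, H, W] (greyscale), [N, H, W, C], or,
    with channels_first=True, [N, C, H, W]. Pixel values are left unscaled.
    """
    arr = np.asarray(images)
    if arr.ndim == 3:                      # greyscale: add a trailing channel axis
        arr = arr[..., np.newaxis]
    elif arr.ndim != 4:
        raise ValueError(f"expected 3 or 4 dimensions, got shape {arr.shape}")
    elif channels_first:                   # [N, C, H, W] -> [N, H, W, C]
        arr = np.transpose(arr, (0, 2, 3, 1))
    return arr.astype(np.float32)


batch = np.zeros((16, 3, 32, 32), dtype=np.uint8)  # a channels-first batch
print(to_nhwc(batch, channels_first=True).shape)   # (16, 32, 32, 3)
```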

Initialise the image evaluator and feed in your data:

img_evaluator = evaluator.Image_Evaluator(synth, real)

Call your desired metric(s):

img_evaluator.inception_score(n_splits=20)
img_evaluator.dim_reduced_iou_score()
img_evaluator.dim_reduced_dice_score()
img_evaluator.plot_2PC_compare()
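The n_splits argument suggests the usual split-based Inception Score: IS = exp(E_x[KL(p(y|x) || p(y))]) is computed on each chunk of samples, then the mean and standard deviation across chunks are reported. The sketch below assumes you already have classifier softmax outputs (normally from Inception-v3 on the synthetic images); inception_score_from_probs is hypothetical and not this package's implementation.

```python
import numpy as np


def inception_score_from_probs(probs: np.ndarray, n_splits: int = 10, eps: float = 1e-12):
    """Mean and std of the Inception Score over n_splits chunks.

    probs: [n_samples, n_classes] softmax outputs of a classifier (assumed given).
    """
    scores = []
    for part in np.array_split(probs, n_splits):
        p_y = part.mean(axis=0, keepdims=True)                 # marginal p(y) in this split
        kl = part * (np.log(part + eps) - np.log(p_y + eps))  # per-class KL terms
        scores.append(np.exp(kl.sum(axis=1).mean()))          # exp(E_x[KL(p(y|x) || p(y))])
    return float(np.mean(scores)), float(np.std(scores))


uniform = np.full((1000, 10), 0.1)            # uninformative predictions
print(inception_score_from_probs(uniform))    # approximately (1.0, 0.0)
confident = np.eye(10)[np.arange(1000) % 10]  # confident, evenly spread over 10 classes
print(inception_score_from_probs(confident))  # approximately (10.0, 0.0)
```

The score ranges from 1 (no confident or diverse predictions) up to the number of classes (confident predictions spread evenly across all classes).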

Synthetic Time-series Evaluator Example

Import the evaluator:

from synthetic_data_metrics import evaluator

Load your data as two pandas DataFrames (real and synthetic), each with a target column:

real, synth = load_wisdm()  # your own data loader (not shown)

Initialise the time-series evaluator, feed in your data and supply the name of the target column:

ts_evaluator = evaluator.TS_Evaluator(real, synth, 'ACTIVITY')

Call your desired metric(s):

ts_evaluator.discriminative_score()
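A commonly used definition of a discriminative score trains a classifier to tell real from synthetic samples and reports |held-out accuracy − 0.5|: 0 means the sets are indistinguishable, 0.5 means they are perfectly separable. The sketch below swaps in a plain logistic regression (trained with NumPy gradient descent) for whatever model the package actually uses; logistic_discriminative_score is hypothetical.

```python
import numpy as np


def logistic_discriminative_score(real, synth, epochs=500, lr=0.1, test_frac=0.3, seed=0):
    """|held-out accuracy - 0.5| of a logistic-regression real-vs-synthetic classifier.

    real, synth: [n_samples, n_features] arrays (e.g. flattened windows).
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([real, synth]).astype(float)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])  # 1 = real
    idx = rng.permutation(len(x))
    n_test = int(len(x) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    # Standardise using training statistics only
    mu, sd = x[train].mean(axis=0), x[train].std(axis=0) + 1e-8
    x = (x - mu) / sd
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):  # full-batch gradient descent on the log loss
        p = 1.0 / (1.0 + np.exp(-(x[train] @ w + b)))
        grad = p - y[train]
        w -= lr * x[train].T @ grad / len(train)
        b -= lr * grad.mean()
    pred = (x[test] @ w + b) > 0
    acc = np.mean(pred == y[test].astype(bool))
    return float(abs(acc - 0.5))


rng = np.random.default_rng(0)
real = rng.normal(size=(400, 5))
print(logistic_discriminative_score(real, rng.normal(size=(400, 5))))           # close to 0
print(logistic_discriminative_score(real, rng.normal(loc=3.0, size=(400, 5))))  # close to 0.5
```

For DataFrames like those above, the numeric feature columns could be passed via df.drop(columns=['ACTIVITY']).to_numpy(), which treats each row independently rather than as a time-series window.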

Contributing

Synthetic Data Metrics is an open-source codebase.

Linting

This library uses flake8 for linting. Before pushing, please check your code for formatting errors by running:

flake8 path/to/file/to/test

Acknowledgements

This work was funded by the Alan Turing Institute.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


synth_data_metrics-0.0.1-py3-none-any.whl (24.8 kB, Python 3)

File details

Details for the file synth_data_metrics-0.0.1-py3-none-any.whl.


File hashes

Hashes for synth_data_metrics-0.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        a1adaa701cf3c90662b70935c97a61d136fd5ef2288bf0570cc24f5c6f703a27
MD5           b32f392ee123fd1f5b967db39b46ddd1
BLAKE2b-256   30cf827cf82e3f827d5bd0f8f07e94a803ac24be23e0e5ea176bd8e6452f36b1
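To check a downloaded wheel against the SHA256 digest listed for it, you can hash the file with the standard library. A minimal sketch; file_digest is a hypothetical helper, and the wheel path is assumed to be in the current directory. The file is read in chunks because hashlib.file_digest only exists from Python 3.11, outside the recommended range.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large files need little memory."""
    h = hashlib.new(algorithm)
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


EXPECTED_SHA256 = "a1adaa701cf3c90662b70935c97a61d136fd5ef2288bf0570cc24f5c6f703a27"

wheel = Path("synth_data_metrics-0.0.1-py3-none-any.whl")  # assumed download location
if wheel.exists():
    ok = file_digest(wheel) == EXPECTED_SHA256
    print("SHA256 matches" if ok else "SHA256 MISMATCH: do not install this file")
```

BLAKE2b-256 corresponds to hashlib.blake2b(digest_size=32). pip can also enforce the digest itself through its hash-checking mode (a --hash=sha256:... option on the requirement line).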
