Synthetic Data Metrics is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation.

Project description

PackageName

About

Packagename is the result of a project funded by the Alan Turing Institute. It is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation. Synthetic data is a crucial part of many machine learning, data science and other applications. The performance of these applications relies on the ‘quality’ and diversity of the synthetic data used. No clear definition of ‘quality’ yet exists for synthetic data; here we take quality to be a measure of how indistinguishable the synthetic data is from the real data. This package provides an expanding list of metrics that are open source and community driven.

The following approaches to evaluation are currently provided:

  • Inception Score
  • Frechet Inception Distance
  • Deep Discriminator
  • t-SNE
  • PCA followed by
    • IoU
    • DICE
graph LR
A[shallow] -->B[Dimensionality Reduction]
    B --> D[PCA]
    B --> E[t-SNE]
        D & E --> L(Similarity Score)
A --> C[TBC]
F[deep] --> G[Feature Extraction]
    G --> J[SNN]
    G --> K[VAE]
        J & K --> M(Distance)
F --> H[Discriminator]
F --> I[Model Inference]
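
To give some intuition for the "PCA followed by IoU / DICE" approach listed above, here is a minimal sketch of the two overlap scores. It assumes the real and synthetic samples have already been projected to 2-D (for example by PCA) and that each projection is compared as a set of occupied grid cells. The grid discretisation and the helper names (`occupied_cells`, `iou`, `dice`) are illustrative assumptions, not necessarily how Packagename computes its scores.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def occupied_cells(points: Iterable[Point], cell_size: float) -> Set[Cell]:
    """Map 2-D points to the set of grid cells they fall into (assumed discretisation)."""
    return {(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points}


def iou(a: Set[Cell], b: Set[Cell]) -> float:
    """Intersection over Union: |A & B| / |A | B|. Two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: Set[Cell], b: Set[Cell]) -> float:
    """DICE coefficient: 2 * |A & B| / (|A| + |B|). Two empty sets count as identical."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical 2-D projections of real and synthetic samples.
real_2d = [(0.1, 0.2), (1.4, 0.3), (2.2, 2.7)]
synth_2d = [(0.3, 0.4), (1.1, 0.9), (5.0, 5.0)]

real_cells = occupied_cells(real_2d, cell_size=1.0)
synth_cells = occupied_cells(synth_2d, cell_size=1.0)

print("IoU: ", iou(real_cells, synth_cells))
print("DICE:", dice(real_cells, synth_cells))
```

Both scores lie in [0, 1], with 1 meaning the two projections occupy exactly the same cells. DICE weights the overlap more heavily than IoU, so it is never lower than IoU for the same pair of sets.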

Using Packagename

Getting Started

Prerequisites

Python 3.8 - 3.10

*We highly recommend using a Python version between 3.8 and 3.10. With these versions, the dependencies can be installed properly.

Installation

After cloning the repo into a new directory, create a virtual environment called 'venv', activate it, and install the dependencies using pip, e.g.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note: the requirements file contains all the dependencies needed, and it works well when the Python version is within the range given under Prerequisites.
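
If you are unsure whether your interpreter falls in the supported range, a quick check along the following lines can be run before installing. This is a small sketch only: the helper name `is_supported_python` and the two constants are illustrative and not part of Packagename.

```python
import sys
from typing import Tuple

# Supported range taken from the Prerequisites section (inclusive on both ends).
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 10)


def is_supported_python(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) lies within the supported range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


current = sys.version_info[:2]
if is_supported_python(current):
    print(f"Python {current[0]}.{current[1]} is within the supported range.")
else:
    print(f"Python {current[0]}.{current[1]} is outside 3.8 - 3.10; "
          "dependency installation may fail.")
```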

Once finished, deactivate the virtual environment:

deactivate

Extracting Metrics

Packagename can be used to analyse synthetic image, time-series, tabular and language data in 4 simple steps:

  1. Load the Packagename evaluator object
  2. Load in your data
  3. Create the corresponding datatype evaluator <datatype>_evaluator(data)
  4. Call the desired metric evaluator.<metric>()

We handle the rest.

Synthetic Image Evaluator Example

Load in the evaluator object:

from synthetic_data_metrics import evaluator

Load in your data as one or two NumPy arrays of shape [n_samples, H, W, C]:

images, _, _, _ = load_cifar10()

Initialise the image evaluator and feed in your synthetic and real data (named synth and real here):

img_evaluator = evaluator.Image_Evaluator(synth, real)

Call your desired metric(s):

img_evaluator.inception_score(n_splits=20)
img_evaluator.dim_reduced_iou_score()
img_evaluator.dim_reduced_dice_score()
img_evaluator.plot_2PC_compare()
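
For readers unfamiliar with the Inception Score, the sketch below shows the standard formula, exp(mean over samples of KL(p(y|x) || p(y))), averaged over a number of splits. It assumes the per-image class probabilities have already been obtained from a classifier; the function name `inception_score_from_probs` is hypothetical, and this is not necessarily how Packagename's `inception_score` is implemented internally.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple


def inception_score_from_probs(
    probs: Sequence[Sequence[float]], n_splits: int = 1
) -> Tuple[float, float]:
    """Return (mean, std) of the Inception Score over n_splits chunks.

    probs[i][c] is the predicted probability of class c for sample i.
    Assumes 1 <= n_splits <= len(probs) so that every chunk is non-empty.
    """
    n = len(probs)
    split_scores = []
    for k in range(n_splits):
        chunk = probs[k * n // n_splits:(k + 1) * n // n_splits]
        n_classes = len(chunk[0])
        # Marginal class distribution p(y) within this chunk.
        marginal = [sum(row[c] for row in chunk) / len(chunk) for c in range(n_classes)]
        # KL(p(y|x) || p(y)) for each sample; terms with p == 0 contribute 0.
        kls = [
            sum(p * math.log(p / q) for p, q in zip(row, marginal) if p > 0)
            for row in chunk
        ]
        split_scores.append(math.exp(mean(kls)))
    return mean(split_scores), pstdev(split_scores)


# Confident and diverse predictions give the maximum score (the number of classes).
score, spread = inception_score_from_probs([[1.0, 0.0], [0.0, 1.0]])
print(score, spread)
```

A score close to 1 means the predictions are uninformative or all alike; a score close to the number of classes means each sample is classified confidently and the classes are evenly covered.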

Synthetic Time-series Evaluator Example

Load in the evaluator object:

from synthetic_data_metrics import evaluator

Load in your data as two pandas DataFrames, each with a target column:

real,synth = load_wisdm()

Initialise the time-series evaluator, feed in your data and supply the target column:

ts_evaluator = evaluator.TS_Evaluator(real,synth,'ACTIVITY')

Call your desired metric(s):

ts_evaluator.discriminative_score()
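
As background, a discriminative score is commonly defined by training a classifier to tell real samples from synthetic ones and reporting how far its accuracy is from chance: |0.5 - accuracy|. The sketch below shows only that final step, assuming the classifier's predictions are already available. The function name and arguments are illustrative, and Packagename's own `discriminative_score` may differ in its details.

```python
from typing import Sequence


def discriminative_score_from_predictions(
    is_real_true: Sequence[int], is_real_pred: Sequence[int]
) -> float:
    """Return |0.5 - accuracy| for a real-vs-synthetic classifier.

    Labels are 1 for real and 0 for synthetic. A score of 0 means the
    classifier does no better than chance; 0.5 means it separates the
    two sets perfectly.
    """
    if not is_real_true or len(is_real_true) != len(is_real_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(is_real_true, is_real_pred))
    accuracy = correct / len(is_real_true)
    return abs(0.5 - accuracy)


# Hypothetical held-out labels and classifier predictions.
truth = [1, 1, 0, 0]
predictions = [1, 0, 1, 0]
print(discriminative_score_from_predictions(truth, predictions))
```

Under this convention, lower is better: synthetic data that a classifier cannot separate from real data scores close to 0.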

Contributing:

Packagename is an open source codebase.

Linting

This library uses flake8 for linting. Please check your code for formatting errors before pushing by running the following:

flake8 path/to/file/to/test

Acknowledgements

This work was funded by The Turing Institute.
