Synthetic Data Metrics is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation.
PackageName
About
Packagename is the result of a project funded by the Alan Turing Institute. It is a Python library for evaluating synthetic data quality across a wide range of data types (image, tabular, time series, language) and approaches to evaluation. Synthetic data is a crucial part of many machine learning, data science and other applications. The performance of these applications relies on the ‘quality’ and diversity of the synthetic data used. No clear definition of ‘quality’ exists for synthetic data; here we take quality to be a measure of how indistinguishable the synthetic data is from the real data. This package provides an expanding list of metrics that are open source and community driven.
The following approaches to evaluation are currently provided:
- Inception Score
- Frechet Inception Distance
- Deep Discriminator
- t-SNE
- PCA followed by:
  - IoU
  - DICE
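Of the approaches listed above, the Inception Score has a compact closed form: the exponential of the mean KL divergence between each sample's predicted class distribution and the average class distribution. The sketch below illustrates that formula with NumPy only. It is not this package's implementation: the function name and signature are illustrative, it assumes the class probabilities have already been produced by a classifier (in practice a pretrained Inception network), and it does not average over splits.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """Inception Score from an (n_samples, n_classes) matrix of class probabilities.

    IS = exp(mean over x of KL(p(y|x) || p(y))), where p(y) is the mean of p(y|x).
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), shape (1, n_classes)
    # Per-sample KL divergence; eps guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


# Confident predictions spread evenly over 4 classes: score close to 4.
sharp = np.eye(4)[np.arange(400) % 4]
# Uniform predictions carry no class information: score of 1.
flat = np.full((400, 4), 0.25)

print(inception_score(sharp), inception_score(flat))
```

A higher score means predictions that are individually confident and collectively diverse; the upper bound is the number of classes.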
graph LR
A[shallow] -->B[Dimensionality Reduction]
B --> D[PCA]
B --> E[t-SNE]
D & E --> L(Similarity Score)
A --> C[TBC]
F[deep] --> G[Feature Extraction]
G --> J[SNN]
G --> K[VAE]
J & K --> M(Distance)
F --> H[Discriminator]
F --> I[Model Inference]
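The "shallow" branch of the diagram reduces the data to a few dimensions and then computes a similarity score. One plausible reading, sketched below, projects real and synthetic samples onto the first two principal components of the real data, marks which cells of a shared 2D grid each set occupies, and compares the two occupancy masks with IoU and Dice. This is an assumption made for illustration, not a description of how the package computes its scores; every function name and the `bins` parameter are hypothetical. Images would first need flattening to shape (n_samples, n_features).

```python
import numpy as np


def pca_project(fit_data, data, n_components=2):
    """Project `data` onto the top principal components of `fit_data`."""
    mean = fit_data.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(fit_data - mean, full_matrices=False)
    return (data - mean) @ vt[:n_components].T


def occupancy_mask(points, edges_x, edges_y):
    """Boolean grid marking the cells that contain at least one point."""
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=[edges_x, edges_y])
    return counts > 0


def iou_and_dice(mask_a, mask_b):
    """Intersection over Union and Dice coefficient of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)


def dim_reduced_overlap(real, synth, bins=10):
    """Hypothetical helper: PCA to 2D, then IoU and Dice of occupied grid cells."""
    real_2d = pca_project(real, real)
    synth_2d = pca_project(real, synth)
    both = np.vstack([real_2d, synth_2d])
    # Use one grid covering both point clouds so the masks are comparable.
    edges_x = np.linspace(both[:, 0].min(), both[:, 0].max(), bins + 1)
    edges_y = np.linspace(both[:, 1].min(), both[:, 1].max(), bins + 1)
    return iou_and_dice(occupancy_mask(real_2d, edges_x, edges_y),
                        occupancy_mask(synth_2d, edges_x, edges_y))


rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))
synth = rng.normal(scale=1.5, size=(200, 8))
print(dim_reduced_overlap(real, synth))
```

Both scores lie between 0 and 1, and Dice is never smaller than IoU for the same pair of masks. The grid resolution strongly affects the result, so scores are only comparable at a fixed `bins` value.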
Using Packagename
Getting Started
Prerequisites
Python 3.8 - 3.10
*We highly recommend using a Python version between 3.8 and 3.10. With these versions, the dependencies install properly.
Installation
After cloning the repo into a new directory, create a virtual environment called 'venv', activate it, and install the dependencies using pip, e.g.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Note: the requirements file contains all the dependencies needed, and it works well when the Python version is within the range given above.
Once finished, deactivate the virtual environment:
deactivate
Extracting Metrics
Packagename can be used to analyse synthetic image, time-series, tabular and language data in 4 simple steps:
- Load the Packagename evaluator object
- Load in your data
- Create the corresponding datatype evaluator: <datatype>_evaluator(data)
- Call the desired metric: evaluator.<metric>()
We handle the rest.
Synthetic Image Evaluator Example
Load in the evaluator object:
from synthetic_data_metrics import evaluator
Load in your data as one or two NumPy arrays of shape [n_samples, H, W, C]:
images, _, _, _ = load_cifar10()
Initialise the image evaluator and feed in your data:
img_evaluator = evaluator.Image_Evaluator(synth, real)
Call your desired metric(s):
img_evaluator.inception_score(n_splits=20)
img_evaluator.dim_reduced_iou_score()
img_evaluator.dim_reduced_dice_score()
img_evaluator.plot_2PC_compare()
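The example above expects arrays laid out as [n_samples, H, W, C]. The sketch below shows one way to get data into that layout before handing it to the evaluator. The helper `to_nhwc` is hypothetical and not part of the package, and the random arrays are placeholders standing in for real and synthetic images; the value range the evaluator expects is not stated here, so the 0 to 255 range is an assumption.

```python
import numpy as np


def to_nhwc(images):
    """Return `images` as a 4D array [n_samples, H, W, C] (hypothetical helper).

    Greyscale stacks of shape [n_samples, H, W] gain a trailing channel axis.
    """
    arr = np.asarray(images)
    if arr.ndim == 3:
        arr = arr[..., np.newaxis]
    if arr.ndim != 4:
        raise ValueError(
            f"expected [n_samples, H, W, C] or [n_samples, H, W], got shape {arr.shape}"
        )
    return arr


rng = np.random.default_rng(0)
# Placeholder data: 16 RGB images of 32x32 pixels for each set.
real = to_nhwc(rng.integers(0, 256, size=(16, 32, 32, 3), dtype=np.uint8))
synth = to_nhwc(rng.integers(0, 256, size=(16, 32, 32, 3), dtype=np.uint8))
# A greyscale stack is given an explicit single channel.
grey = to_nhwc(rng.integers(0, 256, size=(16, 32, 32), dtype=np.uint8))

print(real.shape, synth.shape, grey.shape)
```

Checking the layout up front gives a clear error message instead of a shape mismatch deep inside a metric.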
Synthetic Time-series Evaluator Example
Load in the evaluator object:
from synthetic_data_metrics import evaluator
Load in your data as two pandas DataFrames with a target column:
real,synth = load_wisdm()
Initialise the time-series evaluator, feed in your data and supply the target column:
ts_evaluator = evaluator.TS_Evaluator(real,synth,'ACTIVITY')
Call your desired metric(s):
ts_evaluator.discriminative_score()
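The idea behind a discriminative score is to train a classifier to tell real samples from synthetic ones and see how far its held-out accuracy is from chance. The sketch below illustrates that idea only; it is not the package's implementation. It assumes the common definition |accuracy - 0.5|, takes 2D arrays of feature vectors (time-series windows would need flattening first), and swaps in a simple nearest-centroid classifier where a real implementation would more likely use a trained neural network.

```python
import numpy as np


def discriminative_score(real, synth, seed=0):
    """|accuracy - 0.5| of a nearest-centroid classifier on held-out rows.

    Illustrative stand-in: 0 means real and synthetic rows are
    indistinguishable to this classifier, 0.5 means perfectly separable.
    """
    rng = np.random.default_rng(seed)
    x = np.vstack([real, synth]).astype(np.float64)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synth))])

    # Shuffle, then hold out half of the rows for testing.
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    half = len(x) // 2
    x_train, y_train = x[:half], y[:half]
    x_test, y_test = x[half:], y[half:]

    # Assign each test row to the closer of the two class centroids.
    centroid_real = x_train[y_train == 1].mean(axis=0)
    centroid_synth = x_train[y_train == 0].mean(axis=0)
    d_real = np.linalg.norm(x_test - centroid_real, axis=1)
    d_synth = np.linalg.norm(x_test - centroid_synth, axis=1)
    predicted = (d_real < d_synth).astype(np.float64)

    accuracy = float((predicted == y_test).mean())
    return abs(accuracy - 0.5)


rng = np.random.default_rng(1)
real_rows = rng.normal(size=(500, 5))
close_rows = rng.normal(size=(500, 5))         # same distribution as real_rows
far_rows = rng.normal(loc=10.0, size=(500, 5))  # clearly shifted distribution

print(discriminative_score(real_rows, close_rows))
print(discriminative_score(real_rows, far_rows))
```

A weak classifier can report a low score even when the data differ in ways it cannot detect, so the choice of classifier matters as much as the formula.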
Contributing
Packagename is an open source codebase.
Linting
This library uses flake8 for linting. Please identify formatting errors in your code before pushing by running the following:
flake8 path/to/file/to/test
Acknowledgements
This work was funded by The Alan Turing Institute.