Quality Data Extractor (QDE): CES & OES filtering for synthetic data
Quality Data Extractor (QDE)
QDE (Quality Data Extractor) is a Python framework for post-generation filtration of synthetic data.
It introduces two filtering strategies:
- CES (Comprehensive Extraction Strategy)
- OES (Optimal Extraction Strategy)
These strategies help researchers and practitioners filter synthetic datasets to retain samples that improve downstream model accuracy.
📄 Published in IEEE Access (2025):
Sachdeva, P., Malhotra, A., & Gupta, K. — Quality Data Extractor (QDE): Elevating Synthetic Data Augmentation through Post-Generation Filtration
🚀 Installation
From PyPI:
pip install qde
🔧 Quick Start
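import numpy as np
from qde import QDE
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
# Example: use the Iris dataset
X, y = load_iris(return_X_y=True)
# Iris is ordered by class, so shuffle before slicing to keep every class in each split
rng = np.random.default_rng(0)
order = rng.permutation(len(X))
X, y = X[order], y[order]
train_X, train_y = X[:80], y[:80]
synth_X, synth_y = X[80:110], y[80:110]  # stand-in for synthetic data
test_X, test_y = X[110:], y[110:]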
# Initialize QDE with CES
q = QDE(default_strategy="ces")
q.fit(train_X, train_y, synth_X, synth_y, test_X, test_y, encode_labels=True)
# Extract filtered synthetic samples
result, X_sel, y_sel = q.extract(estimator=GaussianNB())
print("Selected indices:", result.indices)
print("Filtered accuracy:", result.meta["filtered-accuracy"])
🖥️ Command-Line Interface (CLI)
QDE also ships a CLI:
qde strategies
# -> ces
# -> oes
qde run --train train.csv --synth synth.csv --test test.csv --target target --strategy ces
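The command above takes three CSV files and the name of the label column. The sketch below shows one way to prepare such files with the standard library. It assumes, without confirmation from the QDE documentation, that each file has a header row and that all three share the same feature columns plus a label column named by --target; the helper write_split, the feature names f1 and f2, and the toy rows are hypothetical.

```python
import csv
import os
import tempfile


def write_split(path, rows, feature_names, target_name="target"):
    """Write one split as CSV: feature columns first, label column last."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(list(feature_names) + [target_name])
        for features, label in rows:
            writer.writerow(list(features) + [label])


# Toy rows (hypothetical): each entry is (feature tuple, label)
splits = {
    "train.csv": [((0.1, 0.2), 0), ((0.9, 0.8), 1)],
    "synth.csv": [((0.2, 0.1), 0)],
    "test.csv": [((0.8, 0.9), 1)],
}

out_dir = tempfile.mkdtemp()
for name, rows in splits.items():
    write_split(os.path.join(out_dir, name), rows, ["f1", "f2"])

# Read the header back to confirm the assumed layout
with open(os.path.join(out_dir, "train.csv"), newline="") as fh:
    header = next(csv.reader(fh))
print(header)
```

With files laid out this way, the label column name passed to --target would be target.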
📖 Documentation
- CES: Adds synthetic samples one by one, retaining only those that do not reduce baseline accuracy.
- OES: Selects samples using distance-based neighborhood filtering (configurable with --k-neighbors and --distance-mode).
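To make the two descriptions concrete, here is a minimal NumPy sketch of both ideas. It is not QDE's implementation. The function names ces_like_filter and oes_like_filter, the nearest-centroid classifier standing in for an estimator, and the toy data are all assumptions. For the CES-like pass, accuracy is compared against the running value, which starts at the baseline. For the OES-like pass, the rule "keep a sample if most of its k nearest real neighbors share its label" is only one plausible reading of distance-based neighborhood filtering.

```python
import numpy as np


def centroid_accuracy(X_fit, y_fit, X_eval, y_eval):
    """Accuracy of a nearest-centroid classifier (stand-in for any estimator)."""
    classes = np.unique(y_fit)
    centroids = np.stack([X_fit[y_fit == c].mean(axis=0) for c in classes])
    # Distance from every evaluation point to every class centroid
    dists = np.linalg.norm(X_eval[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_eval).mean())


def ces_like_filter(train_X, train_y, synth_X, synth_y, test_X, test_y):
    """Greedy pass: keep a synthetic sample only if accuracy does not drop."""
    kept = []
    cur_X, cur_y = train_X, train_y
    best = centroid_accuracy(cur_X, cur_y, test_X, test_y)  # baseline accuracy
    for i in range(len(synth_X)):
        cand_X = np.vstack([cur_X, synth_X[i:i + 1]])
        cand_y = np.append(cur_y, synth_y[i])
        acc = centroid_accuracy(cand_X, cand_y, test_X, test_y)
        if acc >= best:  # assumed reading of "does not reduce" accuracy
            kept.append(i)
            cur_X, cur_y, best = cand_X, cand_y, acc
    return kept, best


def oes_like_filter(train_X, train_y, synth_X, synth_y, k_neighbors=3):
    """Keep a sample if most of its k nearest real neighbors share its label."""
    kept = []
    for i in range(len(synth_X)):
        dist = np.linalg.norm(train_X - synth_X[i], axis=1)  # Euclidean (assumed)
        nearest = np.argsort(dist)[:k_neighbors]
        if (train_y[nearest] == synth_y[i]).mean() > 0.5:
            kept.append(i)
    return kept


# Toy data (hypothetical): two well-separated classes
train_X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
# The third synthetic point sits in class 1 territory but carries label 0
synth_X = np.array([[0.5, 0.5], [5.5, 5.5], [5.2, 5.1]])
synth_y = np.array([0, 1, 0])
test_X = np.array([[1, 1], [3, 3], [0, 2], [6, 6]], dtype=float)
test_y = np.array([0, 1, 0, 1])

ces_kept, ces_acc = ces_like_filter(train_X, train_y, synth_X, synth_y, test_X, test_y)
oes_kept = oes_like_filter(train_X, train_y, synth_X, synth_y)
print("CES-like kept:", ces_kept, "accuracy:", ces_acc)
print("OES-like kept:", oes_kept)
```

On this toy data both passes drop the mislabeled third sample, but for different reasons: the CES-like pass rejects it because adding it lowers test accuracy, while the OES-like pass rejects it because its neighbors disagree with its label.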
✅ Each run outputs
- SelectionResult.indices → indices of accepted synthetic samples
- SelectionResult.meta → metadata (strategy, accuracy metrics, etc.)
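The indices can be mapped back onto the synthetic arrays to build an augmented training set. The sketch below uses a hypothetical stand-in object, FakeResult, that mimics only the two fields listed above; the "strategy" key in meta and the toy arrays are assumptions, not QDE output.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in that mimics only the two documented fields
FakeResult = namedtuple("FakeResult", ["indices", "meta"])
result = FakeResult(indices=[0, 2], meta={"strategy": "ces"})

# Toy arrays (hypothetical)
train_X = np.array([[0.0, 0.0], [9.0, 9.0]])
train_y = np.array([0, 1])
synth_X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
synth_y = np.array([0, 1, 0])

# Select the accepted synthetic rows by position
idx = np.asarray(result.indices, dtype=int)
X_sel, y_sel = synth_X[idx], synth_y[idx]

# Append the accepted rows to the real training data
aug_X = np.vstack([train_X, X_sel])
aug_y = np.concatenate([train_y, y_sel])
print(aug_X.shape, aug_y.shape)
```

In the Quick Start, q.extract already returns the selected rows as X_sel and y_sel, so this indexing step matters mainly when only the indices have been stored.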
🛠️ Development
Clone the repo and install in editable mode:
git clone https://github.com/pragatischdv/quality-data-extractor
cd quality-data-extractor
pip install -e .
📄 Citation
If you use QDE in your research, please cite:
@ARTICLE{11142788,
author={Sachdeva, Pragati and Malhotra, Amarjit and Gupta, Karan},
journal={IEEE Access},
title={Quality Data Extractor (QDE): Elevating Synthetic Data Augmentation through Post-Generation Filtration},
year={2025},
doi={10.1109/ACCESS.2025.3603435}}