Unified Transformer for Multi-Task Data Quality
UNIDQ: Unified Data Quality
A unified transformer architecture for multi-task data quality assessment.
UNIDQ addresses 6 data quality tasks with a single model:
- ✅ Error Detection (F1=0.894, +42% vs Raha)
- ✅ Data Repair
- ✅ Missing Value Imputation (R²=0.941, +295% vs MICE)
- ✅ Label Noise Detection (F1=0.856, +28% vs Cleanlab)
- ✅ Label Classification
- ✅ Data Valuation
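The README does not say how these scores are computed. Below is a minimal sketch of the two headline metrics. It assumes error detection is scored cell by cell against a ground-truth error mask, and that imputation is scored by R² over the cells that were missing. `cell_f1` and `imputation_r2` are hypothetical helpers, not part of `unidq`.

```python
import numpy as np


def cell_f1(pred_mask, true_mask) -> float:
    """F1 of predicted erroneous cells against ground-truth error cells (assumed cell-level scoring)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.sum(pred & true)    # cells correctly flagged as errors
    fp = np.sum(pred & ~true)   # clean cells wrongly flagged
    fn = np.sum(~pred & true)   # erroneous cells that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def imputation_r2(imputed, truth, missing_mask) -> float:
    """Coefficient of determination (R²) restricted to the cells that were imputed."""
    mask = np.asarray(missing_mask, dtype=bool)
    y_hat = np.asarray(imputed, dtype=float)[mask]
    y = np.asarray(truth, dtype=float)[mask]
    ss_tot = np.sum((y - y.mean()) ** 2)
    if ss_tot == 0:
        raise ValueError("R² is undefined when the true values are constant")
    ss_res = np.sum((y - y_hat) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For example, suppose three cells are flagged and two of them are true errors, with no true error missed. That gives precision 2/3, recall 1, and F1 = 0.8.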
Installation
```bash
pip install unidq
```
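After installing, you can check which release is present with the standard-library `importlib.metadata` module (Python 3.8+). This is useful, for example, to confirm it matches the 0.1.0 files listed under Download files. This is a generic sketch, not something `unidq` itself provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "unidq") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "unidq is not installed")
```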
Quick Start
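```python
from unidq import UNIDQ, MultiTaskDataset, UNIDQTrainer

# Load your data
# (assumed: X_dirty and X_clean are 2-D arrays of shape (n_samples, n_features),
#  errors is a cell-level error mask of the same shape, y holds one label per row)
dataset = MultiTaskDataset(
    dirty_features=X_dirty,
    clean_features=X_clean,
    error_mask=errors,
    labels=y,
)

# Initialize model; n_features is the number of columns in the table
model = UNIDQ(n_features=X_dirty.shape[1])

# Train
trainer = UNIDQTrainer(model)
trainer.fit(dataset)

# Predict on new, possibly dirty data with the same columns
results = model.predict(X_new)
print(f"Detected errors: {results['errors']}")
print(f"Imputed values: {results['imputed']}")
```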
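The Quick Start leaves `X_dirty`, `X_clean`, `errors` and `y` undefined. The README does not say how `error_mask` is defined. Assuming it is a cell-level boolean array marking where the dirty table disagrees with its clean counterpart, it could be derived as shown below. `build_error_mask` is a hypothetical helper, not part of `unidq`.

```python
import numpy as np


def build_error_mask(x_dirty, x_clean, atol: float = 1e-8) -> np.ndarray:
    """Mark cells of x_dirty that differ from x_clean (assumed semantics of error_mask).

    A cell is flagged if the two values differ by more than `atol`,
    or if exactly one of them is NaN (e.g. a value missing from the dirty table).
    """
    dirty = np.asarray(x_dirty, dtype=float)
    clean = np.asarray(x_clean, dtype=float)
    if dirty.shape != clean.shape:
        raise ValueError("dirty and clean tables must have the same shape")
    # equal_nan=True so a cell that is NaN in both tables is not flagged
    same = np.isclose(dirty, clean, rtol=0.0, atol=atol, equal_nan=True)
    return ~same


X_clean = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_dirty = np.array([[1.0, 2.5], [np.nan, 4.0], [5.0, 6.0]])
errors = build_error_mask(X_dirty, X_clean)
# errors -> [[False, True], [True, False], [False, False]]
```

Both a corrupted value (2.5 instead of 2.0) and a missing value (NaN instead of 3.0) are flagged. This keeps the imputation targets inside the same mask.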
Citation
If you use UNIDQ in your research, please cite:
```bibtex
@inproceedings{unidq2026,
  title={UNIDQ: A Unified Transformer Architecture for Multi-Task Data Quality},
  author={Your Name},
  booktitle={Proceedings of the VLDB Endowment},
  year={2026}
}
```
License
MIT License
Download files
Source Distribution
unidq-0.1.0.tar.gz (15.0 kB)
Built Distribution
unidq-0.1.0-py3-none-any.whl (13.1 kB)
File details
Details for the file unidq-0.1.0.tar.gz.
File metadata
- Download URL: unidq-0.1.0.tar.gz
- Upload date:
- Size: 15.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4e1723169a3f21e57b898993a2f5e5b0897efc4f379b7abea9958a983d6c3c1 |
| MD5 | 217aa705bd62adb4f30a242ba4663687 |
| BLAKE2b-256 | 2dccc921a7ea768688fa6f31d43bc02c6b13d53d897f8932e345f646fab974b2 |
File details
Details for the file unidq-0.1.0-py3-none-any.whl.
File metadata
- Download URL: unidq-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07e82d020b2c54bb20ba38c90dffc01a368ba58a410007fac02250f0bb428706 |
| MD5 | 1cd48056d6852db5c7edd3503c726310 |
| BLAKE2b-256 | 8e79c3293f6d6025532e791f85a92f8e70daff5a72baf655d5b633be4764f8fd |
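The digests above can be used to confirm that a downloaded file is intact. Below is a minimal sketch using Python's standard `hashlib`. It assumes "BLAKE2b-256" means BLAKE2b with a 32-byte digest, which is how PyPI labels it. `file_digest_hex` and `verify` are illustrative helpers, not part of `unidq`.

```python
import hashlib


def file_digest_hex(path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, accepting the algorithm names used in the tables above."""
    name = algorithm.lower()
    if name == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b truncated to a 256-bit digest
    else:
        h = hashlib.new(name)  # e.g. "sha256" or "md5"
    with open(path, "rb") as f:
        # Read in chunks so large archives need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify("unidq-0.1.0-py3-none-any.whl", "07e82d020b2c54bb20ba38c90dffc01a368ba58a410007fac02250f0bb428706")` should return `True` for an unmodified copy of the wheel listed above.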