Unified Transformer for Multi-Task Data Quality

UNIDQ: Unified Data Quality

License: MIT | Python 3.8+

A unified transformer architecture for multi-task data quality assessment.

UNIDQ addresses 6 data quality tasks with a single model:

  • ✅ Error Detection (F1=0.894, +42% vs Raha)
  • ✅ Data Repair
  • ✅ Missing Value Imputation (R²=0.941, +295% vs MICE)
  • ✅ Label Noise Detection (F1=0.856, +28% vs Cleanlab)
  • ✅ Label Classification
  • ✅ Data Valuation
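The F1 and R² figures quoted above are standard metrics. As a reminder of what they measure, here is a minimal pure-Python sketch of both. The function names `f1_score` and `r2_score` are illustrative only; this is not UNIDQ's evaluation code, and the source does not say how the reported numbers were computed.

```python
def f1_score(true_mask, pred_mask):
    """Binary F1 over two equal-length sequences of 0/1 (or bool) flags."""
    pairs = list(zip(true_mask, pred_mask))
    tp = sum(1 for t, p in pairs if t and p)        # errors correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)    # clean cells flagged
    fn = sum(1 for t, p in pairs if t and not p)    # errors missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# One hit, one false alarm, one miss
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

F1 applies to the detection tasks (a flag per cell or per label), while R² applies to imputation, where predicted values are compared with the true ones.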

Installation

pip install unidq

Quick Start

from unidq import UNIDQ, MultiTaskDataset, UNIDQTrainer

# Load your data (X_dirty, X_clean, errors, and y are arrays you supply;
# they are not defined in this snippet)
dataset = MultiTaskDataset(
    dirty_features=X_dirty,
    clean_features=X_clean,
    error_mask=errors,
    labels=y
)

# Initialize model
model = UNIDQ(n_features=X_dirty.shape[1])

# Train
trainer = UNIDQTrainer(model)
trainer.fit(dataset)

# Predict on new rows (X_new is data you supply, with the same
# number of features as X_dirty)
results = model.predict(X_new)
print(f"Detected errors: {results['errors']}")
print(f"Imputed values: {results['imputed']}")
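The Quick Start assumes `X_dirty`, `X_clean`, `errors`, and `y` already exist. To try it without a real dataset, something like the following NumPy sketch could generate toy inputs. It rests on assumptions the source does not confirm: that `MultiTaskDataset` accepts NumPy arrays, and that `error_mask` is a boolean array with one flag per cell. The helper `make_toy_inputs` is hypothetical and not part of the `unidq` package.

```python
import numpy as np


def make_toy_inputs(n_rows=200, n_features=5, error_rate=0.1, seed=0):
    """Build a clean matrix, a corrupted copy, a per-cell error mask, and labels.

    Hypothetical helper for experimentation; shapes and dtypes are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Clean features and a simple binary label derived from the first column
    X_clean = rng.normal(size=(n_rows, n_features))
    y = (X_clean[:, 0] > 0).astype(int)

    # Choose cells to corrupt, then shift them by a large offset
    errors = rng.random((n_rows, n_features)) < error_rate
    X_dirty = X_clean.copy()
    X_dirty[errors] += rng.normal(loc=5.0, scale=1.0, size=int(errors.sum()))

    return X_dirty, X_clean, errors, y


X_dirty, X_clean, errors, y = make_toy_inputs()
print(X_dirty.shape, errors.mean())
```

Cells outside the mask are left identical in `X_dirty` and `X_clean`, so the mask is an exact record of where errors were injected.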

Citation

If you use UNIDQ in your research, please cite:

@inproceedings{unidq2026,
  title={UNIDQ: A Unified Transformer Architecture for Multi-Task Data Quality},
  author={Your Name},
  booktitle={Proceedings of the VLDB Endowment},
  year={2026}
}

License

MIT License

Download files

Source Distribution

unidq-0.1.0.tar.gz (15.0 kB)

Uploaded Source

Built Distribution

unidq-0.1.0-py3-none-any.whl (13.1 kB)

Uploaded Python 3

File details

Details for the file unidq-0.1.0.tar.gz.

File metadata

  • Download URL: unidq-0.1.0.tar.gz
  • Upload date:
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.18

File hashes

Hashes for unidq-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       a4e1723169a3f21e57b898993a2f5e5b0897efc4f379b7abea9958a983d6c3c1
MD5          217aa705bd62adb4f30a242ba4663687
BLAKE2b-256  2dccc921a7ea768688fa6f31d43bc02c6b13d53d897f8932e345f646fab974b2

File details

Details for the file unidq-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: unidq-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.18

File hashes

Hashes for unidq-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       07e82d020b2c54bb20ba38c90dffc01a368ba58a410007fac02250f0bb428706
MD5          1cd48056d6852db5c7edd3503c726310
BLAKE2b-256  8e79c3293f6d6025532e791f85a92f8e70daff5a72baf655d5b633be4764f8fd
