Unified Transformer for Multi-Task Data Quality

UNIDQ: Unified Data Quality

License: MIT | Python 3.8+

A unified transformer architecture for multi-task data quality assessment.

UNIDQ addresses 6 data quality tasks with a single model:

  • ✅ Error Detection (F1=0.894, +42% vs Raha)
  • ✅ Data Repair
  • ✅ Missing Value Imputation (R²=0.941, +295% vs MICE)
  • ✅ Label Noise Detection (F1=0.856, +28% vs Cleanlab)
  • ✅ Label Classification
  • ✅ Data Valuation
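
The F1 and R² figures above are standard metrics. As a rough illustration of what they measure, F1 for error detection can be computed from a boolean cell-level error mask, and R² for imputation from the true and imputed values. This is a sketch only, not the authors' evaluation code: the function names and toy arrays are hypothetical, and the datasets and protocol behind the reported numbers are not specified here.

```python
import numpy as np

def f1_from_masks(true_mask, pred_mask):
    """F1 score for binary masks (True marks an erroneous cell)."""
    true_mask = np.asarray(true_mask, dtype=bool)
    pred_mask = np.asarray(pred_mask, dtype=bool)
    tp = np.sum(true_mask & pred_mask)    # errors correctly flagged
    fp = np.sum(~true_mask & pred_mask)   # clean cells flagged as errors
    fn = np.sum(true_mask & ~pred_mask)   # errors that were missed
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when y_true is constant (SS_tot is zero).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example (hypothetical data): 3 rows x 2 features.
true_errors = np.array([[True, False], [False, True], [False, False]])
pred_errors = np.array([[True, False], [True, True], [False, False]])
f1 = f1_from_masks(true_errors, pred_errors)

# Toy imputation check: true values vs. imputed values for four cells.
r2 = r2_score([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

An F1 near 1 means few missed or spurious error flags; an R² near 1 means the imputed values track the true values closely.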

Installation

pip install unidq

Quick Start

from unidq import UNIDQ, MultiTaskDataset, UNIDQTrainer

# Load your data (X_dirty, X_clean, errors, and y are inputs you supply)
dataset = MultiTaskDataset(
    dirty_features=X_dirty,
    clean_features=X_clean,
    error_mask=errors,
    labels=y
)

# Initialize model
model = UNIDQ(n_features=X_dirty.shape[1])

# Train
trainer = UNIDQTrainer(model)
trainer.fit(dataset)

# Predict on new data (X_new should have the same feature columns as X_dirty)
results = model.predict(X_new)
print(f"Detected errors: {results['errors']}")
print(f"Imputed values: {results['imputed']}")
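
The Quick Start assumes that X_dirty, X_clean, errors, and y already exist. For experimentation, one way to obtain them is to corrupt a clean matrix and record which cells were changed. The sketch below uses NumPy only and does not call unidq; the helper name make_toy_inputs, the corruption scheme, and the array shapes and dtypes are all assumptions, since the input types that MultiTaskDataset expects are not documented here.

```python
import numpy as np

def make_toy_inputs(n_rows=200, n_features=5, error_rate=0.1, seed=0):
    """Build a clean matrix, a corrupted copy, the error mask, and labels.

    Hypothetical helper for illustration; not part of unidq.
    """
    rng = np.random.default_rng(seed)
    X_clean = rng.normal(size=(n_rows, n_features))

    # Labels from a simple rule so they correlate with the features.
    y = (X_clean[:, 0] + X_clean[:, 1] > 0).astype(np.int64)

    # Choose cells to corrupt; True marks an injected error.
    errors = rng.random((n_rows, n_features)) < error_rate

    # Corrupt the selected cells by adding a large positive or negative offset.
    X_dirty = X_clean.copy()
    X_dirty[errors] += rng.choice([-5.0, 5.0], size=int(errors.sum()))
    return X_dirty, X_clean, errors, y

X_dirty, X_clean, errors, y = make_toy_inputs()
```

Arrays built this way could then be passed as dirty_features, clean_features, error_mask, and labels, assuming the dataset class accepts NumPy inputs; convert them first if it expects another type.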

Citation

If you use UNIDQ in your research, please cite:

@inproceedings{unidq2026,
  title={UNIDQ: A Unified Transformer Architecture for Multi-Task Data Quality},
  author={shivakoreddi and sravanisowrupilli},
  booktitle={Proceedings of the VLDB Endowment},
  year={2026}
}

License

MIT License
