Python library that provides pipelining tools for the dqm-ml library to compute data quality metrics for machine learning.
DQM-ML Job
Orchestration engine for DQM-ML V2. Handles data loading, processing, and output writing.
Installation
pip install dqm-ml-job
Note:
dqm-ml-job handles data loading and orchestration. To compute metrics, you also need at least one of: dqm-ml-core, dqm-ml-images, or dqm-ml-pytorch (see Dependencies below).
Quick Start
Using Python
from dqm_ml_job.cli import execute
# Execute a data quality job from a YAML config
execute(["-p", "config.yaml"])
Using Python Module
python -m dqm_ml_job.cli -p config.yaml
Example config.yaml:
dataloaders:
  my_data:
    type: parquet
    path: data/train.parquet
metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]
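The Quick Start steps can be combined in one script. The following is a minimal sketch, not part of the package: `write_example_config` and `run_job` are hypothetical helper names, and the only dqm-ml-job call used is the `execute(["-p", ...])` entry point shown above. The import is guarded so the script still runs when dqm-ml-job is not installed.

```python
from pathlib import Path

# Same keys and values as the example config, with the YAML nesting written out.
EXAMPLE_CONFIG = """\
dataloaders:
  my_data:
    type: parquet
    path: data/train.parquet
metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]
"""


def write_example_config(path):
    """Write the example configuration to `path` and return it as a Path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(EXAMPLE_CONFIG, encoding="utf-8")
    return path


def run_job(config_path):
    """Run the job if dqm-ml-job is installed; return False otherwise."""
    try:
        # Import path taken from the Quick Start section.
        from dqm_ml_job.cli import execute
    except ImportError:
        return False
    execute(["-p", str(config_path)])
    return True
```

Calling `run_job(write_example_config("config.yaml"))` would then launch the job, provided `data/train.parquet` exists and a metrics package such as dqm-ml-core is installed.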
Dependencies
DQM-ML is modular. dqm-ml-job provides the orchestration, but computing actual metrics requires additional packages:
# For Completeness and Representativeness
pip install dqm-ml-job dqm-ml-core
# For Visual Features
pip install dqm-ml-job dqm-ml-images
# For Domain Gap
pip install dqm-ml-job dqm-ml-pytorch
# All metrics
pip install dqm-ml-job dqm-ml-core dqm-ml-images dqm-ml-pytorch
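Before running a job, it can help to check which metric packages are present in the current environment. The sketch below uses only the standard library's `importlib.metadata` and the distribution names from the install commands above. `installed_metric_packages` is a hypothetical helper, not part of dqm-ml-job.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names and the metrics they provide, as listed above.
METRIC_PACKAGES = {
    "dqm-ml-core": "Completeness and Representativeness",
    "dqm-ml-images": "Visual Features",
    "dqm-ml-pytorch": "Domain Gap",
}


def installed_metric_packages():
    """Map each metric package to its installed version, or None if absent."""
    found = {}
    for name in METRIC_PACKAGES:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_metric_packages().items():
        status = ver if ver is not None else "not installed"
        print(f"{name} ({METRIC_PACKAGES[name]}): {status}")
```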
Key Components
DatasetPipeline
The main orchestrator that:
- Loads the configuration
- Discovers plugins via entry points
- Executes the streaming loop
- Manages memory and I/O efficiency
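Plugin discovery through entry points generally relies on `importlib.metadata`. The sketch below shows that general mechanism only. The group name `dqm_ml.example_plugins` is a placeholder invented for illustration, and the actual group names used by DatasetPipeline are not documented here.

```python
from importlib.metadata import entry_points

# Hypothetical group name, used only for illustration.
HYPOTHETICAL_GROUP = "dqm_ml.example_plugins"


def discover_plugins(group=HYPOTHETICAL_GROUP):
    """Return {entry point name: loaded object} for one entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict of group name -> entry points.
        selected = eps.get(group, [])
    # ep.load() imports the module and returns the referenced object.
    return {ep.name: ep.load() for ep in selected}
```

A package would register a plugin by declaring an entry point under the chosen group in its packaging metadata. An unknown group simply yields an empty dictionary.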
Protocols
| Protocol | Description |
|---|---|
| DataLoader | Factory for creating data selections (e.g., Parquet, CSV loaders) |
| DataSelection | Represents a specific subset of data and provides an iterator over batches |
| OutputWriter | Persists computed features or metrics to disk |
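To make the three roles concrete, here is a minimal sketch using `typing.Protocol`. The method names (`select`, `iter_batches`, `write`), the in-memory classes, and the completeness definition (share of non-null values per column) are all assumptions made for illustration. The real signatures in dqm-ml-job may differ.

```python
from typing import Any, Dict, Iterator, List, Protocol

Batch = Dict[str, List[Any]]  # column name -> values for one batch


class DataSelection(Protocol):
    def iter_batches(self) -> Iterator[Batch]: ...  # hypothetical method name


class DataLoader(Protocol):
    def select(self) -> DataSelection: ...  # hypothetical method name


class OutputWriter(Protocol):
    def write(self, metrics: Dict[str, float]) -> None: ...  # hypothetical


class InMemorySelection:
    """Toy selection that slices in-memory columns into fixed-size batches."""

    def __init__(self, columns: Batch, batch_size: int) -> None:
        self.columns = columns
        self.batch_size = batch_size

    def iter_batches(self) -> Iterator[Batch]:
        n_rows = len(next(iter(self.columns.values()), []))
        for start in range(0, n_rows, self.batch_size):
            stop = start + self.batch_size
            yield {name: vals[start:stop] for name, vals in self.columns.items()}


class InMemoryLoader:
    """Toy loader acting as a factory for InMemorySelection."""

    def __init__(self, columns: Batch, batch_size: int = 2) -> None:
        self.columns = columns
        self.batch_size = batch_size

    def select(self) -> InMemorySelection:
        return InMemorySelection(self.columns, self.batch_size)


class ListWriter:
    """Toy writer that keeps results in a list instead of writing to disk."""

    def __init__(self) -> None:
        self.records: List[Dict[str, float]] = []

    def write(self, metrics: Dict[str, float]) -> None:
        self.records.append(metrics)


def run_completeness(
    loader: DataLoader, writer: OutputWriter, input_columns: List[str]
) -> None:
    """Stream batches, count non-null values, and write one ratio per column."""
    seen = {c: 0 for c in input_columns}
    present = {c: 0 for c in input_columns}
    for batch in loader.select().iter_batches():
        for c in input_columns:
            seen[c] += len(batch[c])
            present[c] += sum(v is not None for v in batch[c])
    writer.write(
        {c: (present[c] / seen[c] if seen[c] else 0.0) for c in input_columns}
    )


if __name__ == "__main__":
    loader = InMemoryLoader(
        {"col_a": [1, None, 3, 4], "col_b": [None, None, "x", "y"]}
    )
    writer = ListWriter()
    run_completeness(loader, writer, ["col_a", "col_b"])
    print(writer.records)  # [{'col_a': 0.75, 'col_b': 0.5}]
```

Because the loop only holds running counts and one batch at a time, memory use stays bounded regardless of dataset size, which is the point of separating the loader from the selection it creates.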
Built-in Loaders
| Loader | Description |
|---|---|
| parquet | Optimized loading using PyArrow |
| csv | Flexible loading using Pandas |
File details
Details for the file dqm_ml_job-2.0.0rc0.tar.gz.
File metadata
- Download URL: dqm_ml_job-2.0.0rc0.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.0 (uv publish, Ubuntu 26.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73b0d54d52a6369f67b05c131d0f36b3dc0752909b620032d8e92907985989b7 |
| MD5 | c9ebdbab7ae505d9f6429013fbc9cce0 |
| BLAKE2b-256 | a4d93a2c7f9ffa0b551e2db27c24ceb54216c797d2f5308ec71b9b944008d504 |
File details
Details for the file dqm_ml_job-2.0.0rc0-py3-none-any.whl.
File metadata
- Download URL: dqm_ml_job-2.0.0rc0-py3-none-any.whl
- Upload date:
- Size: 17.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.0 (uv publish, Ubuntu 26.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 552aba9c9fe9aff6b623d6cfc9643e3a10b2cf97fd7c50f13e64a95b5c02688f |
| MD5 | 3bce01bb7467dcdf100fae9f3283cafa |
| BLAKE2b-256 | 13eef29b6d6fb9d6ddfd7e5b73f53289a3c6d362e79fbd17c2c3890ca76381e3 |