Python library providing pipelining tools for the dqm-ml library, which computes data quality metrics for Machine Learning

DQM-ML Job

Orchestration engine for DQM-ML V2. Handles data loading, processing, and output writing.

Installation

pip install dqm-ml-job

Note: dqm-ml-job handles data loading and orchestration. To compute metrics, you also need at least one of: dqm-ml-core, dqm-ml-images, or dqm-ml-pytorch (see Dependencies below).

Quick Start

Using Python

from dqm_ml_job.cli import execute

# Execute a data quality job from a YAML config
execute(["-p", "config.yaml"])
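A minimal sketch of wrapping the call above so that a missing config file fails early with a clear error. The helper name `run_job` is hypothetical and not part of dqm-ml-job; only `execute` and the `-p` flag come from the example above.

```python
from pathlib import Path


def run_job(config_path: str) -> None:
    """Run a DQM-ML job after checking that the config file exists.

    `run_job` is an illustrative helper, not part of dqm-ml-job.
    """
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")

    # Imported lazily so the path check runs before the package is needed.
    from dqm_ml_job.cli import execute

    # Same argument list as the Quick Start example: -p <config file>.
    execute(["-p", str(path)])
```

Calling `run_job("config.yaml")` should then behave like the Quick Start call, assuming the file exists and dqm-ml-job is installed.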

Using the Python Module (Command Line)

python -m dqm_ml_job.cli -p config.yaml
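The same command can be launched from another Python script. The sketch below only assembles and runs the command line shown above; the helper names `build_command` and `run_cli` are hypothetical, and what exit codes the CLI returns is not documented here.

```python
import subprocess
import sys
from typing import List


def build_command(config_path: str) -> List[str]:
    """Build the argument list for `python -m dqm_ml_job.cli -p <config>`."""
    # sys.executable keeps the job in the current interpreter / virtualenv.
    return [sys.executable, "-m", "dqm_ml_job.cli", "-p", config_path]


def run_cli(config_path: str) -> int:
    """Run the job in a child process and return its exit code."""
    completed = subprocess.run(build_command(config_path), check=False)
    return completed.returncode
```

Using `sys.executable` rather than a bare `python` avoids picking up a different interpreter from `PATH`.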

Example config.yaml:

dataloaders:
  my_data:
    type: parquet
    path: data/train.parquet

metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]
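As a rough illustration of the structure above, the following sketch checks a parsed config for the two top-level sections and a `type` key in each entry. These rules are inferred from the example only and are an assumption; dqm-ml-job may accept or require other keys. `check_config` is a hypothetical helper.

```python
# Section names taken from the example config; treated as required by assumption.
REQUIRED_SECTIONS = ("dataloaders", "metrics_processor")


def check_config(config: dict) -> list:
    """Return a list of problems found in a parsed config (empty if none)."""
    problems = []
    for section in REQUIRED_SECTIONS:
        entries = config.get(section)
        if not isinstance(entries, dict) or not entries:
            problems.append(f"missing or empty section: {section}")
            continue
        for name, entry in entries.items():
            if not isinstance(entry, dict) or "type" not in entry:
                problems.append(f"{section}.{name}: missing 'type'")
    return problems


# The example config.yaml, written as the dict a YAML parser would produce.
example = {
    "dataloaders": {"my_data": {"type": "parquet", "path": "data/train.parquet"}},
    "metrics_processor": {
        "completeness": {"type": "completeness", "input_columns": ["col_a", "col_b"]}
    },
}
print(check_config(example))  # prints []
```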

Dependencies

DQM-ML is modular. dqm-ml-job provides the orchestration, but computing actual metrics requires additional packages:

# For Completeness and Representativeness
pip install dqm-ml-job dqm-ml-core

# For Visual Features
pip install dqm-ml-job dqm-ml-images

# For Domain Gap
pip install dqm-ml-job dqm-ml-pytorch

# All metrics
pip install dqm-ml-job dqm-ml-core dqm-ml-images dqm-ml-pytorch
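To see which of the metric packages listed above are present in the current environment, a small check with the standard library could look like this. It only queries installed distribution metadata by the package names given above; `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as listed in the Dependencies section.
METRIC_PACKAGES = ("dqm-ml-core", "dqm-ml-images", "dqm-ml-pytorch")


def installed_versions(names: Iterable[str] = METRIC_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```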

Key Components

DatasetPipeline

The main orchestrator that:

  • Loads the configuration
  • Discovers plugins via entry points
  • Executes the streaming loop
  • Manages memory and I/O efficiency
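The two central steps listed above, plugin discovery and the streaming loop, can be sketched with the standard library. This is not the DatasetPipeline implementation: the entry-point group name is left to the caller because the real one is not documented here, and the "completeness" computed below (fraction of non-null values) is a simplified stand-in chosen for illustration.

```python
from importlib.metadata import entry_points


def discover_plugins(group: str) -> dict:
    """Return {name: entry point} for a given entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # older Pythons return a plain dict of groups
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def stream_non_null_ratio(batches, columns):
    """Accumulate per-column non-null ratios, one batch at a time.

    Only running counters are kept, so memory use does not grow
    with the size of the dataset.
    """
    seen = 0
    non_null = {c: 0 for c in columns}
    for batch in batches:  # each batch is a list of row dicts
        seen += len(batch)
        for row in batch:
            for c in columns:
                if row.get(c) is not None:
                    non_null[c] += 1
    return {c: (non_null[c] / seen if seen else 0.0) for c in columns}


def toy_batches():
    """Generator standing in for a data selection that yields batches."""
    yield [{"col_a": 1, "col_b": None}]
    yield [{"col_a": 2, "col_b": 3}]


print(stream_non_null_ratio(toy_batches(), ["col_a", "col_b"]))
# prints {'col_a': 1.0, 'col_b': 0.5}
```

Passing a generator rather than a list is what keeps the loop streaming: each batch can be released before the next one is produced.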

Protocols

Protocol        Description
--------------  --------------------------------------------------------------------------
DataLoader      Factory for creating data selections (e.g., Parquet, CSV loaders)
DataSelection   Represents a specific subset of data and provides an iterator over batches
OutputWriter    Persists computed features or metrics to disk
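To make the three roles concrete, here is a sketch using `typing.Protocol` with small in-memory implementations. The method names (`create_selection`, `write`) and the batch type are assumptions made for illustration; the actual protocol signatures in dqm-ml-job may differ.

```python
from typing import Any, Dict, Iterator, List, Protocol

Batch = List[Dict[str, Any]]  # assumed batch shape: a list of row dicts


class DataSelection(Protocol):
    def __iter__(self) -> Iterator[Batch]: ...


class DataLoader(Protocol):
    def create_selection(self, name: str) -> DataSelection: ...


class OutputWriter(Protocol):
    def write(self, name: str, values: Dict[str, Any]) -> None: ...


class InMemorySelection:
    """Yields fixed-size batches from a list of rows."""

    def __init__(self, rows: Batch, batch_size: int) -> None:
        self.rows = rows
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[Batch]:
        for start in range(0, len(self.rows), self.batch_size):
            yield self.rows[start:start + self.batch_size]


class InMemoryLoader:
    """Factory that hands out selections over the same in-memory rows."""

    def __init__(self, rows: Batch, batch_size: int = 2) -> None:
        self.rows = rows
        self.batch_size = batch_size

    def create_selection(self, name: str) -> DataSelection:
        return InMemorySelection(self.rows, self.batch_size)


class InMemoryWriter:
    """Keeps results in a dict instead of persisting them to disk."""

    def __init__(self) -> None:
        self.results: Dict[str, Dict[str, Any]] = {}

    def write(self, name: str, values: Dict[str, Any]) -> None:
        self.results[name] = values


loader = InMemoryLoader([{"x": 1}, {"x": 2}, {"x": 3}], batch_size=2)
batches = list(loader.create_selection("all"))
writer = InMemoryWriter()
writer.write("row_count", {"rows": sum(len(b) for b in batches)})
```

Because the protocols are structural, the in-memory classes satisfy them without inheriting from anything.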

Built-in Loaders

Loader    Description
--------  -------------------------------
parquet   Optimized loading using PyArrow
csv       Flexible loading using Pandas

Download files

Source Distribution

dqm_ml_job-2.0.0rc0.tar.gz (14.3 kB, Source)

Built Distribution

dqm_ml_job-2.0.0rc0-py3-none-any.whl (17.1 kB, Python 3)

File details

Details for the file dqm_ml_job-2.0.0rc0.tar.gz.

File metadata

  • Download URL: dqm_ml_job-2.0.0rc0.tar.gz
  • Upload date:
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.0

File hashes

Hashes for dqm_ml_job-2.0.0rc0.tar.gz
Algorithm     Hash digest
------------  ----------------------------------------------------------------
SHA256        73b0d54d52a6369f67b05c131d0f36b3dc0752909b620032d8e92907985989b7
MD5           c9ebdbab7ae505d9f6429013fbc9cce0
BLAKE2b-256   a4d93a2c7f9ffa0b551e2db27c24ceb54216c797d2f5308ec71b9b944008d504

File details

Details for the file dqm_ml_job-2.0.0rc0-py3-none-any.whl.

File metadata

  • Download URL: dqm_ml_job-2.0.0rc0-py3-none-any.whl
  • Upload date:
  • Size: 17.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.0

File hashes

Hashes for dqm_ml_job-2.0.0rc0-py3-none-any.whl
Algorithm     Hash digest
------------  ----------------------------------------------------------------
SHA256        552aba9c9fe9aff6b623d6cfc9643e3a10b2cf97fd7c50f13e64a95b5c02688f
MD5           3bce01bb7467dcdf100fae9f3283cafa
BLAKE2b-256   13eef29b6d6fb9d6ddfd7e5b73f53289a3c6d362e79fbd17c2c3890ca76381e3