Python library designed to provide the core DQM-ML metrics without heavy dependencies, together with the common API shared by all metrics.

DQM-ML Core

Core package for DQM-ML V2 providing the foundational API and standard metrics for data quality assessment.

Installation

pip install dqm-ml-core

Note: dqm-ml-core provides metric processors only — no CLI or job orchestration. Use directly via Python or with dqm-ml-job for YAML config execution.

Quick Start

Completeness Example

from dqm_ml_core import CompletenessProcessor

processor = CompletenessProcessor(
    name="my_check",
    config={"input_columns": ["col_a", "col_b"]}
)
result = processor.compute({})
print(f"Completeness: {result['overall_completeness']}")

Representativeness Example

from dqm_ml_core import RepresentativenessProcessor
import numpy as np

# Create sample data (e.g., 1000 samples from normal distribution)
data = np.random.randn(1000)

processor = RepresentativenessProcessor(
    name="dist_check",
    config={
        "input_columns": ["feature"],
        "distribution": "normal",
        "metrics": ["chi-square", "kolmogorov-smirnov"],
        "distribution_params": {"mean": 0.0, "std": 1.0}
    }
)

result = processor.compute({})
print(f"Chi-Square p-value: {result['feature_chi-square_pvalue']}")
print(f"KS p-value: {result['feature_kolmogorov-smirnov_pvalue']}")

With dqm-ml-job

For running from a YAML config, install together with dqm-ml-job:

pip install dqm-ml-job dqm-ml-core

Then use this config:

dataloaders:
  train:
    type: parquet
    path: data/train.parquet

metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]
  
  representativeness:
    type: representativeness
    input_columns: [feature_x]
    distribution: "normal"

Key Concepts

DatametricProcessor

The base class for all metrics and feature extractors. It supports a streaming architecture by splitting computation into two phases:

  1. Batch Level: compute_batch_metric() updates intermediate statistics for a single chunk of data.
  2. Dataset Level: compute() aggregates these statistics into final scores.
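The two phases can be illustrated with a minimal, library-free sketch. It borrows the names compute_batch_metric() and compute() from the description above, but the signatures, the dict-of-lists batch format, and the (non_null, rows) statistics are assumptions made for illustration, not the actual dqm_ml_core API.

```python
from typing import Iterable

# Assumed batch format: column name -> list of values, None marks a missing value.
Batch = dict[str, list]


def compute_batch_metric(batch: Batch, columns: list[str]) -> dict[str, tuple[int, int]]:
    """Batch level: return (non_null_count, row_count) per column for one chunk."""
    stats = {}
    for col in columns:
        values = batch[col]
        non_null = sum(v is not None for v in values)
        stats[col] = (non_null, len(values))
    return stats


def compute(batch_stats: Iterable[dict[str, tuple[int, int]]]) -> dict[str, float]:
    """Dataset level: merge the per-batch counts into final completeness ratios."""
    totals: dict[str, list[int]] = {}
    for stats in batch_stats:
        for col, (non_null, rows) in stats.items():
            acc = totals.setdefault(col, [0, 0])
            acc[0] += non_null
            acc[1] += rows
    return {col: (nn / rows if rows else 0.0) for col, (nn, rows) in totals.items()}


# Two chunks of the same dataset, processed one at a time.
batches = [
    {"col_a": [1, None, 3], "col_b": ["x", "y", None]},
    {"col_a": [4, 5], "col_b": [None, None]},
]
per_batch = [compute_batch_metric(b, ["col_a", "col_b"]) for b in batches]
scores = compute(per_batch)
print(scores)
```

Only the small per-batch counts are kept between chunks, so the full dataset never has to fit in memory at once.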

Included Metrics

Metric              Description
Completeness        Analyzes null/missing values in your dataset
Representativeness  Statistical distribution analysis (Chi-Square, KS, Shannon Entropy, GRTE)
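To give a sense of what a representativeness check measures, here is a rough sketch of the Kolmogorov-Smirnov (KS) statistic against a normal distribution, using only the standard library. The helper names normal_cdf and ks_statistic are hypothetical, and this is not how dqm_ml_core implements the metric; it only shows the underlying idea of comparing the empirical distribution with a reference one.

```python
import math
import random


def normal_cdf(x: float, mean: float = 0.0, std: float = 1.0) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(samples, mean: float = 0.0, std: float = 1.0) -> float:
    """Largest gap between the empirical CDF of `samples` and the normal CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mean, std)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted_sample = [x + 2.0 for x in normal_sample]

d_match = ks_statistic(normal_sample)   # sample drawn from the reference distribution
d_shift = ks_statistic(shifted_sample)  # same sample shifted by two standard deviations
print(f"matching: {d_match:.3f}, shifted: {d_shift:.3f}")
```

A small statistic means the column looks like the expected distribution; a large one signals that the data is not representative of it.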

For Developers

To create a new metric:

  1. Subclass dqm_ml_core.api.data_processor.DatametricProcessor.
  2. Define needed_columns(), generated_features(), and generated_metrics().
  3. Implement the streaming logic in compute_batch_metric() and compute().
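The steps above might look roughly like the following sketch. Because the exact signatures of DatametricProcessor are not shown here, the example uses a hypothetical stand-in base class (MetricProcessorSketch) with assumed method signatures; only the method names and the name/config constructor arguments come from this page. MeanProcessor is likewise an invented example metric.

```python
from abc import ABC, abstractmethod


class MetricProcessorSketch(ABC):
    """Hypothetical stand-in for DatametricProcessor; all signatures are assumed."""

    def __init__(self, name: str, config: dict):
        self.name = name
        self.config = config

    @abstractmethod
    def needed_columns(self) -> list: ...

    @abstractmethod
    def generated_features(self) -> list: ...

    @abstractmethod
    def generated_metrics(self) -> list: ...

    @abstractmethod
    def compute_batch_metric(self, batch: dict) -> dict: ...

    @abstractmethod
    def compute(self, batch_metrics: list) -> dict: ...


class MeanProcessor(MetricProcessorSketch):
    """Example metric: streaming mean of each input column."""

    def needed_columns(self):
        return list(self.config["input_columns"])

    def generated_features(self):
        return []  # this metric adds no per-row features

    def generated_metrics(self):
        return [f"{c}_mean" for c in self.needed_columns()]

    def compute_batch_metric(self, batch):
        # Batch level: keep only (sum, count) per column for this chunk.
        return {c: (sum(batch[c]), len(batch[c])) for c in self.needed_columns()}

    def compute(self, batch_metrics):
        # Dataset level: combine the partial sums and counts into final means.
        result = {}
        for c in self.needed_columns():
            total = sum(m[c][0] for m in batch_metrics)
            count = sum(m[c][1] for m in batch_metrics)
            result[f"{c}_mean"] = total / count if count else float("nan")
        return result


proc = MeanProcessor(name="mean_check", config={"input_columns": ["feature"]})
chunks = ({"feature": [1.0, 2.0]}, {"feature": [3.0, 4.0, 5.0]})
partials = [proc.compute_batch_metric(b) for b in chunks]
result = proc.compute(partials)
print(result)
```

When adapting this to the real base class, check the actual method signatures and return types in dqm_ml_core.api.data_processor before relying on the shapes used here.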

Dependencies

DQM-ML is modular, so install only the packages you need:

# Minimal: use as library only
pip install dqm-ml-core

# For YAML config execution
pip install dqm-ml-job dqm-ml-core

# Full stack with all metrics
pip install dqm-ml-job dqm-ml-core dqm-ml-images dqm-ml-pytorch
