Python library designed to provide core DQM-ML metrics without heavy dependencies, as well as a common API shared by the metrics
DQM-ML Core
Core package for DQM-ML V2 providing the foundational API and standard metrics for data quality assessment.
Installation
```bash
pip install dqm-ml-core
```
Note: dqm-ml-core provides metric processors only, with no CLI or job orchestration. Use it directly from Python, or together with dqm-ml-job to execute YAML configs.
Quick Start
Completeness Example
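```python
from dqm_ml_core import CompletenessProcessor

# Check for missing values in two columns
processor = CompletenessProcessor(
    name="my_check",
    config={"input_columns": ["col_a", "col_b"]},
)

# Aggregate into final scores (no batches are fed in this minimal example)
result = processor.compute({})
print(f"Completeness: {result['overall_completeness']}")
```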
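A minimal, library-independent sketch of one plausible completeness definition follows: the share of non-missing cells per column, plus an overall score over all selected cells. The helper names (`is_missing`, `completeness_scores`) and the choice to treat `None`, NaN, and absent keys as missing are assumptions for illustration; the exact rules used by `CompletenessProcessor` may differ. The `overall_completeness` key only mirrors the result key read in the example above.

```python
import math
from typing import Any, Dict, List, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def completeness_scores(rows: List[Dict[str, Any]], columns: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: share of non-missing values per column, plus an overall score."""
    scores: Dict[str, float] = {}
    total_present = 0
    for col in columns:
        # A key absent from a row also counts as missing.
        present = sum(1 for row in rows if not is_missing(row.get(col)))
        scores[col] = present / len(rows) if rows else 0.0
        total_present += present
    # Overall score: present cells divided by all cells in the selected columns.
    total_cells = len(rows) * len(columns)
    scores["overall_completeness"] = total_present / total_cells if total_cells else 0.0
    return scores


rows = [
    {"col_a": 1, "col_b": "x"},
    {"col_a": None, "col_b": "y"},
    {"col_a": 3, "col_b": float("nan")},
    {"col_a": 4},  # col_b absent
]
print(completeness_scores(rows, ["col_a", "col_b"]))
# {'col_a': 0.75, 'col_b': 0.5, 'overall_completeness': 0.625}
```

Weighting the overall score by cells (rather than averaging the per-column scores) is a design choice of this sketch; with equal row counts per column the two coincide.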
Representativeness Example
```python
import numpy as np
from dqm_ml_core import RepresentativenessProcessor

# Create sample data (e.g., 1000 samples from a standard normal distribution)
data = np.random.randn(1000)

processor = RepresentativenessProcessor(
    name="dist_check",
    config={
        "input_columns": ["feature"],
        "distribution": "normal",
        "metrics": ["chi-square", "kolmogorov-smirnov"],
        "distribution_params": {"mean": 0.0, "std": 1.0},
    },
)

# Note: `data` is not passed to the processor in this snippet; batches are
# fed through compute_batch_metric() (see Key Concepts below).
result = processor.compute({})
print(f"Chi-Square p-value: {result['feature_chi-square_pvalue']}")
print(f"KS p-value: {result['feature_kolmogorov-smirnov_pvalue']}")
```
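The two tests named in `"metrics"` compare a sample with a reference distribution. The sketch below computes only the textbook test statistics using the standard library; it is not the `RepresentativenessProcessor` implementation, and the chi-square binning scheme (10 equal-probability bins) is an assumption. Converting the statistics into the p-values the processor reports requires the chi-square and Kolmogorov distributions, available for instance through `scipy.stats.chisquare` and `scipy.stats.kstest`.

```python
import bisect
import random
from statistics import NormalDist
from typing import List, Sequence


def ks_statistic(sample: Sequence[float], dist: NormalDist) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def chi_square_statistic(sample: Sequence[float], dist: NormalDist, n_bins: int = 10) -> float:
    """Pearson chi-square statistic over equal-probability bins of `dist`."""
    # Interior bin edges at the reference quantiles, so each bin expects n / n_bins points.
    edges = [dist.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    observed: List[int] = [0] * n_bins
    for x in sample:
        observed[bisect.bisect_right(edges, x)] += 1
    expected = len(sample) / n_bins
    return sum((o - expected) ** 2 / expected for o in observed)


random.seed(0)
reference = NormalDist(mu=0.0, sigma=1.0)  # same parameters as "distribution_params" above
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(f"KS statistic D: {ks_statistic(data, reference):.4f}")
print(f"Chi-square statistic: {chi_square_statistic(data, reference):.2f}")
```

Small statistics indicate that the sample is consistent with the reference; shifting `data` by a few standard deviations makes both values grow sharply.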
With dqm-ml-job
To run metrics from a YAML config, install this package together with dqm-ml-job:
```bash
pip install dqm-ml-job dqm-ml-core
```
Then use this config:
```yaml
dataloaders:
  train:
    type: parquet
    path: data/train.parquet
metrics_processor:
  completeness:
    type: completeness
    input_columns: [col_a, col_b]
  representativeness:
    type: representativeness
    input_columns: [feature_x]
    distribution: "normal"
```
Key Concepts
DatametricProcessor
The base class for all metrics and feature extractors. It supports a streaming architecture by splitting computation into two phases:
- Batch level: `compute_batch_metric()` updates intermediate statistics for a single chunk of data.
- Dataset level: `compute()` aggregates these statistics into final scores.
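The two-phase split works because batch-level statistics can be merged. A minimal sketch with hypothetical helper functions (`batch_stats`, `aggregate`), using completeness as the example and assuming each batch is a plain list of values: storing counts per batch and combining them at the end gives the exact dataset-level score, whereas averaging per-batch ratios over-weights small batches. This illustrates the idea only and is not the library's code.

```python
import math
from typing import Any, Dict, Iterable, List


def batch_stats(batch: List[Any]) -> Dict[str, int]:
    """Batch level (hypothetical): keep only additive counts for one chunk."""
    missing = sum(1 for v in batch if v is None or (isinstance(v, float) and math.isnan(v)))
    return {"n": len(batch), "missing": missing}


def aggregate(stats: Iterable[Dict[str, int]]) -> float:
    """Dataset level (hypothetical): merge the counts, then compute the score once."""
    n = missing = 0
    for s in stats:
        n += s["n"]
        missing += s["missing"]
    return (n - missing) / n if n else float("nan")


batches = [[1, None, 3, 4], [None, None], [7, 8, 9, 10]]
per_batch = [batch_stats(b) for b in batches]
print(aggregate(per_batch))  # 0.7: 3 missing out of 10 values

# Averaging per-batch ratios instead gives (0.75 + 0.0 + 1.0) / 3, about 0.583,
# because the two-value batch counts as much as the four-value ones.
naive = sum((s["n"] - s["missing"]) / s["n"] for s in per_batch) / len(per_batch)
print(round(naive, 3))  # 0.583
```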
Included Metrics
| Metric | Description |
|---|---|
| Completeness | Analyzes null/missing values in your dataset |
| Representativeness | Statistical distribution analysis (Chi-Square, KS, Shannon Entropy, GRTE) |
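Shannon entropy, one of the representativeness measures listed above, summarizes how evenly values are spread over categories. The sketch below uses the textbook definition H = -Σ p_i log p_i over the empirical value frequencies; the helper name `shannon_entropy` is hypothetical, and how dqm-ml-core bins continuous features before computing entropy is not covered here.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable], base: float = 2.0) -> float:
    """Entropy of the empirical distribution of `values` (in bits when base=2)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Written as sum(p * log(1/p)), which equals -sum(p * log(p))
    # but never yields a negative zero for a single category.
    return sum((c / total) * math.log(total / c, base) for c in counts.values())


print(shannon_entropy(["a", "b", "a", "b"]))  # 1.0: two equally likely values
print(shannon_entropy(["a", "a", "a"]))       # 0.0: a single value carries no uncertainty
```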
For Developers
To create a new metric:
- Subclass `dqm_ml_core.api.data_processor.DatametricProcessor`.
- Define `needed_columns()`, `generated_features()`, and `generated_metrics()`.
- Implement the streaming logic in `compute_batch_metric()` and `compute()`.
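The three steps can be illustrated with a self-contained skeleton. Because the exact signatures of `DatametricProcessor` are not documented here, the sketch defines a hypothetical stand-in base class, `StandInProcessor`, using the method names listed above; the argument and return types (a batch as a dict of column lists, per-batch statistics collected in a list) are assumptions, so check the real base class before porting the code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Batch = Dict[str, List[Any]]  # assumed batch shape: column name -> values
Stats = Dict[str, Any]


class StandInProcessor(ABC):
    """Hypothetical stand-in for DatametricProcessor; real signatures may differ."""

    def __init__(self, name: str, config: Dict[str, Any]) -> None:
        self.name = name
        self.config = config

    @abstractmethod
    def needed_columns(self) -> List[str]: ...

    @abstractmethod
    def generated_features(self) -> List[str]: ...

    @abstractmethod
    def generated_metrics(self) -> List[str]: ...

    @abstractmethod
    def compute_batch_metric(self, batch: Batch) -> Stats: ...

    @abstractmethod
    def compute(self, batch_stats: List[Stats]) -> Dict[str, float]: ...


class MeanValueProcessor(StandInProcessor):
    """Hypothetical example metric: mean of each input column, ignoring None."""

    def needed_columns(self) -> List[str]:
        return list(self.config["input_columns"])

    def generated_features(self) -> List[str]:
        return []  # this metric adds no per-row features

    def generated_metrics(self) -> List[str]:
        return [f"{col}_mean" for col in self.needed_columns()]

    def compute_batch_metric(self, batch: Batch) -> Stats:
        # Batch level: keep additive partial sums so batches can be merged exactly.
        stats: Stats = {}
        for col in self.needed_columns():
            values = [v for v in batch[col] if v is not None]
            stats[col] = (sum(values), len(values))
        return stats

    def compute(self, batch_stats: List[Stats]) -> Dict[str, float]:
        # Dataset level: merge the partial sums, then divide once.
        result: Dict[str, float] = {}
        for col in self.needed_columns():
            total = sum(s[col][0] for s in batch_stats)
            count = sum(s[col][1] for s in batch_stats)
            result[f"{col}_mean"] = total / count if count else float("nan")
        return result


proc = MeanValueProcessor(name="mean_check", config={"input_columns": ["col_a"]})
batches = [{"col_a": [1, 2, 3]}, {"col_a": [4, None]}]
print(proc.compute([proc.compute_batch_metric(b) for b in batches]))  # {'col_a_mean': 2.5}
```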
Dependencies
DQM-ML is modular; install only the packages you need:
```bash
# Minimal: use as a library only
pip install dqm-ml-core

# For YAML config execution
pip install dqm-ml-job dqm-ml-core

# Full stack with all metrics
pip install dqm-ml-job dqm-ml-core dqm-ml-images dqm-ml-pytorch
```
Download files
File details
Details for the file dqm_ml_core-2.0.0rc0.tar.gz.
File metadata
- Download URL: dqm_ml_core-2.0.0rc0.tar.gz
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.0 (publish) on Ubuntu 26.04
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6950ee214df29f3f629512926e14298fd0bf5cab1987782dac32891c3e5ceb90 |
| MD5 | 7f673098e41d28e2f00c875b6ec4c324 |
| BLAKE2b-256 | 610d3d32ffada6dfcce00575996ffddda937f192da133bb15e153781245b2090 |
File details
Details for the file dqm_ml_core-2.0.0rc0-py3-none-any.whl.
File metadata
- Download URL: dqm_ml_core-2.0.0rc0-py3-none-any.whl
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.0 (publish) on Ubuntu 26.04
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 675390ec63d2233c19cc583cf071995a89bff7d1bf779f61771795a4fb870fad |
| MD5 | 15744c72e88d93e110a9faa801bf855b |
| BLAKE2b-256 | 7973057fe244e6fcc474b509dff7e4146645e2eac854033a2086c8f6ad85fd53 |