DQF - Data Quality Framework

MIF-certified, production-ready data quality framework for OHLCV financial data.

DQF v1.2 validates financial time series through CORE and ADVISORY checks, produces MIF-Lite manifests with a cryptographic MIF-UID and a MIF Purity Index (MPI), and supports two operational modes: CERTIFICATION (strict, deterministic) and DIAGNOSTIC (advisory, flexible).


🎯 Why DQF Exists

The Fundamental Problem

Garbage In, Garbage Out (GIGO): No matter how sophisticated your trading algorithms or statistical models, if your input data is corrupted, all results are invalid.

Statistical Reality:

  • 80% of quantitative strategies fail in production not because of flawed logic, but because of corrupted data during development
  • Data quality issues are detected on average 6 months after deployment
  • A single corrupted data point can invalidate months of backtesting

The Philosophy of Purification

DQF embodies the principle of systematic purification before critical operations:

Historical Precedents:

  • Medicine: Hand washing before surgery (Semmelweis, 1847) - reduced mortality from 18% to 2%
  • Laboratory Science: Sterile technique before experiments - ensures reproducibility
  • Software Engineering: Input validation before processing - prevents crashes
  • Quantitative Finance: DQF (data purification) before analysis - guarantees validity

Cultural Parallels (methodological, not spiritual):

  • Islam: Wuḍū (ablution) - 7 ritual cleansings before Salat (prayer)
  • Shinto: Temizuya (water purification) before entering a shrine
  • Laboratory: Autoclave sterilization before cell culture
  • DQF: 7 systematic checks before quantitative analysis

Core Principle: Without purification, the critical operation (analysis/trading) produces unreliable results.


✨ What DQF Does

Dual Mission

1. Validation: Detect and report data quality issues

  • Identifies violations of market physics (H<L, negative volume, etc.)
  • Detects statistical anomalies (extreme returns, forward-fill abuse)
  • Validates structural integrity (timezone, calendar, duplicates)

2. Purification: Generate certified clean datasets

  • Produces validated DataFrames with full provenance tracking
  • Guarantees reproducibility (same data → same results, always)
  • Enables consistent analysis across teams and time

The DQF Guarantee

When DQF reports status: PASS:

  • ✅ Data respects market physics laws
  • ✅ No statistical anomalies detected
  • ✅ Complete provenance chain tracked
  • ✅ Dataset certified for production use

This is not just validation - it's data certification.


🔬 Core Benefits

For Quantitative Researchers

Problem: Corrupted data during backtesting → false conclusions

# Without DQF: Unknown data quality
backtest_results = strategy.run(data)  # 💥 May be invalid
paper.publish(backtest_results)        # 💥 Non-reproducible

Solution: Certified clean data → reliable backtests

# With DQF: Certified data quality
report = validator.validate(data)
if report.overall_status == "PASS":
    backtest_results = strategy.run(report.cleaned_data)  # ✅ Valid
    paper.publish(backtest_results)                       # ✅ Reproducible

Benefits:

  • ✅ Reproducible research (same data → same results)
  • ✅ Peer review confidence (provenance tracking)
  • ✅ Publication credibility (certified datasets)

For Trading Systems

Problem: Data corruption in production → catastrophic losses

# Without DQF: Unknown data quality
live_data = fetch_latest()
signal = model.predict(live_data)  # 💥 May be based on corrupted data
execute_trade(signal)              # 💥 Potential disaster

Solution: Real-time validation → safe trading

# With DQF: Real-time validation
live_data = fetch_latest()
report = validator.validate(live_data)

if report.overall_status == "PASS":
    signal = model.predict(report.cleaned_data)  # ✅ Safe
    execute_trade(signal)                        # ✅ Confident
else:
    alert_team(report.all_issues)  # 🚨 Data quality issue
    halt_trading()                 # Safety first

Benefits:

  • ✅ Risk mitigation (detect issues before trading)
  • ✅ Regulatory compliance (audit trail)
  • ✅ Post-mortem analysis (provenance tracking)

For Data Engineers

Problem: Silent data corruption in pipelines

# Without DQF: Silent failures
raw_data = extract_from_source()
transformed = apply_transformations(raw_data)  # 💥 May propagate corruption
load_to_warehouse(transformed)                 # 💥 Garbage persisted

Solution: Validation checkpoints → data integrity

# With DQF: Validated pipeline
raw_data = extract_from_source()

# Checkpoint 1: Validate raw data
raw_report = validator.validate(raw_data)
assert raw_report.overall_status == "PASS"

transformed = apply_transformations(raw_report.cleaned_data)

# Checkpoint 2: Validate transformed data
final_report = validator.validate(transformed)
assert final_report.overall_status == "PASS"

load_to_warehouse(final_report.cleaned_data)  # ✅ Only clean data persisted

Benefits:

  • ✅ Early detection (issues caught immediately)
  • ✅ Data lineage (full provenance chain)
  • ✅ Quality metrics (SLA monitoring)

🚀 Quick Start

Installation

pip install mif-dqf

Note - package name vs import name: the PyPI package is mif-dqf but the Python import is from dqf import ... (not import mif_dqf). This is intentional: dqf is the canonical module namespace.

# Install
# pip install mif-dqf

# Import (module name is 'dqf', not 'mif_dqf')
from dqf import DQFValidator, DQFConfig, DQFMode

Basic Usage

import pandas as pd
from dqf import DQFValidator, DQFConfig, DQFMode

# Load your data (timezone-aware index required)
data = pd.read_csv("spy.csv", index_col=0, parse_dates=True)
data.index = data.index.tz_localize("UTC")

# CERTIFICATION mode - strict, deterministic, calendar required
config = DQFConfig(mode=DQFMode.CERTIFICATION)
validator = DQFValidator(config)
report = validator.validate(data, calendar="NYSE")

if report.is_certified:
    print(f"โœ… CERTIFIED  MPI={report.purity_index:.1f}/100  gate={report.precondition_gate}")
    print(f"   UID: {report.mif_uid}")
    clean_data = report.cleaned_data   # validated DataFrame
    report.print_summary()             # human-readable summary
else:
    print(f"โŒ {report.overall_status}  gate={report.precondition_gate}")
    print(f"   CORE:     {report.core_results}")
    print(f"   ADVISORY: {report.advisory_results}")

DIAGNOSTIC mode (no calendar required, useful for exploration):

config = DQFConfig(mode=DQFMode.DIAGNOSTIC)
report = DQFValidator(config).validate(data)
print(f"Status: {report.overall_status}  MPI: {report.purity_index:.1f}")

v1.2 - Cleaning Log (record every intervention in Parquet):

# enable_cleaning_log=True captures per-row intervention detail
report = validator.validate(df, calendar='NYSE', enable_cleaning_log=True)
print(report.has_cleaning_log)         # False if data was clean
df_log = report.get_cleaning_log_df()  # None or DataFrame with columns:
                                        # row_index, check_id, intervention,
                                        # field, value_before, value_after, gravity

Output (CERTIFICATION, clean data):

✅ CERTIFIED  MPI=100.0/100  gate=1.0
   UID: sha256:a3f9...

📋 DQF v1.2 Checks

CORE checks - failure → STATUS_VOID, precondition_gate = 0.0

  ID    Check               Purpose
  PROD  Envelope seal       Output trust mechanism - always injected PASS
  C2    OHLCV Integrity     Market physics (H≥L, H≥O/C, V≥0, no NaN)
  C3    Calendar Alignment  Declared calendar required in CERTIFICATION mode
  C5    Index Traceability  Unique, chronological, timezone-aware index
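
To make the C2 contract concrete, here is a minimal pandas sketch of the same market-physics predicates (illustrative only - DQF's internal implementation may differ; lowercase open/high/low/close/volume columns are assumed, as in the examples below):

import pandas as pd

def ohlcv_violations(df: pd.DataFrame) -> pd.Series:
    """Boolean mask of rows that break C2-style market physics."""
    cols = ["open", "high", "low", "close", "volume"]
    bad_nan = df[cols].isna().any(axis=1)                             # no NaN allowed
    bad_hl = df["high"] < df["low"]                                   # H >= L
    bad_hoc = (df["high"] < df["open"]) | (df["high"] < df["close"])  # H >= O and H >= C
    bad_vol = df["volume"] < 0                                        # V >= 0
    return bad_nan | bad_hl | bad_hoc | bad_vol

# rows flagged by this mask are the kind of violations C2 reports:
# print(df[ohlcv_violations(df)])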

ADVISORY checks - warn → STATUS_WARNING, gate capped by MPI

  ID  Check                   Purpose
  C1  Source Uniqueness       Single canonical source (SKIP in Phase 1 - DAL pending)
  C4  Forward-Fill Detection  Detects interpolation abuse (consecutive repeats)

Removed in v1.1: C6 (Sanity Tests) migrated to MIF Layer 1; C7 (Logging) replaced by PROD envelope.
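
The intuition behind C4 can likewise be sketched as measuring runs of consecutive identical closes, a common forward-fill signature (a hedged illustration only; DQF's actual detector and its c4_warn_threshold semantics live in the library):

import pandas as pd

def longest_repeat_run(close: pd.Series) -> int:
    """Length of the longest run of consecutive identical close values."""
    run_id = close.ne(close.shift()).cumsum()  # new label whenever the value changes
    return int(close.groupby(run_id).size().max())

# an unusually long run suggests forward-filled data:
# if longest_repeat_run(df["close"]) > warn_threshold: treat as a C4 warning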

Active Cleaning Log (v1.2)

When enable_cleaning_log=True, every intervention detected by C2, C3, and C4 is recorded in a Parquet log embedded in the manifest.

  Property / Method                           Description
  report.has_cleaning_log                     True when at least one intervention was logged
  report.get_cleaning_log_df()                Returns a DataFrame or None
  manifest["cleaning_log"]                    Base64-encoded Parquet bytes
  manifest["provenance"]["cleaning_log_uri"]  "embedded:sha256:…" or None

The MIF-UID is always computed on raw data - the cleaning log does not affect the cryptographic identity of the dataset.
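
Given the manifest layout above, the embedded log can also be recovered outside DQF. A minimal sketch, assuming a pyarrow-backed pandas install and an illustrative manifest path:

import base64
import io
import json
from pathlib import Path

import pandas as pd

manifest = json.loads(Path("manifests/SPY.mif.json").read_text())  # path is illustrative

encoded = manifest.get("cleaning_log")
if encoded is not None:
    # base64-encoded Parquet bytes, per the table above
    log_df = pd.read_parquet(io.BytesIO(base64.b64decode(encoded)))
    print(log_df[["row_index", "check_id", "intervention", "gravity"]].head())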


🎓 Complete Examples

Example 1: Research Workflow

import pandas as pd
from pathlib import Path
from dqf import DQFValidator, DQFConfig, DQFMode

# Research scenario: Certifying historical data for a paper
data = pd.read_csv("spy_2020_2024.csv", index_col=0, parse_dates=True)
data.index = data.index.tz_localize("UTC")

config = DQFConfig(mode=DQFMode.CERTIFICATION)
report = DQFValidator(config).validate(data, calendar="NYSE")

if report.is_certified:
    # Save certified dataset with provenance
    report.cleaned_data.to_csv("spy_2020_2024_certified.csv")
    Path("provenance_spy.json").write_text(report.to_json())

    print(f"โœ… CERTIFIED  MPI={report.purity_index:.1f}/100")
    print(f"   MIF-UID: {report.mif_uid}")
else:
    print(f"โŒ {report.overall_status} (gate={report.precondition_gate})")
    print(f"   CORE failures: {report.core_results}")

Example 2: Production Pipeline

import logging

import pandas as pd
from dqf import DQFValidator, DQFConfig, DQFMode

logger = logging.getLogger(__name__)

# Shared validator - reuse across calls (thread-safe for validate())
_config    = DQFConfig(mode=DQFMode.CERTIFICATION, c4_warn_threshold=1)
_validator = DQFValidator(_config)

def validate_daily_data(symbol: str, calendar: str, data: pd.DataFrame) -> pd.DataFrame:
    """Certify daily data; raise on VOID."""
    report = _validator.validate(data, calendar=calendar)

    if report.overall_status == "VOID":
        logger.critical("%s: VOID  core=%s", symbol, report.core_results)
        raise ValueError(f"CORE failure for {symbol} - gate=0")

    if report.overall_status == "WARNING":
        logger.warning("%s: WARNING  advisory=%s  MPI=%.1f",
                       symbol, report.advisory_results, report.purity_index)

    logger.info("%s: %s  MPI=%.1f  UID=%s",
                symbol, report.overall_status, report.purity_index, report.mif_uid)
    return report.cleaned_data

# Usage
try:
    clean = validate_daily_data("SPY", "NYSE", raw_data)
    load_to_warehouse(clean)
except ValueError as exc:
    alert_team(str(exc))
    halt_pipeline()

Example 3: Batch Processing

from pathlib import Path

import pandas as pd
from dqf import DQFValidator, DQFConfig, DQFMode

CALENDAR = {"BTC-USD": "CRYPTO_24_7", "ETH-USD": "CRYPTO_24_7",
            "SPY": "NYSE", "GLD": "NYSE"}

config    = DQFConfig(mode=DQFMode.CERTIFICATION)
validator = DQFValidator(config)
results   = {}

for symbol, calendar in CALENDAR.items():
    data = pd.read_csv(f"{symbol}.csv", index_col=0, parse_dates=True)
    data.index = data.index.tz_localize("UTC")
    results[symbol] = validator.validate(data, calendar=calendar)
    print(f"{symbol}: {results[symbol].overall_status}  MPI={results[symbol].purity_index:.1f}")

# Keep only certified datasets
certified = {s: r.cleaned_data for s, r in results.items() if r.is_certified}
print(f"\n{len(certified)}/{len(CALENDAR)} datasets CERTIFIED")

# Persist manifests
Path("manifests").mkdir(exist_ok=True)
for symbol, report in results.items():
    Path(f"manifests/{symbol}.mif.json").write_text(report.to_json())

Example 4: Custom Check

See examples/04_custom_check.py for a complete example. Custom checks extend BaseCheck and are registered via validator.add_custom_check(). They run as ADVISORY checks (WARN → STATUS_WARNING, never STATUS_VOID).

from typing import Any
from dqf import DQFValidator, DQFConfig, DQFMode
from dqf.checks.base import BaseCheck, CheckResult

class LiquidityCheck(BaseCheck):
    """Advisory check: minimum daily volume."""

    def __init__(self, min_vol: float = 1_000_000) -> None:
        super().__init__(check_id="C_LIQ", check_name="Minimum Liquidity")
        self.min_vol = min_vol

    def run(self, data, **kwargs: Any) -> CheckResult:
        low = int((data["volume"] < self.min_vol).sum())
        if low:
            return self._create_warning_result(
                message=f"{low} days below minimum volume ({self.min_vol:,.0f})",
                details={"low_volume_days": low},
            )
        return self._create_pass_result(message="Liquidity OK")

config    = DQFConfig(mode=DQFMode.DIAGNOSTIC)
validator = DQFValidator(config)
validator.add_custom_check("C_LIQ", LiquidityCheck(min_vol=500_000))
report    = validator.validate(data)
print(report.advisory_results)  # {'C4': 'PASS', 'C_LIQ': 'PASS', 'C1': 'SKIP'}

๐Ÿ—๏ธ Architecture

┌────────────────────────────────────────┐
│   Input: Raw DataFrame (OHLCV)         │
│   - Potentially corrupted              │
│   - Unknown quality                    │
└────────────────┬───────────────────────┘
                 ↓
┌────────────────────────────────────────┐
│   DQFValidator (mode: CERT | DIAG)     │
│  CORE checks (failure → VOID)          │
│  ┌─────────────────────────────────┐   │
│  │ C2. OHLCV Integrity             │   │
│  │ C3. Calendar Alignment          │   │
│  │ C5. Index Traceability          │   │
│  └─────────────────────────────────┘   │
│  ADVISORY checks (warn → WARNING)      │
│  ┌─────────────────────────────────┐   │
│  │ C1. Source Uniqueness (SKIP/P1) │   │
│  │ C4. Forward-Fill Detection      │   │
│  └─────────────────────────────────┘   │
└────────────────┬───────────────────────┘
                 ↓
┌────────────────────────────────────────┐
│   PROD Envelope (MIF-Lite manifest)    │
│  ┌─────────────────────────────────┐   │
│  │ MIF-UID = SHA-256(hash+ver+cal) │   │
│  │ MPI     = 100×(1−Σwᵢ/N)         │   │
│  │ gate    = 1.0 / 0.8 / 0.0       │   │
│  └─────────────────────────────────┘   │
└────────────────┬───────────────────────┘
                 ↓
┌────────────────────────────────────────┐
│   Output: DQFReport (.mif.json)        │
│  - overall_status: CERTIFIED/WARNING/  │
│                    VOID                │
│  - purity_index: 0–100 (MPI)           │
│  - precondition_gate: 0.0/0.8/1.0      │
│  - mif_uid: sha256:...                 │
│  - cleaned_data: validated DataFrame   │
└────────────────────────────────────────┘
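
Reading the envelope box above: MPI = 100 × (1 − Σwᵢ/N) discounts the purity score by the weights wᵢ of triggered issues across the N scored checks, and the gate collapses the final status to 1.0 / 0.8 / 0.0. A minimal sketch of that arithmetic, assuming equal-denominator weighting (the actual weights are internal to DQF):

def purity_index(issue_weights: list[float], n_checks: int) -> float:
    """MPI = 100 * (1 - sum(w_i) / N), clamped to [0, 100]."""
    mpi = 100.0 * (1.0 - sum(issue_weights) / n_checks)
    return max(0.0, min(100.0, mpi))

def precondition_gate(status: str) -> float:
    """Map overall status to the gate values shown in the manifest."""
    return {"CERTIFIED": 1.0, "WARNING": 0.8, "VOID": 0.0}[status]

print(purity_index([0.5], 4))        # one weight-0.5 advisory issue over 4 checks -> 87.5
print(precondition_gate("WARNING"))  # 0.8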

Design Principles:

  • Deterministic: Same data + same args → Same MIF-UID (always)
  • Dual mode: CERTIFICATION (strict) vs DIAGNOSTIC (advisory)
  • MPI: continuous purity score replaces binary PASS/FAIL
  • Production-Ready: 224/224 tests passing
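
The determinism principle pairs with the roadmap's description of the UID as SHA-256(data_hash + dqf_version + calendar + mode). The exact serialization DQF hashes is internal; illustrative_uid below is a hypothetical stand-in that only demonstrates the contract that identical inputs always reproduce the same UID:

import hashlib

import pandas as pd

def illustrative_uid(df: pd.DataFrame, version: str, calendar: str, mode: str) -> str:
    """Hypothetical composition mirroring SHA-256(data_hash + dqf_version + calendar + mode)."""
    data_hash = hashlib.sha256(df.to_csv().encode()).hexdigest()
    payload = f"{data_hash}|{version}|{calendar}|{mode}"
    return "sha256:" + hashlib.sha256(payload.encode()).hexdigest()

# same data + same args -> same UID, every run:
# illustrative_uid(df, "1.2.0", "NYSE", "CERTIFICATION")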

📖 Documentation

The docs/ directory contains the canonical specification (DQF_SPECIFICATION.md), the API reference (API.md), design notes (ARCHITECTURE.md), and a troubleshooting guide (TROUBLESHOOTING.md).

🧪 Testing & Quality

# Run all tests
pytest tests/ -v                    # 224/224 passing

# Coverage
pytest tests/ --cov=dqf

# Examples
python examples/01_basic_validation.py    # ✅ Works
python examples/02_custom_config.py       # ✅ Works
python examples/03_batch_processing.py    # ✅ Works
python examples/04_custom_check.py        # ✅ Works

Quality Metrics:

  • 224 tests (185 unit + 38 integration + 1 root)
  • 0 failures

📦 Project Structure

dqf/
├── dqf/                         # Source code
│   ├── checks/                  # C1–C5 checks
│   ├── core/                    # Config, Validator, Report, PRODEnvelope
│   └── utils/                   # Calendar, MPI, CleaningLog
├── tests/                       # Test suite (224 tests)
│   ├── unit/                    # Per-module unit tests
│   └── integration/             # End-to-end pipeline tests
├── examples/                    # Complete examples (4)
├── docs/
│   ├── DQF_SPECIFICATION.md     # Canonical specification (v1.1)
│   ├── API.md                   # API reference
│   ├── ARCHITECTURE.md          # Design & patterns
│   └── TROUBLESHOOTING.md       # Common issues
├── scripts/
│   └── test_install.py          # Installation smoke test
├── pyproject.toml               # Package metadata
└── LICENSE                      # MIT License

🛠️ Development

Requirements

  • Python 3.10+
  • pandas >= 2.0.0
  • PyYAML >= 6.0

Setup

# Clone repository
git clone https://github.com/symbioticode/mif-dqf.git
cd mif-dqf

# Install in editable mode
pip install -e .

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.


📊 Benchmarks

Performance (100 days of data):

Total validation time: ~0.6s
  - C2 (Integrity):    0.32s
  - C3 (Calendar):     0.10s
  - C4 (Fwd-Fill):     0.10s
  - C5 (Index):        0.08s
  - PROD Envelope:     <0.01s

Scalability:

  • 100 days: ~0.6s
  • 1,000 days: ~2.0s
  • 10,000 days: ~15s
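
Timings vary by machine; to get rough numbers on your own setup, a minimal sketch using synthetic clean OHLCV data (DIAGNOSTIC mode so no calendar is needed; sizes and values are arbitrary):

import time

import numpy as np
import pandas as pd
from dqf import DQFValidator, DQFConfig, DQFMode

n = 1_000
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, n).cumsum()
df = pd.DataFrame(
    {"open": close, "high": close + 1, "low": close - 1,
     "close": close, "volume": np.full(n, 1e6)},
    index=pd.bdate_range("2020-01-01", periods=n, tz="UTC"),
)

validator = DQFValidator(DQFConfig(mode=DQFMode.DIAGNOSTIC))
t0 = time.perf_counter()
report = validator.validate(df)
print(f"{n} rows: {time.perf_counter() - t0:.2f}s  status={report.overall_status}")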

🗺️ Roadmap

v1.0.0 (legacy)

  • 7 checks (Source, Integrity, Calendar, Forward-Fill, Index, Sanity, Logging)
  • Binary PASS/FAIL report

v1.1.0 ✅

  • ✅ Two operational modes: CERTIFICATION (strict) vs DIAGNOSTIC (advisory)
  • ✅ CORE/ADVISORY check classification - CORE failure → VOID, gate=0
  • ✅ PROD envelope produces MIF-Lite manifest (.mif.json)
  • ✅ MIF Purity Index (MPI) - 0–100 continuous purity score
  • ✅ MIF-UID - SHA-256(data_hash + dqf_version + calendar + mode)
  • ✅ C6 (Sanity) migrated to MIF Layer 1; C7 (Logging) replaced by PROD envelope

v1.2.0 - Active Cleaning ✅ (current)

  • ✅ Optional cleaning log via enable_cleaning_log=True
  • ✅ Parquet log with per-intervention detail (row_index, check_id, field, gravity…)
  • ✅ report.has_cleaning_log / report.get_cleaning_log_df()
  • ✅ MIF-UID computed on raw data (cleaning log excluded from hash)
  • ✅ validator.add_custom_check(check_id, check) - custom ADVISORY checks
  • ✅ 224/224 tests passing

v2.0.0 - MIF integration (planned)

  • DAL integration (get_certified_data())
  • C1 (Source Uniqueness) activated โ€” DAL handoff
  • Full provenance chain: source → DQF → MIF

🤝 Ecosystem

DQF is the foundational layer of the MIF (Metric Integrity Framework) ecosystem.

MIF Layers 1–5  = Metric certification & strategy validation
       ↑           (score capped if DQF precondition fails)
     DAL         = Multi-source data abstraction [planned]
       ↑
     DQF         = Data quality gate [YOU ARE HERE]
       ↑
Raw Sources     = Yahoo Finance, Binance, Kraken, etc.

DQF acts as a precondition_gate: if data does not pass DQF, downstream MIF scores are bounded regardless of metric quality. See DQF_SPECIFICATION.md for the full integration contract.
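
In code terms, "bounded" reduces to a multiplication by the gate. A hypothetical one-liner, assuming a downstream metric score in [0, 100] (the actual integration contract is defined in DQF_SPECIFICATION.md):

# illustrative capping, not the real MIF implementation
def capped_mif_score(metric_score: float, precondition_gate: float) -> float:
    """A VOID gate (0.0) zeroes the score; a WARNING gate (0.8) caps it at 80%."""
    return metric_score * precondition_gate

print(capped_mif_score(95.0, 0.8))  # 76.0 - strong metric, but warned data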


📄 License

MIT License - see LICENSE file for details.


🙏 Acknowledgments

  • Methodology: Systematic purification as scientific hygiene
  • Inspiration: Medical sterilization, laboratory protocols
  • Cultural parallels: Islamic Wudu, Shinto Temizuya (ritual, not spiritual)
  • Tools: pandas, pytest, PyYAML


⭐ Star History

If DQF helps your research or trading, please consider giving it a star! ⭐


Made with rigor by the DQF Team

"Data hygiene is not optional. It's the foundation of reliable quantitative analysis."
