Pure Python parser for Data Quality Language (DQL)

dql-parser

Pure Python parser for Data Quality Language (DQL) - a human-readable language for defining data quality expectations.

Features

  • 🚀 Minimal Dependencies - Lark is the only runtime dependency
  • 🎯 Framework-Agnostic - No Django, Flask, or any framework required
  • Fast - Parses 100-line DQL files in <50ms
  • 📝 Clear Error Messages - Line and column information for syntax errors
  • 🐍 Python 3.8+ - Supports Python 3.8 through 3.12
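
The speed figure above can be checked locally with a small timing harness. The following is a minimal sketch, not part of dql-parser: time_parse and make_sample are hypothetical helpers, time_parse accepts any parse callable (for example DQLParser().parse), and the 100-line sample is generated from the syntax shown in this README.

```python
import time


def time_parse(parse_fn, text, repeats=20):
    """Return the best wall-clock time in milliseconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse_fn(text)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def make_sample(n_expectations=99):
    """Build a DQL document with one FROM line plus n EXPECT lines."""
    lines = ["FROM Customer"]
    lines += [f'EXPECT column("field_{i}") to_not_be_null' for i in range(n_expectations)]
    return "\n".join(lines)


# Usage, assuming dql-parser is installed:
#   from dql_parser import DQLParser
#   print(f"{time_parse(DQLParser().parse, make_sample()):.2f} ms")
```

Taking the best of several runs reduces noise from interpreter warm-up; results will vary by machine and Python version.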

Installation

pip install dql-parser

Quick Start

from dql_parser import DQLParser

# Parse DQL text
parser = DQLParser()
ast = parser.parse("""
FROM Customer
EXPECT column("email") to_not_be_null SEVERITY critical
EXPECT column("age") to_be_between(18, 120)
""")

# Access parsed expectations
for from_block in ast.from_blocks:
    print(f"Model: {from_block.model_name}")
    for expectation in from_block.expectations:
        print(f"  - {expectation.operator}")

DQL Syntax Overview

DQL (Data Quality Language) is a declarative language for defining data quality rules:

FROM ModelName

EXPECT column("field_name") to_not_be_null SEVERITY critical
EXPECT column("email") to_match_pattern("[a-z]+@[a-z]+\\.[a-z]+")
EXPECT column("age") to_be_between(0, 150)
EXPECT column("status") to_be_in("active", "pending", "closed")
EXPECT column("id") to_be_unique

Supported Operators

  • to_be_null - Column must be NULL
  • to_not_be_null - Column must not be NULL
  • to_match_pattern(regex) - Column must match regex pattern
  • to_be_between(min, max) - Column must be between min and max
  • to_be_in(value1, value2, ...) - Column must be one of the values
  • to_be_unique - Column must have unique values
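
dql-parser only parses these operators; evaluating them is left to whatever consumes the AST. To illustrate the semantics listed above, here is a minimal sketch in plain Python. The function names, the use of None for NULL, the inclusive bounds, and full-string regex matching are all assumptions made for this example, not behavior documented by the package.

```python
import re


def to_be_null(value):
    return value is None


def to_not_be_null(value):
    return value is not None


def to_match_pattern(value, pattern):
    # Assumption: the whole value must match, not just a prefix.
    return value is not None and re.fullmatch(pattern, value) is not None


def to_be_between(value, low, high):
    # Assumption: both bounds are inclusive.
    return value is not None and low <= value <= high


def to_be_in(value, *allowed):
    return value in allowed


def to_be_unique(values):
    # Column-level check: looks at all values of a column at once.
    # Assumes the values are hashable.
    return len(values) == len(set(values))
```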

Severity Levels

  • critical - Must pass for validation to succeed
  • warning - Logged but doesn't fail validation
  • info - Informational only
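
How these levels combine is up to the validator that consumes the AST. Below is a minimal sketch of the rule described above, where only a failed critical expectation fails the run. The Result record and both functions are hypothetical names introduced for this example, not part of dql-parser.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Outcome of one expectation (hypothetical record for this sketch)."""
    description: str
    severity: str  # "critical", "warning", or "info"
    passed: bool


def validation_passed(results):
    """Validation succeeds unless a critical expectation failed."""
    return not any(r.severity == "critical" and not r.passed for r in results)


def failures_by_severity(results):
    """Group descriptions of failed expectations by severity level."""
    grouped = {"critical": [], "warning": [], "info": []}
    for r in results:
        if not r.passed:
            grouped.setdefault(r.severity, []).append(r.description)
    return grouped
```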

API Reference

DQLParser

Main parser class for DQL syntax.

parser = DQLParser()

parse(text: str) -> DQLFile

Parse DQL text and return AST.

Args:

  • text: DQL source text

Returns:

  • DQLFile: Root AST node

Raises:

  • DQLSyntaxError: If syntax is invalid

parse_file(filepath: str) -> DQLFile

Parse DQL file and return AST.

Args:

  • filepath: Path to .dql file

Returns:

  • DQLFile: Root AST node

Raises:

  • DQLSyntaxError: If syntax is invalid
  • FileNotFoundError: If file doesn't exist
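
A common use of parse_file is loading every rule file in a directory. The sketch below shows one way to do that under stated assumptions: parse_directory is a hypothetical helper, it takes the parser as an argument and relies only on the parse_file(filepath: str) method documented above, and the syntax_error parameter stands in for DQLSyntaxError so the helper does not guess at its import path.

```python
from pathlib import Path


def parse_directory(parser, directory, syntax_error=Exception):
    """Parse every *.dql file directly inside `directory`.

    Returns (parsed, errors): parsed maps file path -> AST,
    errors maps file path -> error message.
    """
    parsed, errors = {}, {}
    for path in sorted(Path(directory).glob("*.dql")):
        try:
            parsed[str(path)] = parser.parse_file(str(path))
        except syntax_error as exc:
            # Record the failure and keep going with the remaining files.
            errors[str(path)] = str(exc)
    return parsed, errors
```

With the package installed, a call might look like parse_directory(DQLParser(), "rules", syntax_error=DQLSyntaxError), where "rules" is a placeholder directory name.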

AST Nodes

The parser returns an Abstract Syntax Tree (AST) composed of dataclass nodes:

  • DQLFile - Root node containing from_blocks
  • FromBlock - Represents a FROM block with model_name and expectations
  • ExpectationNode - Single expectation with target, operator, severity
  • ColumnTarget - Column reference
  • RowTarget - Row-level condition
  • Operators: ToBeNull, ToNotBeNull, ToMatchPattern, ToBeBetween, ToBeIn, ToBeUnique
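
Walking the tree needs nothing beyond attribute access. Here is a minimal sketch using only the attributes named above (from_blocks, model_name, expectations, operator, severity). The summarize function is a hypothetical helper, and reading the operator kind from the node's class name is an assumption about how operator nodes are represented.

```python
from collections import Counter


def summarize(ast):
    """Count expectations per (model, operator class name, severity)."""
    counts = Counter()
    for block in ast.from_blocks:
        for expectation in block.expectations:
            # Assumption: each operator is an instance of a class such as ToBeNull.
            operator_name = type(expectation.operator).__name__
            counts[(block.model_name, operator_name, expectation.severity)] += 1
    return counts
```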

Error Handling

The parser raises DQLSyntaxError with the line and column of the offending token:

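# Assumes DQLSyntaxError is exported at the package top level
from dql_parser import DQLParser, DQLSyntaxError

parser = DQLParser()
try:
    ast = parser.parse("EXPECT column('email') invalid_operator")
except DQLSyntaxError as e:
    print(e)
    # Example output (exact message and column may differ):
    # Syntax error at line 1, column 30: unexpected token 'invalid_operator'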

Development

Setup

# Clone repository
git clone https://github.com/dql-project/dql-parser.git
cd dql-parser

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=dql_parser --cov-report=html

# Run specific test file
pytest tests/test_valid_syntax.py

Code Quality

# Format code
black dql_parser tests

# Lint code
flake8 dql_parser tests

# Type check
mypy dql_parser

Documentation

Full documentation: https://yourusername.github.io/dql-parser/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Related Projects

Package Selection

Not sure which package to use? See the Package Selection Guide

Changelog

See CHANGELOG.md for version history.
