Pure Python parser for Data Quality Language (DQL)

dql-parser

Pure Python parser for Data Quality Language (DQL) - a human-readable language for defining data quality expectations.

Documentation | PyPI | GitHub

Features

  • 🚀 Minimal Dependencies - Lark is the only runtime dependency
  • 🎯 Framework-Agnostic - No Django, Flask, or any framework required
  • Fast - Parses 100-line DQL files in <50ms
  • 📝 Clear Error Messages - Line and column information for syntax errors
  • 🐍 Python 3.8+ - Supports Python 3.8 through 3.12

Installation

pip install dql-parser

Quick Start

from dql_parser import DQLParser

# Parse DQL text
parser = DQLParser()
ast = parser.parse("""
FROM Customer
EXPECT column("email") to_not_be_null SEVERITY critical
EXPECT column("age") to_be_between(18, 120)
""")

# Access parsed expectations
for from_block in ast.from_blocks:
    print(f"Model: {from_block.model_name}")
    for expectation in from_block.expectations:
        print(f"  - {expectation.operator}")

DQL Syntax Overview

DQL (Data Quality Language) is a declarative language for defining data quality rules:

FROM ModelName

EXPECT column("field_name") to_not_be_null SEVERITY critical
EXPECT column("email") to_match_pattern("[a-z]+@[a-z]+\\.[a-z]+")
EXPECT column("age") to_be_between(0, 150)
EXPECT column("status") to_be_in("active", "pending", "closed")
EXPECT column("id") to_be_unique

Supported Operators

  • to_be_null - Column must be NULL
  • to_not_be_null - Column must not be NULL
  • to_match_pattern(regex) - Column must match regex pattern
  • to_be_between(min, max) - Column must be between min and max
  • to_be_in(value1, value2, ...) - Column must be one of the values
  • to_be_unique - Column must have unique values
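
To make the operator descriptions concrete, here is a minimal sketch of how each check might be evaluated over a list of column values in plain Python. It does not use dql-parser, which only parses DQL text. The `check` function, the use of `None` for NULL, inclusive bounds for to_be_between, and full-string regex matching are all assumptions made for illustration.

```python
import re
from typing import Any, List


def check(operator: str, values: List[Any], *args: Any) -> bool:
    """Hypothetical evaluator: True if every value satisfies the operator."""
    if operator == "to_be_null":
        return all(v is None for v in values)
    if operator == "to_not_be_null":
        return all(v is not None for v in values)
    if operator == "to_match_pattern":
        # Assumption: the whole value must match the regex.
        pattern = re.compile(args[0])
        return all(v is not None and pattern.fullmatch(v) for v in values)
    if operator == "to_be_between":
        low, high = args  # Assumption: both bounds are inclusive.
        return all(v is not None and low <= v <= high for v in values)
    if operator == "to_be_in":
        return all(v in args for v in values)
    if operator == "to_be_unique":
        return len(set(values)) == len(values)
    raise ValueError(f"unknown operator: {operator}")


print(check("to_be_between", [18, 42, 120], 18, 120))
print(check("to_be_unique", ["a", "b", "a"]))
```

A real validation engine may treat NULLs, bounds, and partial regex matches differently, so check its documentation before relying on these choices.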

Severity Levels

  • critical - Must pass for validation to succeed
  • warning - Logged but doesn't fail validation
  • info - Informational only
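
The sketch below illustrates one way the three severity levels could be combined into an overall result: only a failed critical check fails the run, while failed warning and info checks are merely reported. `CheckResult` and `validation_succeeded` are hypothetical names and are not part of dql-parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    """Hypothetical outcome of one expectation; not a dql-parser type."""
    name: str
    severity: str  # "critical", "warning", or "info"
    passed: bool


def validation_succeeded(results: List[CheckResult]) -> bool:
    """Only failed critical checks fail the run; others are just reported."""
    succeeded = True
    for result in results:
        if result.passed:
            continue
        if result.severity == "critical":
            succeeded = False
        elif result.severity == "warning":
            print(f"WARNING: {result.name} failed")
        else:
            print(f"INFO: {result.name} failed")
    return succeeded


results = [
    CheckResult("email to_not_be_null", "critical", True),
    CheckResult("age to_be_between", "warning", False),
]
print(validation_succeeded(results))
```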

API Reference

DQLParser

Main parser class for DQL syntax.

parser = DQLParser()

parse(text: str) -> DQLFile

Parse DQL text and return AST.

Args:

  • text: DQL source text

Returns:

  • DQLFile: Root AST node

Raises:

  • DQLSyntaxError: If syntax is invalid

parse_file(filepath: str) -> DQLFile

Parse DQL file and return AST.

Args:

  • filepath: Path to .dql file

Returns:

  • DQLFile: Root AST node

Raises:

  • DQLSyntaxError: If syntax is invalid
  • FileNotFoundError: If file doesn't exist

AST Nodes

The parser returns an Abstract Syntax Tree (AST) composed of dataclass nodes:

  • DQLFile - Root node containing from_blocks
  • FromBlock - Represents a FROM block with model_name and expectations
  • ExpectationNode - Single expectation with target, operator, severity
  • ColumnTarget - Column reference
  • RowTarget - Row-level condition
  • Operators: ToBeNull, ToNotBeNull, ToMatchPattern, ToBeBetween, ToBeIn, ToBeUnique
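
To show how such a tree can be traversed, the sketch below defines simplified stand-in dataclasses and collects every column that has a critical expectation. The attributes from_blocks, model_name, expectations, target, operator, and severity come from the list above. The `name` field on ColumnTarget, the use of a plain string for the operator, and the `critical_columns` helper are assumptions, and the library's real class definitions may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnTarget:
    name: str  # hypothetical field name


@dataclass
class ExpectationNode:
    target: ColumnTarget
    operator: str  # simplified; the real AST uses operator nodes such as ToBeNull
    severity: Optional[str] = None


@dataclass
class FromBlock:
    model_name: str
    expectations: List[ExpectationNode] = field(default_factory=list)


@dataclass
class DQLFile:
    from_blocks: List[FromBlock] = field(default_factory=list)


def critical_columns(tree: DQLFile) -> List[str]:
    """Collect 'Model.column' for every expectation marked critical."""
    return [
        f"{block.model_name}.{exp.target.name}"
        for block in tree.from_blocks
        for exp in block.expectations
        if exp.severity == "critical"
    ]


tree = DQLFile([
    FromBlock("Customer", [
        ExpectationNode(ColumnTarget("email"), "to_not_be_null", "critical"),
        ExpectationNode(ColumnTarget("age"), "to_be_between"),
    ])
])
print(critical_columns(tree))  # ['Customer.email']
```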

Error Handling

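The parser raises DQLSyntaxError with line and column information when the input is invalid:

# Assumes DQLSyntaxError is importable from the package root
from dql_parser import DQLParser, DQLSyntaxError

parser = DQLParser()
try:
    ast = parser.parse("EXPECT column('email') invalid_operator")
except DQLSyntaxError as e:
    print(e)
    # Example output (exact wording and position may differ):
    # Syntax error at line 1, column 30: unexpected token 'invalid_operator'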

Development

Setup

# Clone repository
git clone https://github.com/dql-project/dql-parser.git
cd dql-parser

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=dql_parser --cov-report=html

# Run specific test file
pytest tests/test_valid_syntax.py

Code Quality

# Format code
black dql_parser tests

# Lint code
flake8 dql_parser tests

# Type check
mypy dql_parser

Documentation

Full documentation: https://yourusername.github.io/dql-parser/

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Related Projects

Package Selection

Not sure which package to use? See the Package Selection Guide.

Changelog

See CHANGELOG.md for version history.
