Pure Python parser for Data Quality Language (DQL)
dql-parser
Pure Python parser for Data Quality Language (DQL) - a human-readable language for defining data quality expectations.
Documentation | PyPI | GitHub
Features
- 🚀 Minimal Dependencies - The Lark parser is the only runtime dependency
- 🎯 Framework-Agnostic - No Django, Flask, or any framework required
- ⚡ Fast - Parses 100-line DQL files in <50ms
- 📝 Clear Error Messages - Line and column information for syntax errors
- 🐍 Python 3.8+ - Supports Python 3.8 through 3.12
Installation
pip install dql-parser
Quick Start
from dql_parser import DQLParser
# Parse DQL text
parser = DQLParser()
ast = parser.parse("""
FROM Customer
EXPECT column("email") to_not_be_null SEVERITY critical
EXPECT column("age") to_be_between(18, 120)
""")
# Access parsed expectations
for from_block in ast.from_blocks:
    print(f"Model: {from_block.model_name}")
    for expectation in from_block.expectations:
        print(f" - {expectation.operator}")
DQL Syntax Overview
DQL (Data Quality Language) is a declarative language for defining data quality rules:
FROM ModelName
EXPECT column("field_name") to_not_be_null SEVERITY critical
EXPECT column("email") to_match_pattern("[a-z]+@[a-z]+\\.[a-z]+")
EXPECT column("age") to_be_between(0, 150)
EXPECT column("status") to_be_in("active", "pending", "closed")
EXPECT column("id") to_be_unique
Supported Operators
- to_be_null - Column must be NULL
- to_not_be_null - Column must not be NULL
- to_match_pattern(regex) - Column must match regex pattern
- to_be_between(min, max) - Column must be between min and max
- to_be_in(value1, value2, ...) - Column must be one of the values
- to_be_unique - Column must have unique values
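To make the operator meanings concrete, here is a minimal sketch in plain Python that checks a list of column values against each rule. It is not part of dql-parser, which only parses DQL and does not evaluate it. The function names, the inclusive bounds for `to_be_between`, and the use of `re.fullmatch` for `to_match_pattern` are assumptions made for illustration.

```python
import re
from typing import Any, Sequence


def to_be_null(values: Sequence[Any]) -> bool:
    """True when every value is None."""
    return all(v is None for v in values)


def to_not_be_null(values: Sequence[Any]) -> bool:
    """True when no value is None."""
    return all(v is not None for v in values)


def to_match_pattern(values: Sequence[Any], pattern: str) -> bool:
    """True when every value is a string that fully matches the regex."""
    compiled = re.compile(pattern)
    return all(
        isinstance(v, str) and compiled.fullmatch(v) is not None for v in values
    )


def to_be_between(values: Sequence[Any], minimum: Any, maximum: Any) -> bool:
    """True when every value lies in [minimum, maximum] (inclusive, by assumption)."""
    return all(v is not None and minimum <= v <= maximum for v in values)


def to_be_in(values: Sequence[Any], *allowed: Any) -> bool:
    """True when every value is one of the allowed values."""
    return all(v in allowed for v in values)


def to_be_unique(values: Sequence[Any]) -> bool:
    """True when no value repeats (values must be hashable)."""
    return len(values) == len(set(values))


# Small demonstration on made-up data.
print(to_be_between([18, 42, 120], 18, 120))
print(to_be_in(["active", "closed"], "active", "pending", "closed"))
```

A real validation engine would likely run these checks against a database or dataframe rather than in-memory lists.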
Severity Levels
- critical - Must pass for validation to succeed
- warning - Logged but doesn't fail validation
- info - Informational only
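The way a consumer of the AST might act on these levels can be sketched as follows. This is an illustrative helper, not dql-parser API: the `summarize` function and its `(severity, passed)` input format are hypothetical, and the parser itself only records the severity on each expectation.

```python
from typing import Iterable, List, Tuple


def summarize(results: Iterable[Tuple[str, bool]]) -> Tuple[bool, List[str]]:
    """Combine per-expectation results into an overall verdict.

    `results` holds (severity, passed) pairs. Only a failed "critical"
    expectation makes the overall validation fail; failed "warning" and
    "info" expectations are just collected so they can be logged.
    """
    overall_ok = True
    logged: List[str] = []
    for severity, passed in results:
        if severity not in ("critical", "warning", "info"):
            raise ValueError(f"unknown severity: {severity!r}")
        if passed:
            continue
        logged.append(severity)
        if severity == "critical":
            overall_ok = False
    return overall_ok, logged


# A failed warning is reported but does not fail the run.
print(summarize([("critical", True), ("warning", False)]))
```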
API Reference
DQLParser
Main parser class for DQL syntax.
parser = DQLParser()
parse(text: str) -> DQLFile
Parse DQL text and return AST.
Args:
text: DQL source text
Returns:
DQLFile: Root AST node
Raises:
DQLSyntaxError: If syntax is invalid
parse_file(filepath: str) -> DQLFile
Parse DQL file and return AST.
Args:
filepath: Path to .dql file
Returns:
DQLFile: Root AST node
Raises:
DQLSyntaxError: If syntax is invalid
FileNotFoundError: If file doesn't exist
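When parsing several files, it can be convenient to collect failures instead of stopping at the first one. The helper below is a sketch under assumptions: `parse_all` is a hypothetical name, and it accepts any object with a `parse_file(filepath)` method, so it does not import dql-parser itself. Pass the exception types you want to tolerate through `errors`, for example `(FileNotFoundError, DQLSyntaxError)`.

```python
from typing import Any, Dict, Iterable, Tuple, Type


def parse_all(
    parser: Any,
    paths: Iterable[str],
    errors: Tuple[Type[BaseException], ...] = (FileNotFoundError,),
) -> Tuple[Dict[str, Any], Dict[str, BaseException]]:
    """Parse each path, returning (parsed ASTs, failures) keyed by path."""
    parsed: Dict[str, Any] = {}
    failed: Dict[str, BaseException] = {}
    for path in paths:
        try:
            parsed[path] = parser.parse_file(path)
        except errors as exc:
            # Keep the exception so the caller can report it later.
            failed[path] = exc
    return parsed, failed
```

With the real library, the call would look like `parse_all(DQLParser(), ["rules/customer.dql"])`, where the path is a made-up example.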
AST Nodes
The parser returns an Abstract Syntax Tree (AST) composed of dataclass nodes:
- DQLFile - Root node containing from_blocks
- FromBlock - Represents a FROM block with model_name and expectations
- ExpectationNode - Single expectation with target, operator, severity
- ColumnTarget - Column reference
- RowTarget - Row-level condition
- Operators: ToBeNull, ToNotBeNull, ToMatchPattern, ToBeBetween, ToBeIn, ToBeUnique
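The shape of this tree can be illustrated with stand-in dataclasses. The classes below are simplified local copies written for this example, not the library's definitions: only `from_blocks`, `model_name`, `expectations`, `target`, `operator`, and `severity` come from the description above, while `field_name`, `min_value`, `max_value`, and the `None` default for `severity` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class ColumnTarget:
    field_name: str  # hypothetical attribute name


@dataclass
class ToNotBeNull:
    pass


@dataclass
class ToBeBetween:
    min_value: float  # hypothetical attribute name
    max_value: float  # hypothetical attribute name


@dataclass
class ExpectationNode:
    target: ColumnTarget
    operator: object
    severity: Optional[str] = None  # default is an assumption


@dataclass
class FromBlock:
    model_name: str
    expectations: List[ExpectationNode] = field(default_factory=list)


@dataclass
class DQLFile:
    from_blocks: List[FromBlock] = field(default_factory=list)


def iter_expectations(tree: DQLFile) -> Iterator[Tuple[str, ExpectationNode]]:
    """Yield (model_name, expectation) for every expectation in the tree."""
    for block in tree.from_blocks:
        for expectation in block.expectations:
            yield block.model_name, expectation


# Hand-built tree mirroring the Quick Start example.
tree = DQLFile([
    FromBlock("Customer", [
        ExpectationNode(ColumnTarget("email"), ToNotBeNull(), "critical"),
        ExpectationNode(ColumnTarget("age"), ToBeBetween(18, 120)),
    ]),
])

for model, exp in iter_expectations(tree):
    print(model, exp.target.field_name, type(exp.operator).__name__, exp.severity)
```

Dispatching on the operator's class, as `type(exp.operator).__name__` hints at here, is one common way to turn such a tree into checks.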
Error Handling
The parser raises DQLSyntaxError with a message that includes the line and column of the problem:
try:
    ast = parser.parse("EXPECT column('email') invalid_operator")
except DQLSyntaxError as e:
    print(e)
    # Output: Syntax error at line 1, column 30: unexpected token 'invalid_operator'
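Line and column information is easiest to read when it is shown against the source itself. The helper below is a small sketch of that idea. `point_at` is a hypothetical name, it assumes 1-based line and column numbers, and it takes them as plain integers because this README does not document how DQLSyntaxError exposes them.

```python
def point_at(source: str, line: int, column: int) -> str:
    """Return the given source line with a caret under the given column.

    Both `line` and `column` are treated as 1-based (an assumption).
    """
    lines = source.splitlines()
    if not 1 <= line <= len(lines):
        raise ValueError(f"line {line} is outside the source ({len(lines)} lines)")
    text = lines[line - 1]
    # Pad with spaces so the caret sits under the reported column.
    return f"{text}\n{' ' * max(column - 1, 0)}^"


print(point_at("FROM Customer\nEXPECT oops", 2, 8))
```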
Development
Setup
# Clone repository
git clone https://github.com/dql-project/dql-parser.git
cd dql-parser
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=dql_parser --cov-report=html
# Run specific test file
pytest tests/test_valid_syntax.py
Code Quality
# Format code
black dql_parser tests
# Lint code
flake8 dql_parser tests
# Type check
mypy dql_parser
Documentation
Full documentation: https://yourusername.github.io/dql-parser/
- Grammar Reference - Complete DQL syntax specification
- AST Reference - AST node documentation
- Examples - Usage examples
- API Reference - Complete API
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Related Projects
- dql-core - Framework-agnostic validation engine (docs)
- django-dqm - Django integration with Admin dashboard (docs)
Package Selection
Not sure which package to use? See the Package Selection Guide
Changelog
See CHANGELOG.md for version history.
File details
Details for the file dql_parser-0.5.2.tar.gz.
File metadata
- Download URL: dql_parser-0.5.2.tar.gz
- Upload date:
- Size: 71.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fcc0dbd48639a8b401eb93e1d1f57b6064d920db2c116691538176b7b61c31a |
| MD5 | 9b90504e07259c156b762ddfcfbf9890 |
| BLAKE2b-256 | faf5b46fefe9083f42f06e0b05cfbd08b4c6760bf9ed2b663e2d3772590f3812 |
File details
Details for the file dql_parser-0.5.2-py3-none-any.whl.
File metadata
- Download URL: dql_parser-0.5.2-py3-none-any.whl
- Upload date:
- Size: 42.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14885b3bf9ab094eb0be9ed9dcdc49d884beac94a9992da09a63270c588820a8 |
| MD5 | 98fd0aa8827b65926204158f89b9bd10 |
| BLAKE2b-256 | da5900d04f8136173b53b2ea219857a0ce4fc4ae76625e1de2df5836d5fdc91e |