Network checks for the DQX data quality framework
DQX Network Checks
An extension to the Databricks Data Quality Framework (DQX) that adds data quality checks for network-related data, including IPv4 addresses and CIDR networks. It provides row-level validation rules that can be applied to DataFrame columns containing IP addresses and network ranges.
Quick Start
Basic Usage
Use the network checks inside `DQRowRule` instances and apply them to your DataFrames:
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.rule import DQRowRule
from databricks.sdk import WorkspaceClient
from dqx_network_checks import is_ipv4_address, is_ipv4_private_address
# Create sample data
input_df = spark.createDataFrame([
    ("192.168.1.1",),
    ("10.0.0.1",),
    ("invalid ip",),
    ("8.8.8.8",),
], "ip STRING")
# Define data quality checks
checks = [
    DQRowRule(criticality="error", check_func=is_ipv4_address, column="ip"),
    # DQX criticality levels are "error" and "warn"
    DQRowRule(criticality="warn", check_func=is_ipv4_private_address, column="ip"),
]
# Apply checks using DQX engine
dq_engine = DQEngine(WorkspaceClient())
valid_df, quarantine_df = dq_engine.apply_checks_and_split(input_df, checks)
Alternatively, define the checks in YAML and apply them as metadata, passing the network checks as custom check functions:
import yaml

from databricks.labs.dqx.engine import DQEngine
from databricks.sdk import WorkspaceClient
from dqx_network_checks import get_network_checks

# Create sample data
input_df = spark.createDataFrame([
    ("192.168.1.1",),
    ("10.0.0.1",),
    ("invalid ip",),
    ("8.8.8.8",),
], "ip STRING")

# Network check functions, passed to DQX so it can resolve them by name
custom_checks = get_network_checks()

# Define data quality checks as YAML metadata
checks = yaml.safe_load("""
- criticality: error
  check:
    function: is_ipv4_address
    arguments:
      column: ip
""")

# Apply checks using DQX engine
dq_engine = DQEngine(WorkspaceClient())
valid_df, quarantine_df = dq_engine.apply_checks_by_metadata_and_split(
    input_df, checks, custom_checks
)
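Since `yaml.safe_load` returns plain Python objects, the same metadata can also be written directly as a list of dictionaries. The sketch below assumes the nesting used in the YAML example (`function` and `arguments` under `check`, `column` under `arguments`); the `function_names` helper is hypothetical and only illustrates which names the custom check functions would need to provide.

```python
# Python equivalent of the YAML metadata above (nesting is an assumption
# based on the YAML example, not taken from the package source).
checks = [
    {
        "criticality": "error",
        "check": {
            "function": "is_ipv4_address",
            "arguments": {"column": "ip"},
        },
    },
]


def function_names(metadata: list) -> list:
    """Hypothetical helper: list the check function names referenced in the metadata."""
    return sorted({entry["check"]["function"] for entry in metadata})


print(function_names(checks))
```

Every name this helper returns has to be resolvable by the engine, which is why the network checks are passed alongside the metadata.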
Available Checks
Address Type Validation
| Check Function | Description | Example Valid Values |
|---|---|---|
| `is_ipv4_address` | Validates IPv4 address format | "192.168.1.1", "10.0.0.1" |
| `is_ipv4_loopback_address` | Loopback addresses (127.0.0.0/8) | "127.0.0.1", "127.255.255.255" |
| `is_ipv4_multicast_address` | Multicast addresses (224.0.0.0/4) | "224.0.0.1", "239.255.255.255" |
| `is_ipv4_private_address` | Private network addresses | "192.168.1.1", "10.0.0.1", "172.16.0.1" |
| `is_ipv4_global_address` | Public/global addresses | "8.8.8.8", "1.1.1.1" |
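To get a feel for these categories outside Spark, the following sketch classifies a string using the standard library's `ipaddress` module. The `classify_ipv4` function is a hypothetical illustration, not part of this package, and the package's checks may draw the category boundaries differently from the `ipaddress` properties used here.

```python
import ipaddress


def classify_ipv4(value: str) -> list[str]:
    """Return the address categories that `value` falls into.

    Hypothetical helper for illustration; returns an empty list when the
    string is not a valid IPv4 address.
    """
    try:
        addr = ipaddress.IPv4Address(value)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return []

    labels = ["ipv4"]
    if addr.is_loopback:
        labels.append("loopback")
    if addr.is_multicast:
        labels.append("multicast")
    if addr.is_private:
        labels.append("private")
    if addr.is_global:
        labels.append("global")
    return labels


for sample in ["192.168.1.1", "8.8.8.8", "127.0.0.1", "224.0.0.1", "invalid ip"]:
    print(sample, classify_ipv4(sample))
```

Categories can overlap: in `ipaddress`, a loopback address such as "127.0.0.1" is also reported as private.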
Network Operations
| Check Function | Description | Example Valid Values |
|---|---|---|
| `is_ipv4_network` | Validates CIDR network notation | "192.168.1.0/24", "10.0.0.0/8" |
| `is_ipv4_network_contains_address` | Checks if IP is in network range | ("192.168.1.1", "192.168.1.0/24") |
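The two network operations can be approximated with the standard library as shown below. This is a minimal sketch: `is_valid_network` and `network_contains` are hypothetical names, and the strict parsing (rejecting networks with host bits set, such as "192.168.1.1/24") is the `ipaddress` default, which may or may not match how this package validates networks.

```python
import ipaddress


def is_valid_network(value: str) -> bool:
    """Hypothetical helper: True if `value` parses as an IPv4 network in CIDR notation."""
    try:
        # strict=True (the default) rejects networks with host bits set
        ipaddress.IPv4Network(value)
        return True
    except ValueError:
        return False


def network_contains(address: str, network: str) -> bool:
    """Hypothetical helper: True if `address` lies inside `network`.

    Returns False when either argument cannot be parsed.
    """
    try:
        return ipaddress.IPv4Address(address) in ipaddress.IPv4Network(network)
    except ValueError:
        return False


print(is_valid_network("192.168.1.0/24"))
print(network_contains("192.168.1.1", "192.168.1.0/24"))
print(network_contains("8.8.8.8", "192.168.1.0/24"))
```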
Performance Considerations
- All checks use PySpark UDFs for distributed processing
- Network validation is performed using Python's built-in `ipaddress` module
- Checks are optimized for large-scale data processing in Databricks
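As a rough picture of what a UDF-based check evaluates per row, here is a plain Python validator built on `ipaddress`. The function name and the choice to treat null values as invalid are assumptions made for this sketch; the package's own UDFs may handle nulls differently.

```python
import ipaddress
from typing import Optional


def is_valid_ipv4(value: Optional[str]) -> bool:
    """Hypothetical row-level validator: True if `value` is a valid IPv4 address.

    Null values are treated as invalid here, which is an assumption.
    """
    if value is None:
        return False
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


# The sample values from the Quick Start, plus a null
rows = ["192.168.1.1", "10.0.0.1", "invalid ip", "8.8.8.8", None]
results = [is_valid_ipv4(row) for row in rows]
print(results)

# In Spark, a function like this could be wrapped as a UDF, for example:
#   from pyspark.sql.functions import udf
#   is_valid_ipv4_udf = udf(is_valid_ipv4, "boolean")
```

Because a Python UDF is invoked for each row on the executors, keeping the function small and free of external state helps it distribute cleanly.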
Acknowledgments
- Built on the Databricks Data Quality Framework (DQX)