Network checks for the DQX data quality framework

DQX Network Checks

An extension to the Databricks Data Quality Framework (DQX) that adds data quality checks for network-related data, including IPv4 addresses and CIDR networks. The checks are row-level validation rules that can be applied to DataFrame columns containing IP addresses, network ranges, and other network-related information.

Quick Start

Basic Usage

Use the network checks inside DQX rule classes such as DQRowRule and apply them to your DataFrames:

from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.rule import DQRowRule
from databricks.sdk import WorkspaceClient
from dqx_network_checks import is_ipv4_address, is_ipv4_private_address

# Create sample data
input_df = spark.createDataFrame([
    ("192.168.1.1",),
    ("10.0.0.1",),
    ("invalid ip",),
    ("8.8.8.8",),
], "ip STRING")

# Define data quality checks
checks = [
    DQRowRule(criticality="error", check_func=is_ipv4_address, column="ip"),
    DQRowRule(criticality="warning", check_func=is_ipv4_private_address, column="ip"),
]

# Apply checks using DQX engine
dq_engine = DQEngine(WorkspaceClient())
valid_df, quarantine_df = dq_engine.apply_checks_and_split(input_df, checks)

Alternatively, define the network rules in YAML and apply them as metadata:

import yaml  # needed for yaml.safe_load below
from databricks.labs.dqx.engine import DQEngine
from databricks.sdk import WorkspaceClient
from dqx_network_checks import get_network_checks

# Create sample data
input_df = spark.createDataFrame([
    ("192.168.1.1",),
    ("10.0.0.1",),
    ("invalid ip",),
    ("8.8.8.8",),
], "ip STRING")

# Define data quality checks
custom_checks = get_network_checks()
checks = yaml.safe_load("""
- criticality: error
  check:
    function: is_ipv4_address
    arguments:
      column: ip
""")

# Apply checks using DQX engine
dq_engine = DQEngine(WorkspaceClient())
valid_df, quarantine_df = dq_engine.apply_checks_by_metadata_and_split(
    input_df, checks, custom_checks
)

Available Checks

Address Type Validation

| Check Function | Description | Example Valid Values |
|---|---|---|
| is_ipv4_address | Validates IPv4 address format | "192.168.1.1", "10.0.0.1" |
| is_ipv4_loopback_address | Loopback addresses (127.0.0.0/8) | "127.0.0.1", "127.255.255.255" |
| is_ipv4_multicast_address | Multicast addresses (224.0.0.0/4) | "224.0.0.1", "239.255.255.255" |
| is_ipv4_private_address | Private network addresses | "192.168.1.1", "10.0.0.1", "172.16.0.1" |
| is_ipv4_global_address | Public/global addresses | "8.8.8.8", "1.1.1.1" |
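
To illustrate what these categories mean, the following is a minimal sketch built directly on Python's ipaddress module. It is not the package's implementation; the helper name classify_ipv4 is hypothetical and only shows how the address types in the table map onto standard-library attributes.

```python
import ipaddress
from typing import Dict, Optional


def classify_ipv4(value: str) -> Optional[Dict[str, bool]]:
    """Hypothetical helper: return category flags, or None if not valid IPv4."""
    try:
        addr = ipaddress.IPv4Address(value)
    except ValueError:
        # AddressValueError is a subclass of ValueError
        return None
    return {
        "loopback": addr.is_loopback,    # 127.0.0.0/8
        "multicast": addr.is_multicast,  # 224.0.0.0/4
        "private": addr.is_private,      # e.g. 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
        "global": addr.is_global,        # publicly routable
    }


for sample in ["127.0.0.1", "239.255.255.255", "172.16.0.1", "8.8.8.8", "invalid ip"]:
    print(sample, classify_ipv4(sample))
```

Note that the categories can overlap in the standard library: for example, ipaddress also reports loopback addresses as private. Whether the package's checks behave the same way is not stated here.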

Network Operations

| Check Function | Description | Example Valid Values |
|---|---|---|
| is_ipv4_network | Validates CIDR network notation | "192.168.1.0/24", "10.0.0.0/8" |
| is_ipv4_network_contains_address | Checks if IP is in network range | ("192.168.1.1", "192.168.1.0/24") |
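
The network operations above can be sketched with the standard library as follows. This is an illustration under assumptions, not the package's code: the helper names is_valid_network and network_contains are hypothetical, and strict parsing (rejecting networks with host bits set) is an assumed choice.

```python
import ipaddress


def is_valid_network(value: str) -> bool:
    """Hypothetical helper: True if value is valid CIDR notation."""
    try:
        # strict=True (an assumption) rejects host bits, e.g. "192.168.1.1/24"
        ipaddress.IPv4Network(value, strict=True)
    except ValueError:
        return False
    return True


def network_contains(address: str, network: str) -> bool:
    """Hypothetical helper: True if address lies inside network."""
    try:
        return ipaddress.IPv4Address(address) in ipaddress.IPv4Network(network)
    except ValueError:
        # Either the address or the network could not be parsed
        return False


print(is_valid_network("192.168.1.0/24"))
print(network_contains("192.168.1.1", "192.168.1.0/24"))
print(network_contains("8.8.8.8", "192.168.1.0/24"))
```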

Performance Considerations

  • All checks use PySpark UDFs for distributed processing
  • Network validation is performed using Python's built-in ipaddress module
  • Checks are optimized for large-scale data processing in Databricks
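
As a rough sketch of the pattern described above, a plain Python function based on ipaddress can be wrapped in a PySpark UDF. The names is_valid_ipv4 and build_ipv4_udf are hypothetical and do not come from the package, and treating null values as invalid is an assumption.

```python
import ipaddress
from typing import Optional


def is_valid_ipv4(value: Optional[str]) -> bool:
    """Hypothetical row-level predicate; nulls are treated as invalid (assumption)."""
    if value is None:
        return False
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def build_ipv4_udf():
    """Wrap the predicate as a Spark UDF; requires pyspark to be installed."""
    # Imported lazily so the pure-Python predicate stays usable without Spark
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    return udf(is_valid_ipv4, BooleanType())
```

With a Spark session available, the result could be applied with something like input_df.withColumn("is_valid", build_ipv4_udf()("ip")). Python UDFs run row by row in Python worker processes, so they distribute across the cluster but are generally slower than native Spark expressions.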

Acknowledgments

Download files

Source Distribution

dqx_network_checks-0.1.0.tar.gz (40.7 kB)

Uploaded Source

Built Distribution

dqx_network_checks-0.1.0-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file dqx_network_checks-0.1.0.tar.gz.

File metadata

  • Download URL: dqx_network_checks-0.1.0.tar.gz
  • Upload date:
  • Size: 40.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.9

File hashes

Hashes for dqx_network_checks-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | a8655f4d7b9bbe4fef65fdc55998fcd679ba59de85daa19aeba23f4cc9417263 |
| MD5 | d511f47a91a31f012458c857a8c11a89 |
| BLAKE2b-256 | 0817d67031e90e2a907ce40edefcf78b80047487ee60eda16765c965b0b28673 |

File details

Details for the file dqx_network_checks-0.1.0-py3-none-any.whl.

File hashes

Hashes for dqx_network_checks-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | bb931008171ec5581e52400470cdd9c87ab68f91df3ed8dcdf564455d9ac3bfe |
| MD5 | 46437e78dc8950b9fb2d725bd8a96bd6 |
| BLAKE2b-256 | e92d8a6580c3f97b0902950618de2aaa02fa8cb1c0b503bd1ec3c1b014c186a9 |
