Sparvi Core

Data profiling and validation engine for modern data warehouses

Like a hawk keeping watch over your data, Sparvi monitors data pipelines, detects anomalies, tracks schema changes, and ensures data integrity with sharp precision.

Sparvi Core is a Python library for data profiling and validation in modern data warehouses. It helps data engineers and analysts maintain high-quality data by monitoring schema changes, detecting anomalies, and validating data against custom rules.

Features

Data Profiling

  • Automated Metrics: Compute essential quality metrics (null rates, duplicates, outliers) to understand your data's health at a glance
  • Schema Analysis: Detect column types, relationships, and constraints
  • Distribution Analysis: Understand the distribution of values in your data
  • Historical Comparisons: Compare current profiles with previous runs to detect changes
  • Anomaly Detection: Automatically detect anomalies in your data

Data Validation

  • Custom Validation Rules: Define and run your own validation rules
  • SQL-Based Rules: Use SQL to define validation queries
  • Default Rules Generator: Automatically generate sensible validation rules based on your data
  • Detailed Results: Get comprehensive information about validation failures
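
Rules are typically stored in a YAML file that both the CLI (`--rules`) and the Python API can load. The sketch below shows the general shape of a custom SQL-based rule; the field names here are illustrative assumptions, not the guaranteed schema — generate a file with `--generate-defaults --save-defaults` to see the exact format your installed version produces.

```yaml
# rules.yaml — field names are illustrative, not the guaranteed schema
rules:
  - name: no_null_emails            # hypothetical rule for an "employees" table
    description: Every employee must have an email address
    query: SELECT COUNT(*) FROM employees WHERE email IS NULL
    operator: equals
    expected_value: 0
```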

Installation

# Basic installation
pip install sparvi-core

# With support for Snowflake
pip install sparvi-core[snowflake]

# With support for PostgreSQL
pip install sparvi-core[postgres]

# With all extras
pip install sparvi-core[snowflake,postgres]

Quick Start

Command Line Interface

Profile a table:

# Basic profiling
sparvi profile "duckdb:///path/to/database.duckdb" employees

# Save the profile to a file
sparvi profile "postgresql://user:pass@localhost/mydatabase" customers --output profile.json

# Compare with a previous profile
sparvi profile "snowflake://user:pass@account/database/schema?warehouse=wh" orders --compare previous_profile.json

Validate a table:

# Generate and run default validations
sparvi validate "duckdb:///path/to/database.duckdb" employees --generate-defaults

# Save the default rules to a YAML file
sparvi validate "duckdb:///path/to/database.duckdb" employees --generate-defaults --save-defaults rules.yaml

# Run validations from a file
sparvi validate "postgresql://user:pass@localhost/mydatabase" customers --rules rules.yaml

# Save validation results to a file
sparvi validate "snowflake://user:pass@account/database/schema?warehouse=wh" orders --rules rules.yaml --output results.json

Python API

Profile a table:

from sparvi.profiler.profile_engine import profile_table

# Run a profile
profile = profile_table("duckdb:///path/to/database.duckdb", "employees")

# Check completeness
for column, stats in profile["completeness"].items():
    print(f"{column}: {stats['null_percentage']}% null, {stats['distinct_percentage']}% distinct")

# Check for anomalies
for anomaly in profile.get("anomalies", []):
    print(f"Anomaly: {anomaly['description']}")

# Check for schema shifts
for shift in profile.get("schema_shifts", []):
    print(f"Schema shift: {shift['description']}")

Validate a table:

from sparvi.validations.validator import run_validations, load_rules_from_file
from sparvi.validations.default_validations import get_default_validations

# Generate default validation rules
rules = get_default_validations("duckdb:///path/to/database.duckdb", "employees")

# Run the validations
results = run_validations("duckdb:///path/to/database.duckdb", rules)

# Check results
for result in results:
    status = "PASS" if result["is_valid"] else "FAIL"
    print(f"{result['rule_name']}: {status}")
    if not result["is_valid"]:
        print(f"  Expected: {result['expected_value']}, Actual: {result['actual_value']}")
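
Because validation results are plain dictionaries, they are easy to aggregate. A minimal sketch, assuming only the result keys shown above (`is_valid`, `rule_name`) and using a hand-made sample list in place of real `run_validations` output:

```python
from collections import Counter

# Sample results shaped like the dictionaries run_validations returns
# (hand-made here so the snippet is self-contained)
results = [
    {"rule_name": "no_null_emails", "is_valid": True},
    {"rule_name": "salary_positive", "is_valid": False},
    {"rule_name": "unique_ids", "is_valid": True},
]

# Tally pass/fail counts and collect the names of failing rules
summary = Counter("PASS" if r["is_valid"] else "FAIL" for r in results)
failed = [r["rule_name"] for r in results if not r["is_valid"]]

print(dict(summary))   # {'PASS': 2, 'FAIL': 1}
print(failed)          # ['salary_positive']
```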

Multi-Database Support

Sparvi Core supports multiple database engines:

  • DuckDB: Included by default, ideal for local analysis
  • PostgreSQL: Install with pip install sparvi-core[postgres]
  • Snowflake: Install with pip install sparvi-core[snowflake]

The library uses database-specific adapters to ensure that SQL queries are optimized for each database engine. This provides consistent results while taking advantage of each database's specific features.

For example, Sparvi automatically adapts:

  • Regular expression syntax
  • Date/time functions
  • Percentile calculations
  • String operations

This means you can profile and validate your data using the same API regardless of the underlying database.
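
To see why adapters matter, consider regex matching: each engine spells the predicate differently. The helper below is not Sparvi's internal API — just an illustration of the kind of translation an adapter performs, with syntax taken from each engine's documented behavior:

```python
def regex_predicate(dialect: str, column: str, pattern: str) -> str:
    """Build a dialect-appropriate regex-match predicate (illustrative only)."""
    if dialect == "postgresql":
        return f"{column} ~ '{pattern}'"                 # PostgreSQL regex operator
    if dialect == "snowflake":
        return f"REGEXP_LIKE({column}, '{pattern}')"     # Snowflake regex function
    if dialect == "duckdb":
        return f"regexp_matches({column}, '{pattern}')"  # DuckDB regex function
    raise ValueError(f"unsupported dialect: {dialect}")

print(regex_predicate("postgresql", "email", "^[^@]+@[^@]+$"))
# email ~ '^[^@]+@[^@]+$'
```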

Database Compatibility

PostgreSQL Considerations

When working with PostgreSQL, keep in mind:

  • For date difference functions, we use PostgreSQL's DATE_PART function
  • Regex pattern matching uses PostgreSQL's ~ operator
  • When using the FILTER clause, ensure you have PostgreSQL 9.4 or higher
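
The FILTER clause pattern — counting nulls without a subquery — is shared by PostgreSQL 9.4+ and recent SQLite, so it can be sketched with just the standard library (the table and data here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (3, "c@x.com")])

# COUNT(*) FILTER (WHERE ...) — the same syntax PostgreSQL uses for null counts
row = conn.execute(
    "SELECT COUNT(*) AS total, "
    "COUNT(*) FILTER (WHERE email IS NULL) AS null_count "
    "FROM employees"
).fetchone()
print(row)  # (3, 1)
```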

Snowflake Considerations

When working with Snowflake, keep in mind:

  • Regex pattern matching uses Snowflake's REGEXP_LIKE function
  • String functions may behave slightly differently than in PostgreSQL or DuckDB
  • To optimize performance with large Snowflake tables, consider using warehouse sizing options

Testing Your Setup

To verify your database connection and functionality, you can use:

from sqlalchemy import create_engine
from sparvi.db.adapters import get_adapter_for_connection

# Test the connection with a simple query
engine = create_engine("your_connection_string")
adapter = get_adapter_for_connection(engine)
print(f"Connected to: {adapter.__class__.__name__}")

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Apache License 2.0

Download files

Download the file for your platform.

Source Distribution

sparvi_core-0.4.1.tar.gz (26.7 kB)


Built Distribution


sparvi_core-0.4.1-py3-none-any.whl (24.3 kB)


File details

Details for the file sparvi_core-0.4.1.tar.gz.

File metadata

  • Download URL: sparvi_core-0.4.1.tar.gz
  • Size: 26.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.8

File hashes

Hashes for sparvi_core-0.4.1.tar.gz:

  • SHA256: 34e03d913400a49abcc16c4a37eb5bad3a31c76b4ac14aa0626225dd777ca85a
  • MD5: a1a14af60a581729baa3f3847a424756
  • BLAKE2b-256: f7020fa8ca17b19ee8bbcba9694ca621fe73913737b5129b77098d3386dee723


File details

Details for the file sparvi_core-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: sparvi_core-0.4.1-py3-none-any.whl
  • Size: 24.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.8

File hashes

Hashes for sparvi_core-0.4.1-py3-none-any.whl:

  • SHA256: 909938b199c8b9b3762640b7f5e4499999027c054bf87ee630310c77d365ecc5
  • MD5: 4e8941e2ec23012bee1de116e2710f58
  • BLAKE2b-256: 9a4e3dc9f421778df2f2c81b299f643fde80f685730d051db37f8ee5d73c269d

