Modern data profiling and drift detection framework
Baselinr
Baselinr is a modern, open-source data profiling and drift detection framework for SQL-based data warehouses. It automatically profiles datasets, stores metadata and statistics, and detects drift over time.
Features
- Automated Profiling: Profile tables with column-level metrics (count, null %, distinct values, mean, stddev, histograms, etc.)
- Drift Detection: Compare profiling runs to detect schema and statistical drift with configurable strategies
- Type-Specific Thresholds: Adjust drift sensitivity based on column data type (numeric, categorical, timestamp, boolean) to reduce false positives
- Intelligent Baseline Selection: Automatically selects optimal baseline method (last run, moving average, prior period, stable window) based on column characteristics
- Advanced Statistical Tests: Kolmogorov-Smirnov (KS) test, Population Stability Index (PSI), Chi-square, Entropy, and more for rigorous drift detection
- Expectation Learning: Automatically learns expected metric ranges from historical profiling data, including control limits, distributions, and categorical frequencies for proactive anomaly detection
- Anomaly Detection: Automatically detects outliers and seasonal anomalies using learned expectations with multiple detection methods (IQR, MAD, EWMA, trend/seasonality, regime shift)
- Event & Alert Hooks: Pluggable event system for real-time alerts and notifications on drift, schema changes, anomalies, and profiling lifecycle events
- Partition-Aware Profiling: Intelligent partition handling with strategies for latest, recent_n, or sample partitions
- Adaptive Sampling: Multiple sampling methods (random, stratified, top-k) for efficient profiling of large datasets
- Multi-Database Support: Works with PostgreSQL, Snowflake, SQLite, MySQL, BigQuery, and Redshift
- Schema Versioning & Migrations: Built-in schema version management with migration system for safe database schema evolution
- Metadata Querying: Powerful CLI and API for querying profiling runs, drift events, and table history
- Dagster Integration: Built-in orchestration support with Dagster assets and schedules
- Configuration-Driven: Simple YAML/JSON configuration for defining profiling targets
- Historical Tracking: Store profiling results over time for trend analysis
- CLI Interface: Comprehensive command-line interface for profiling, drift detection, querying, schema management, and dashboard UI
Requirements
- Python 3.10+
- One of the supported databases: PostgreSQL, Snowflake, SQLite, MySQL, BigQuery, or Redshift
Installation
Install from PyPI
Install Baselinr directly from PyPI:
pip install baselinr
Install with Optional Dependencies
Baselinr supports optional dependencies for enhanced functionality:
Snowflake Support:
pip install baselinr[snowflake]
Dagster Integration:
pip install baselinr[dagster]
All Features:
pip install baselinr[all]
Development Installation
For local development, clone the repository and install in editable mode:
git clone https://github.com/baselinrhq/baselinr.git
cd baselinr
pip install -e ".[dev]"
Documentation
Documentation is organized under the docs/ directory:
- Getting Started: docs/getting-started/ - Quick start and installation guides
- User Guides: docs/guides/ - Drift detection, partitioning, metrics
- Architecture: docs/architecture/ - System design and implementation
- Dashboard: docs/dashboard/ - Dashboard setup and development
- Development: docs/development/ - Contributing and development
- Roadmap: ROADMAP.md - Planned features and future enhancements
See docs/README.md for the complete documentation index.
Quick Start
1. Create a Configuration File
Create a config.yml file:
environment: development
source:
  type: postgres
  host: localhost
  port: 5432
  database: mydb
  username: user
  password: password
  schema: public
storage:
  connection:
    type: postgres
    host: localhost
    port: 5432
    database: mydb
    username: user
    password: password
  results_table: baselinr_results
  runs_table: baselinr_runs
  create_tables: true
  enable_expectation_learning: true # Learn expected ranges automatically
  learning_window_days: 30 # Use last 30 days of data
  min_samples: 5 # Require at least 5 historical runs
  enable_anomaly_detection: true # Detect anomalies using learned expectations
profiling:
  tables:
    # Explicit table selection (highest priority)
    - table: customers
      schema: public
    # Pattern-based selection (wildcard)
    - pattern: "user_*"
      schema: public
      # Matches: user_profile, user_settings, user_preferences, etc.
    # Schema-based selection (all tables in schema)
    - select_schema: true
      schema: analytics
      exclude_patterns:
        - "*_temp"
        - "*_backup"
    # Regex pattern matching
    - pattern: "^(customer|order)_\\d{4}$"
      pattern_type: regex
      schema: public
      # Matches: customer_2024, order_2024, etc.
    # Multi-database profiling (optional database field)
    # - table: users
    #   schema: public
    #   database: analytics_db # Profile from analytics_db instead of source.database
    # - pattern: "order_*"
    #   schema: public
    #   database: warehouse_db # Profile matching tables from warehouse_db
    # - select_schema: true
    #   schema: analytics
    #   database: production_db # Profile all tables in analytics schema from production_db
  # Discovery options for pattern-based selection
  discovery_options:
    max_tables_per_pattern: 1000
    max_schemas_per_database: 100
    cache_discovery: true
    validate_regex: true
  default_sample_ratio: 1.0
  compute_histograms: true
  histogram_bins: 10
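To illustrate how wildcard, regex, and exclusion rules like those in the configuration could resolve against a list of table names, here is a minimal sketch using Python's standard fnmatch and re modules. The select_tables helper and the sample table names are hypothetical and are not Baselinr's implementation; real discovery would also query the database catalog.

```python
import fnmatch
import re


def select_tables(available, pattern, pattern_type="wildcard", exclude_patterns=()):
    """Return the names that match `pattern` and none of the exclusion patterns."""
    if pattern_type == "regex":
        compiled = re.compile(pattern)
        matched = [name for name in available if compiled.search(name)]
    else:
        # Shell-style wildcard matching, case-sensitive
        matched = [name for name in available if fnmatch.fnmatchcase(name, pattern)]
    return [
        name
        for name in matched
        if not any(fnmatch.fnmatchcase(name, ex) for ex in exclude_patterns)
    ]


# Hypothetical catalog listing
tables = ["user_profile", "user_settings", "user_temp", "customer_2024", "order_2024", "orders"]

wildcard_matches = select_tables(tables, "user_*", exclude_patterns=["*_temp"])
regex_matches = select_tables(tables, r"^(customer|order)_\d{4}$", pattern_type="regex")
print(wildcard_matches)
print(regex_matches)
```

Running `baselinr plan` remains the authoritative way to see which tables a given configuration actually selects.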
2. Preview What Will Be Profiled
baselinr plan --config config.yml
This shows you what tables will be profiled without actually running the profiler.
3. Run Profiling
baselinr profile --config config.yml
4. Detect Drift
After running profiling multiple times:
baselinr drift --config config.yml --dataset customers
5. Query Profiling Metadata
Query your profiling history and drift events:
# List recent profiling runs
baselinr query runs --config config.yml --limit 10
# Query drift events
baselinr query drift --config config.yml --table customers --days 7
# Get detailed run information
baselinr query run --config config.yml --run-id <run-id>
# View table profiling history
baselinr query table --config config.yml --table customers --days 30
6. Check System Status
Get a quick overview of recent runs and active drift:
# Show status dashboard
baselinr status --config config.yml
# Show only drift summary
baselinr status --config config.yml --drift-only
# Watch mode (auto-refresh)
baselinr status --config config.yml --watch
# JSON output for scripting
baselinr status --config config.yml --json
7. Start Dashboard UI
Launch the web dashboard to view profiling runs, drift alerts, and metrics:
# Start dashboard (foreground mode)
baselinr ui --config config.yml
# Custom ports
baselinr ui --config config.yml --port-backend 8080 --port-frontend 3001
# Localhost only
baselinr ui --config config.yml --host 127.0.0.1
Press Ctrl+C to stop the dashboard. See docs/schemas/UI_COMMAND.md for more details.
8. Manage Schema Migrations
Check and apply schema migrations:
# Check schema version status
baselinr migrate status --config config.yml
# Apply migrations up to schema version 1
baselinr migrate apply --config config.yml --target 1
# Validate schema integrity
baselinr migrate validate --config config.yml
Docker Development Environment
Baselinr includes a complete Docker environment for local development and testing.
Start the Environment
cd docker
docker-compose up -d
This will start:
- PostgreSQL with sample data
- Dagster daemon for orchestration
- Dagster web UI at http://localhost:3000
Stop the Environment
cd docker
docker-compose down
Profiling Metrics
Baselinr computes the following metrics:
All Column Types
- count: Total number of rows
- null_count: Number of null values
- null_ratio: Ratio of null values (0.0 to 1.0)
- distinct_count: Number of distinct values
- unique_ratio: Ratio of distinct values to total (0.0 to 1.0)
- approx_distinct_count: Approximate distinct count (database-specific)
- data_type_inferred: Inferred data type from values (email, url, date, etc.)
- column_stability_score: Column presence stability (0.0 to 1.0)
- column_age_days: Days since column first appeared
- type_consistency_score: Type consistency across runs (0.0 to 1.0)
Numeric Columns
- min: Minimum value
- max: Maximum value
- mean: Average value
- stddev: Standard deviation
- histogram: Distribution histogram (optional)
String Columns
- min: Lexicographic minimum
- max: Lexicographic maximum
- min_length: Minimum string length
- max_length: Maximum string length
- avg_length: Average string length
Table-Level Metrics
- row_count_change: Change in row count from previous run
- row_count_change_percent: Percentage change in row count
- row_count_stability_score: Row count stability (0.0 to 1.0)
- row_count_trend: Trend direction (increasing/stable/decreasing)
- schema_freshness: Timestamp of last schema modification
- schema_version: Incrementing schema version number
- column_count_change: Net change in column count
See docs/guides/PROFILING_ENRICHMENT.md for detailed documentation on enrichment features.
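As a rough illustration of how the basic column metrics relate to each other, the following sketch computes them over an in-memory list. The column_metrics function is hypothetical; Baselinr computes these in SQL against the warehouse. Excluding nulls from the distinct count is an assumption here, chosen to mirror how SQL COUNT(DISTINCT ...) treats NULL.

```python
def column_metrics(values):
    """Compute count, null, and distinct metrics for one column's values."""
    count = len(values)
    non_null = [v for v in values if v is not None]
    null_count = count - len(non_null)
    # Assumption: nulls are not counted as a distinct value
    distinct_count = len(set(non_null))
    return {
        "count": count,
        "null_count": null_count,
        "null_ratio": null_count / count if count else 0.0,
        "distinct_count": distinct_count,
        "unique_ratio": distinct_count / count if count else 0.0,
    }


metrics = column_metrics(["a", "b", "a", None])
print(metrics)
```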
Expectation Learning
Baselinr can automatically learn expected metric ranges from historical profiling data, creating statistical models that help identify outliers without explicit thresholds.
Key Features
- Automatic Learning: Continuously learns expected values for metrics like mean, stddev, null_ratio, count, and unique_ratio
- Control Limits: Calculates lower and upper control limits using Shewhart (3-sigma) method or EWMA (Exponentially Weighted Moving Average)
- Distribution Detection: Automatically detects if metrics follow normal or empirical distributions
- Categorical Frequencies: Tracks expected frequency distributions for categorical columns
- Separate from Baselines: Learned expectations are stored separately from drift detection baselines, enabling proactive anomaly detection
How It Works
Expectation learning analyzes historical profiling data over a configurable window (default: 30 days) to compute:
- Expected mean, variance, and standard deviation
- Control limits for outlier detection (3-sigma or EWMA-based)
- Distribution parameters (normal vs empirical)
- Expected categorical value frequencies
These learned expectations are automatically updated after each profiling run, providing an evolving model of what "normal" looks like for your data.
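The two control-limit methods named above can be sketched with the standard library. This is an illustration of the textbook Shewhart and EWMA formulas, not Baselinr's code: the function names are hypothetical, the history values are made up, and the EWMA limits use the asymptotic width sigma * sqrt(lambda / (2 - lambda)).

```python
import math
import statistics


def shewhart_limits(history, sigmas=3.0):
    """Lower and upper control limits at mean +/- `sigmas` standard deviations."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    return mean - sigmas * std, mean + sigmas * std


def ewma_limits(history, lam=0.2, sigmas=3.0):
    """Return (latest EWMA value, lower limit, upper limit) for the history."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    ewma = mean  # start the smoothed series at the historical mean
    for x in history:
        ewma = lam * x + (1 - lam) * ewma
    # Asymptotic EWMA control limits are narrower than the Shewhart ones
    half_width = sigmas * std * math.sqrt(lam / (2 - lam))
    return ewma, mean - half_width, mean + half_width


# Hypothetical values of one metric (for example, a column mean) across five runs
history = [100, 102, 98, 101, 99]
lower, upper = shewhart_limits(history)
ewma, e_lower, e_upper = ewma_limits(history, lam=0.2)
print(f"Shewhart: [{lower:.2f}, {upper:.2f}]")
print(f"EWMA: value={ewma:.2f}, limits=[{e_lower:.2f}, {e_upper:.2f}]")
```

A smaller lambda gives older runs more weight, which smooths the EWMA series and tightens its limits.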
Configuration
Enable expectation learning in your config.yml:
storage:
  enable_expectation_learning: true
  learning_window_days: 30 # Historical window in days
  min_samples: 5 # Minimum runs required for learning
  ewma_lambda: 0.2 # EWMA smoothing parameter (0 < lambda <= 1)
Use Cases
- Proactive Monitoring: Identify anomalies before they cause drift
- Automated Alerting: Flag unexpected metric values automatically
- Trend Analysis: Understand normal ranges for your data over time
- Quality Assurance: Ensure metrics stay within expected operational ranges
See docs/guides/EXPECTATION_LEARNING.md for comprehensive documentation on expectation learning.
Dagster Integration
Baselinr can create Dagster assets dynamically from your configuration:
from baselinr.integrations.dagster import build_baselinr_definitions

defs = build_baselinr_definitions(
    config_path="config.yml",
    asset_prefix="baselinr",
    job_name="baselinr_profile_all",
    enable_sensor=True,  # optional
)
dbt Integration
Baselinr provides comprehensive integration with dbt for scalable profiling and drift detection.
Using dbt Refs/Selectors in Configs
Reference dbt models directly in your baselinr configuration:
profiling:
  tables:
    - dbt_ref: customers
      dbt_project_path: ./dbt_project
    - dbt_selector: tag:critical
      dbt_project_path: ./dbt_project
Direct dbt Model Integration
Add baselinr tests and profiling within dbt models:
# schema.yml
models:
  - name: customers
    config:
      post-hook: "{{ baselinr_profile(target.schema, target.name) }}"
    columns:
      - name: customer_id
        tests:
          - baselinr_drift:
              metric: count
              threshold: 5.0
              severity: high
Installation:
- Install baselinr: pip install baselinr
- Add to packages.yml:
  packages:
    - git: "https://github.com/baselinrhq/baselinr.git"
      subdirectory: dbt_package
- Run: dbt deps
See dbt Integration Guide for complete documentation.
Python SDK
Baselinr provides a high-level Python SDK for programmatic access to all functionality.
Quick Start
from baselinr import BaselinrClient

# Initialize client
client = BaselinrClient(config_path="config.yml")

# Build execution plan
plan = client.plan()
print(f"Will profile {plan.total_tables} tables")

# Profile tables
results = client.profile()
for result in results:
    print(f"Profiled {result.dataset_name}: {len(result.columns)} columns")

# Detect drift
drift_report = client.detect_drift("customers")
print(f"Found {len(drift_report.column_drifts)} column drifts")

# Query recent runs
runs = client.query_runs(days=7, limit=10)

# Get status summary
status = client.get_status()
print(f"Active drift events: {len(status['drift_summary'])}")
Documentation
- Complete SDK Guide: docs/guides/PYTHON_SDK.md - Comprehensive API reference, examples, and best practices
SDK Examples
- Basic Usage: examples/sdk_quickstart.py - Simple profiling and drift detection
- Advanced Usage: examples/sdk_advanced.py - Progress callbacks, custom analysis, querying
Key Features
- Simple API: All functionality through a single BaselinrClient class
- Automatic Setup: Handles configuration loading, connection management, and event bus setup
- Type Hints: Full type annotations for IDE support
- Lazy Loading: Connections initialized only when needed
For complete SDK documentation including all methods, parameters, and advanced patterns, see the Python SDK Guide.
Use Cases
- Data Quality Monitoring: Track data quality metrics over time
- Schema Change Detection: Automatically detect schema changes
- Statistical Drift Detection: Identify statistical anomalies in your data
- Data Documentation: Generate up-to-date metadata about your datasets
- CI/CD Integration: Fail builds when critical drift is detected
Project Structure
baselinr/
├── baselinr/ # Main package
│   ├── config/ # Configuration management
│   ├── connectors/ # Database connectors
│   ├── profiling/ # Profiling engine
│   ├── storage/ # Results storage
│   ├── drift/ # Drift detection
│   ├── learning/ # Expectation learning
│   ├── anomaly/ # Anomaly detection
│   ├── integrations/
│   │   └── dagster/ # Dagster assets & sensors
│   └── cli.py # CLI interface
├── examples/ # Example configurations
│   ├── config.yml # PostgreSQL example
│   ├── config_sqlite.yml # SQLite example
│   ├── config_mysql.yml # MySQL example
│   ├── config_bigquery.yml # BigQuery example
│   ├── config_redshift.yml # Redshift example
│   ├── config_with_metrics.yml # Metrics example
│   ├── config_slack_alerts.yml # Slack alerts example
│   ├── dagster_repository.py
│   └── quickstart.py
├── docker/ # Docker environment
│   ├── docker-compose.yml
│   ├── Dockerfile
│   ├── init_postgres.sql
│   ├── dagster.yaml
│   └── workspace.yaml
├── setup.py
├── requirements.txt
└── README.md
Running Examples
Quick Start Example
python examples/quickstart.py
CLI Examples
# View profiling plan (dry-run)
baselinr plan --config examples/config.yml
# View plan in JSON format
baselinr plan --config examples/config.yml --output json
# View plan with verbose details
baselinr plan --config examples/config.yml --verbose
# Profile all tables in config
baselinr profile --config examples/config.yml
# Profile with output to JSON
baselinr profile --config examples/config.yml --output results.json
# Dry run (don't write to storage)
baselinr profile --config examples/config.yml --dry-run
# Detect drift
baselinr drift --config examples/config.yml --dataset customers
# Detect drift with specific runs
baselinr drift --config examples/config.yml \
--dataset customers \
--baseline <run-id-1> \
--current <run-id-2>
# Fail on critical drift (useful for CI/CD)
baselinr drift --config examples/config.yml \
--dataset customers \
--fail-on-drift
# Use statistical tests for advanced drift detection
# (configure in config.yml: strategy: statistical)
# Query profiling runs
baselinr query runs --config examples/config.yml --limit 10
# Query drift events for a table
baselinr query drift --config examples/config.yml \
--table customers \
--severity high \
--days 7
# Get detailed run information
baselinr query run --config examples/config.yml \
--run-id <run-id> \
--format json
# View table profiling history
baselinr query table --config examples/config.yml \
--table customers \
--days 30 \
--format csv \
--output history.csv
# Check system status
baselinr status --config examples/config.yml
# Watch status (auto-refresh)
baselinr status --config examples/config.yml --watch
# Status with JSON output
baselinr status --config examples/config.yml --json
# Start dashboard UI
baselinr ui --config examples/config.yml
# Check schema migration status
baselinr migrate status --config examples/config.yml
# Apply schema migrations
baselinr migrate apply --config examples/config.yml --target 1
# Validate schema integrity
baselinr migrate validate --config examples/config.yml
Drift Detection
Baselinr provides multiple drift detection strategies and intelligent baseline selection:
Available Strategies
- Absolute Threshold (default): Simple percentage-based thresholds
  - Low: 5% change
  - Medium: 15% change
  - High: 30% change
- Standard Deviation: Statistical significance based on standard deviations
- Statistical Tests (advanced): Multiple statistical tests for rigorous detection
  - Numeric columns: KS test, PSI, Z-score
  - Categorical columns: Chi-square, Entropy, Top-K stability
  - Automatically selects appropriate tests based on column type
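The absolute threshold strategy can be pictured as a percentage-change classifier. The sketch below uses the 5%, 15%, and 30% defaults listed above; the function names, the inclusive comparisons, and the handling of a zero baseline are assumptions made for illustration and may differ from Baselinr's behavior.

```python
def percent_change(baseline, current):
    """Absolute percentage change of `current` relative to `baseline`."""
    if baseline == 0:
        # Assumption: any change from a zero baseline counts as unbounded drift
        return 0.0 if current == 0 else float("inf")
    return abs(current - baseline) / abs(baseline) * 100.0


def classify_drift(baseline, current, low=5.0, medium=15.0, high=30.0):
    """Map a metric change to a severity label using percentage thresholds."""
    change = percent_change(baseline, current)
    if change >= high:
        return "high"
    if change >= medium:
        return "medium"
    if change >= low:
        return "low"
    return "none"


# Row count of 1000 in the baseline run compared with four later values
severities = [classify_drift(1000, current) for current in (1040, 1100, 1200, 1500)]
print(severities)
```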
Intelligent Baseline Selection
Baselinr automatically selects the optimal baseline for drift detection based on column characteristics:
- Auto Selection: Automatically chooses the best baseline method per column
  - High variance columns → Moving average (smooths noise)
  - Seasonal columns → Prior period (accounts for weekly/monthly patterns)
  - Stable columns → Last run (simplest baseline)
- Moving Average: Average of last N runs (configurable, default: 7)
- Prior Period: Same period last week/month (handles seasonality)
- Stable Window: Historical window with low drift (most reliable)
- Last Run: Simple comparison to previous run (default)
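One way to picture auto selection is a small decision rule over a metric's history. The sketch below is a simplified, hypothetical heuristic: it uses the coefficient of variation to separate noisy from stable columns and takes seasonality as a caller-supplied flag, since detecting it is beyond a short example. The function names and the cv_threshold value are assumptions, not Baselinr's actual criteria.

```python
import statistics


def moving_average_baseline(history, window=7):
    """Average of the last `window` runs."""
    return statistics.fmean(history[-window:])


def choose_baseline_method(history, seasonal=False, cv_threshold=0.1, min_runs=3):
    """Pick a baseline method name from simple characteristics of the history."""
    if len(history) < min_runs:
        return "last_run"  # too little history for anything else
    if seasonal:
        return "prior_period"
    mean = statistics.fmean(history)
    # Coefficient of variation: relative spread of the metric across runs
    cv = statistics.pstdev(history) / abs(mean) if mean else float("inf")
    return "moving_average" if cv > cv_threshold else "last_run"


stable = [100, 101, 99, 100, 100]
noisy = [100, 140, 70, 130, 60]
stable_method = choose_baseline_method(stable)
noisy_method = choose_baseline_method(noisy)
weekly_method = choose_baseline_method(noisy, seasonal=True)
baseline_value = moving_average_baseline(list(range(1, 11)), window=7)
print(stable_method, noisy_method, weekly_method, baseline_value)
```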
Thresholds and baseline selection are fully configurable via the drift_detection configuration. See docs/guides/DRIFT_DETECTION.md for general drift detection and docs/guides/STATISTICAL_DRIFT_DETECTION.md for statistical tests.
Event & Alert Hooks
Baselinr includes a pluggable event system that emits events for drift detection, schema changes, and profiling lifecycle events. You can register hooks to process these events for logging, persistence, or alerting.
Built-in Hooks
- LoggingAlertHook: Log events to stdout
- SQLEventHook: Persist events to any SQL database
- SnowflakeEventHook: Persist events to Snowflake with VARIANT support
Example Configuration
hooks:
  enabled: true
  hooks:
    # Log all events
    - type: logging
      log_level: INFO
    # Persist to database
    - type: sql
      table_name: baselinr_events
      connection:
        type: postgres
        host: localhost
        database: monitoring
        username: user
        password: pass
Event Types
- DataDriftDetected: Emitted when drift is detected
- SchemaChangeDetected: Emitted when schema changes
- ProfilingStarted: Emitted when profiling begins
- ProfilingCompleted: Emitted when profiling completes
- ProfilingFailed: Emitted when profiling fails
Custom Hooks
Create custom hooks by implementing the AlertHook protocol:
from baselinr.events import BaseEvent

class MyCustomHook:
    def handle_event(self, event: BaseEvent) -> None:
        # Process the event
        print(f"Event: {event.event_type}")
Configure custom hooks:
hooks:
  enabled: true
  hooks:
    - type: custom
      module: my_hooks
      class_name: MyCustomHook
      params:
        webhook_url: https://api.example.com/alerts
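A hook that uses the webhook_url parameter might look like the sketch below. It assumes that the configured params are passed to the hook's constructor as keyword arguments, which this README does not confirm. The WebhookHook class, the post_json helper, and the FakeEvent stand-in are all hypothetical; only the handle_event method and the event_type attribute come from the example above.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Stand-in for a Baselinr event so the sketch runs without the package."""
    event_type: str


def post_json(url, payload):
    """POST `payload` as JSON to `url` and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


class WebhookHook:
    def __init__(self, webhook_url, sender=post_json):
        # Assumption: `webhook_url` arrives from the `params` block in config
        self.webhook_url = webhook_url
        self._sender = sender  # injectable so the hook can be tested offline

    def handle_event(self, event):
        self._sender(self.webhook_url, {"event_type": event.event_type})


# Offline demonstration: record the call instead of sending an HTTP request
sent = []
hook = WebhookHook(
    "https://api.example.com/alerts",
    sender=lambda url, payload: sent.append((url, payload)),
)
hook.handle_event(FakeEvent(event_type="DataDriftDetected"))
print(sent)
```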
See docs/architecture/EVENTS_AND_HOOKS.md for comprehensive documentation and examples.
Schema Versioning & Migrations
Baselinr includes a built-in schema versioning system to manage database schema evolution safely.
Migration Commands
# Check current schema version status
baselinr migrate status --config config.yml
# Apply migrations to a specific version
baselinr migrate apply --config config.yml --target 1
# Preview migrations (dry run)
baselinr migrate apply --config config.yml --target 1 --dry-run
# Validate schema integrity
baselinr migrate validate --config config.yml
How It Works
- Schema versions are tracked in the baselinr_schema_version table
- Migrations are applied incrementally and can be rolled back
- The system automatically detects when your database schema is out of date
- Migrations are idempotent and safe to run multiple times
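The idea of incremental, idempotent migrations tracked in a version table can be sketched with SQLite from the standard library. Only the baselinr_schema_version table name comes from this README; the table layout, the MIGRATIONS contents, and the migrate function are hypothetical, and rollback is not shown.

```python
import sqlite3

# Hypothetical migrations keyed by the schema version they produce
MIGRATIONS = {
    1: "CREATE TABLE IF NOT EXISTS baselinr_runs (run_id TEXT PRIMARY KEY)",
    2: "ALTER TABLE baselinr_runs ADD COLUMN status TEXT",
}


def current_version(conn):
    """Return the highest applied version, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS baselinr_schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM baselinr_schema_version").fetchone()
    return row[0] or 0


def migrate(conn, target):
    """Apply every migration above the current version, up to `target`."""
    applied = []
    for version in range(current_version(conn) + 1, target + 1):
        with conn:  # commits on success, rolls back if a statement raises
            conn.execute(MIGRATIONS[version])
            conn.execute(
                "INSERT INTO baselinr_schema_version (version) VALUES (?)", (version,)
            )
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn, target=2)
second_run = migrate(conn, target=2)  # nothing left to apply
print(first_run, second_run)
```

Because each run starts from the recorded version, repeating the command applies nothing new, which is what makes it safe to run more than once.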
Metadata Querying
Baselinr provides powerful querying capabilities to explore your profiling history and drift events.
Query Commands
# Query profiling runs with filters
baselinr query runs --config config.yml \
--table customers \
--status completed \
--days 30 \
--limit 20 \
--format table
# Query drift events
baselinr query drift --config config.yml \
--table customers \
--severity high \
--days 7 \
--format json
# Get detailed information about a specific run
baselinr query run --config config.yml \
--run-id abc123-def456 \
--format json
# View table profiling history over time
baselinr query table --config config.yml \
--table customers \
--schema public \
--days 90 \
--format csv \
--output history.csv
Output Formats
All query commands support multiple output formats:
- table: Human-readable table format (default)
- json: JSON format for programmatic use
- csv: CSV format for spreadsheet analysis
Configuration Options
Source Configuration
source:
  type: postgres | snowflake | sqlite | mysql | bigquery | redshift
  host: hostname
  port: 5432
  database: database_name
  username: user
  password: password
  schema: schema_name # Optional
  # Snowflake-specific
  account: snowflake_account
  warehouse: warehouse_name
  role: role_name
  # SQLite-specific
  filepath: /path/to/database.db
  # BigQuery-specific (credentials via extra_params)
  extra_params:
    credentials_path: /path/to/service-account-key.json
    # Or use GOOGLE_APPLICATION_CREDENTIALS environment variable
  # MySQL-specific
  # Uses standard host/port/database/username/password
  # Redshift-specific
  # Uses standard host/port/database/username/password
  # Default port: 5439
Profiling Configuration
profiling:
  # Table discovery and pattern-based selection
  table_discovery: true # Enable automatic table discovery
  discovery_options:
    max_tables_per_pattern: 1000 # Limit matches per pattern
    max_schemas_per_database: 100 # Limit schemas to scan
    validate_regex: true # Validate regex patterns at config load time
    tag_provider: auto # Tag metadata provider: auto, snowflake, bigquery, postgres, mysql, redshift, sqlite, dbt
  tables:
    # Explicit table selection (highest priority)
    - table: table_name
      schema: schema_name # Optional
    # Pattern-based selection (wildcard)
    - pattern: "user_*"
      schema: public
      # Matches all tables starting with "user_"
    # Regex pattern matching
    - pattern: "^(customer|order)_\\d{4}$"
      pattern_type: regex
      schema: public
      # Matches: customer_2024, order_2024, etc.
    # Schema-based selection (all tables in schema)
    - select_schema: true
      schema: analytics
      exclude_patterns:
        - "*_temp"
        - "*_backup"
    # Database-level selection (all schemas)
    - select_all_schemas: true
      exclude_schemas:
        - "information_schema"
        - "pg_catalog"
    # Multi-database profiling (optional database field)
    # When database is specified, the pattern operates on that database
    # When omitted, uses config.source.database (backward compatible)
    # - table: customers
    #   schema: public
    #   database: analytics_db
    # - select_all_schemas: true
    #   database: staging_db # Profile all schemas in staging_db
    # Tag-based selection
    - tags:
        - "data_quality:critical"
        - "domain:customer"
      schema: public
    # Precedence override (explicit table overrides pattern)
    - pattern: "events_*"
      schema: analytics
      override_priority: 10
    - table: events_critical
      schema: analytics
      override_priority: 100 # Higher priority overrides pattern
  default_sample_ratio: 1.0
  max_distinct_values: 1000
  compute_histograms: true # Enable for statistical tests
  histogram_bins: 10
  metrics:
    - count
    - null_count
    - null_ratio
    - distinct_count
    - unique_ratio
    - approx_distinct_count
    - min
    - max
    - mean
    - stddev
    - histogram
    - data_type_inferred
Drift Detection Configuration
drift_detection:
  # Strategy: absolute_threshold | standard_deviation | statistical
  strategy: absolute_threshold
  # Absolute threshold (default)
  absolute_threshold:
    low_threshold: 5.0
    medium_threshold: 15.0
    high_threshold: 30.0
  # Baseline auto-selection configuration
  baselines:
    strategy: auto # auto | last_run | moving_average | prior_period | stable_window
    windows:
      moving_average: 7 # Number of runs for moving average
      prior_period: 7 # Days for prior period (1=day, 7=week, 30=month)
      min_runs: 3 # Minimum runs required for auto-selection
  # Statistical tests (advanced)
  # statistical:
  #   tests:
  #     - ks_test
  #     - psi
  #     - z_score
  #     - chi_square
  #     - entropy
  #     - top_k
  #   sensitivity: medium
  #   test_params:
  #     ks_test:
  #       alpha: 0.05
  #     psi:
  #       buckets: 10
  #       threshold: 0.2
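For readers unfamiliar with the psi test and its buckets and threshold parameters, here is a minimal sketch of the standard Population Stability Index formula. It is not Baselinr's implementation: the psi function is hypothetical, it bins with equal-width buckets over the baseline range, and it substitutes a small epsilon for empty buckets, which are assumptions of this example.

```python
import math


def psi(expected, actual, buckets=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / buckets
    if width == 0:
        width = 1.0  # constant baseline: everything falls in the first bucket

    def proportions(values):
        counts = [0] * buckets
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), buckets - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Replace empty buckets with eps so the logarithm stays defined
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


baseline = list(range(100))
shifted = [v + 50 for v in baseline]
same_score = psi(baseline, baseline)
shifted_score = psi(baseline, shifted)
print(f"identical: {same_score:.3f}, shifted: {shifted_score:.3f}")
```

With the 0.2 threshold from the configuration, the identical sample would pass and the shifted sample would be flagged.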
Expectation Learning Configuration
storage:
  # Enable automatic learning of expected metric ranges
  enable_expectation_learning: true
  # Historical window in days for learning expectations
  learning_window_days: 30
  # Minimum number of historical runs required for learning
  min_samples: 5
  # EWMA smoothing parameter for control limits (0 < lambda <= 1)
  # Lower values = more smoothing (0.1-0.3 recommended)
  ewma_lambda: 0.2
Anomaly Detection Configuration
storage:
  # Enable automatic anomaly detection using learned expectations
  enable_anomaly_detection: true
  # List of enabled detection methods (default: all methods)
  anomaly_enabled_methods:
    - control_limits
    - iqr
    - mad
    - ewma
    - seasonality
    - regime_shift
  # IQR multiplier threshold for outlier detection
  anomaly_iqr_threshold: 1.5
  # MAD threshold (modified z-score) for outlier detection
  anomaly_mad_threshold: 3.0
  # EWMA deviation threshold (number of stddevs)
  anomaly_ewma_deviation_threshold: 2.0
  # Enable trend and seasonality detection
  anomaly_seasonality_enabled: true
  # Enable regime shift detection
  anomaly_regime_shift_enabled: true
  # Number of recent runs for regime shift comparison
  anomaly_regime_shift_window: 3
  # P-value threshold for regime shift detection
  anomaly_regime_shift_sensitivity: 0.05
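To show what the IQR and MAD thresholds control, the sketch below applies the textbook versions of both tests to a metric's history, using the 1.5 and 3.0 defaults from the configuration. The function names and sample values are hypothetical, and Baselinr's own detectors may compute quartiles or scale the MAD differently.

```python
import statistics


def iqr_outlier(history, value, threshold=1.5):
    """True if `value` lies more than threshold * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(history, n=4)
    iqr = q3 - q1
    return value < q1 - threshold * iqr or value > q3 + threshold * iqr


def mad_outlier(history, value, threshold=3.0):
    """True if the modified z-score of `value` exceeds `threshold`."""
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        return value != median  # no spread: any deviation is unusual
    # 0.6745 scales the MAD to be comparable with a standard deviation
    modified_z = 0.6745 * (value - median) / mad
    return abs(modified_z) > threshold


# Hypothetical metric history across eight runs
history = [100, 102, 98, 101, 99, 100, 103, 97]
flags = {
    "iqr_150": iqr_outlier(history, 150),
    "iqr_101": iqr_outlier(history, 101),
    "mad_150": mad_outlier(history, 150),
    "mad_101": mad_outlier(history, 101),
}
print(flags)
```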
Environment Variables
Baselinr supports environment variable overrides:
# Override source connection
export BASELINR_SOURCE__HOST=prod-db.example.com
export BASELINR_SOURCE__PASSWORD=secret
# Override environment
export BASELINR_ENVIRONMENT=production
# Run profiling
baselinr profile --config config.yml
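The variable names above suggest a convention in which the BASELINR_ prefix is dropped and a double underscore separates nesting levels. The sketch below shows one way such a mapping could work. The rule is inferred from the names only, the apply_env_overrides function is hypothetical, and values are left as strings without type coercion.

```python
import copy


def apply_env_overrides(config, environ, prefix="BASELINR_"):
    """Return a copy of `config` with prefixed environment values merged in."""
    merged = copy.deepcopy(config)
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        # BASELINR_SOURCE__HOST -> ["source", "host"]
        path = key[len(prefix):].lower().split("__")
        node = merged
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return merged


config = {"environment": "development", "source": {"host": "localhost", "port": 5432}}
environ = {
    "BASELINR_SOURCE__HOST": "prod-db.example.com",
    "BASELINR_ENVIRONMENT": "production",
    "PATH": "/usr/bin",  # unrelated variables are ignored
}
effective = apply_env_overrides(config, environ)
print(effective)
```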
Development
Run Tests
pytest
Code Formatting
black baselinr/
isort baselinr/
Type Checking
mypy baselinr/
License
Apache License 2.0 with Commercial Distribution Restriction - see LICENSE file for details.
This software is available under a custom license based on Apache License 2.0. You may use this software freely, including for commercial and internal business purposes. However, you may not sell, lease, rent, or otherwise monetize this software or derivative works without explicit written permission from the copyright holders. For commercial distribution licensing inquiries, please contact hello@baselinr.io.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Contact
For questions and support, please open an issue on GitHub.
Baselinr - Modern data profiling made simple