A Streamlit-first Python package for detecting and visualizing data quality issues

LavenderTown

A Streamlit-first Python package for detecting and visualizing "data ghosts": type inconsistencies, nulls, invalid values, schema drift, and anomalies in tabular datasets.

Python 3.10+ PyPI version Documentation License: MIT

LavenderTown helps you quickly identify data quality issues in your datasets through an interactive Streamlit interface. It is aimed at data scientists, analysts, and engineers who need to understand their data quality before starting analysis.

✨ Key Features

  • 🔍 Zero-config data quality insights - Get started with minimal setup
  • 📊 Streamlit-native UI - Fully integrated interactive dashboard
  • 🐼 Pandas & Polars support - Works with your existing data pipelines
  • 🎯 Interactive detection - Drill down into problematic rows
  • 📤 Exportable findings - JSON, CSV, and Parquet formats
  • 🔄 Drift detection - Compare datasets for schema and distribution changes
  • ⚙️ Custom rules - Create and manage data quality rules via UI
  • 🤖 ML-powered detection - 40+ anomaly detection algorithms
  • 📈 Time-series analysis - Advanced time-series feature extraction
  • 🚀 High performance - Optimized for datasets up to millions of rows

New in v0.7.0: Modular UI components, Plotly interactive visualizations, tsfresh time-series features, Streamlit Extras UI, and SQLAlchemy database backend.

👉 View all features →

📦 Installation

pip install lavendertown

For optional features (Polars, ML, time-series, Plotly, etc.), see the Installation Guide.

🚀 Quick Start

import streamlit as st
from lavendertown import Inspector
import pandas as pd

# Load your data
df = pd.read_csv("your_data.csv")

# Create inspector and render
inspector = Inspector(df)
inspector.render()  # Must be called within a Streamlit app context

Save this as app.py and run streamlit run app.py to see the interactive dashboard.

👉 Full Quick Start Guide →

📚 Documentation

👻 Ghost Categories

LavenderTown detects four main categories of data quality issues:

  1. Structural Ghosts - Mixed dtypes, schema drift, unexpected nullability
  2. Value Ghosts - Out-of-range values, regex violations, enum violations
  3. Completeness Ghosts - Null density thresholds, conditional nulls
  4. Statistical Ghosts - Outliers (IQR method), distribution shifts
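
To make two of these categories concrete, here is a minimal standard-library sketch of a null-density check (a completeness ghost) and the IQR outlier rule (a statistical ghost). It is not LavenderTown's implementation: the function names `null_density` and `iqr_outliers`, the multiplier `k=1.5`, and the sample data are all assumptions chosen for illustration.

```python
import statistics


def null_density(values):
    """Fraction of entries that are None (a simple completeness check)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Return the non-null values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


# Illustrative column: two missing entries and one implausible age.
ages = [31, 29, 35, None, 33, 30, 32, 240, None, 34]

print(null_density(ages))  # 0.2
print(iqr_outliers(ages))  # [240]
```

A real detector would typically compare the null density against a configurable threshold and report the offending row indices rather than just the values.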

👉 Learn more about ghost detection →

💡 Usage Examples

Programmatic Usage

from lavendertown import Inspector
import pandas as pd

df = pd.read_csv("data.csv")
inspector = Inspector(df)
findings = inspector.detect()

for finding in findings:
    print(f"{finding.column}: {finding.description}")
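
Once findings are available, it is often useful to summarize them per column. The sketch below assumes only what the loop above shows, namely that each finding exposes `column` and `description`. The `Finding` dataclass here is a hypothetical stand-in so the example runs without LavenderTown installed, and `group_by_column` is an illustrative helper, not part of the package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical stand-in exposing only the two attributes used above."""
    column: str
    description: str


def group_by_column(findings):
    """Map each column name to the list of descriptions reported for it."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding.column].append(finding.description)
    return dict(grouped)


# Made-up findings for illustration only.
sample = [
    Finding("age", "value outside expected range"),
    Finding("email", "null density above threshold"),
    Finding("age", "mixed dtypes"),
]

summary = group_by_column(sample)
for column, descriptions in summary.items():
    print(f"{column}: {len(descriptions)} finding(s)")
```

In a real script, the list returned by `inspector.detect()` would take the place of `sample`, provided its items carry the same two attributes.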

CLI Usage

# Analyze a CSV file
lavendertown analyze data.csv --output-format json

# Compare datasets for drift
lavendertown compare baseline.csv current.csv
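
For intuition about what a schema comparison between two datasets involves, here is a minimal standard-library sketch. It is not how `lavendertown compare` is implemented: the `schema_drift` function, the report keys, and the example schemas are assumptions made for illustration.

```python
def schema_drift(baseline, current):
    """Compare two {column: dtype-name} mappings and report the differences."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = {
        col: (baseline[col], current[col])
        for col in sorted(set(baseline) & set(current))
        if baseline[col] != current[col]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative schemas. With pandas, a mapping like these can be built
# with df.dtypes.astype(str).to_dict().
baseline_schema = {"id": "int64", "price": "float64", "region": "object"}
current_schema = {"id": "int64", "price": "object", "country": "object"}

report = schema_drift(baseline_schema, current_schema)
print(report)
```

This covers only the structural side of drift (added, removed, and retyped columns). Detecting distribution changes would additionally require comparing per-column statistics between the two datasets.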

👉 More examples →

🛠️ Development

# Clone and install
git clone https://github.com/eddiethedean/lavendertown.git
cd lavendertown
pip install -e ".[dev]"

# Run tests
pytest tests/

# Code quality
ruff format . && ruff check . && mypy lavendertown

👉 Development Guide →

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Links


Made with ❤️ for the data quality community
