A Streamlit-first Python package for detecting and visualizing data quality issues
LavenderTown
A Streamlit-first Python package for detecting and visualizing "data ghosts": type inconsistencies, nulls, invalid values, schema drift, and anomalies in tabular datasets.
LavenderTown helps you quickly identify data quality issues in your datasets through an intuitive, interactive Streamlit interface. Perfect for data scientists, analysts, and engineers who need to understand their data quality before diving into analysis.
Key Features
- Zero-config data quality insights - Get started with minimal setup
- Streamlit-native UI - Fully integrated interactive dashboard
- Pandas & Polars support - Works with your existing data pipelines
- Interactive detection - Drill down into problematic rows
- Exportable findings - JSON, CSV, and Parquet formats
- Drift detection - Compare datasets for schema and distribution changes
- Custom rules - Create and manage data quality rules via UI
- ML-powered detection - 40+ anomaly detection algorithms
- Time-series analysis - Advanced time-series feature extraction
- High performance - Optimized for datasets up to millions of rows
New in v0.7.0: modular UI components, interactive Plotly visualizations, tsfresh time-series features, a Streamlit Extras UI, and a SQLAlchemy database backend.
Installation
```bash
pip install lavendertown
```
For optional features (Polars, ML, time-series, Plotly, etc.), see the Installation Guide.
Quick Start
```python
import streamlit as st
from lavendertown import Inspector
import pandas as pd

# Load your data
df = pd.read_csv("your_data.csv")

# Create inspector and render
inspector = Inspector(df)
inspector.render()  # Must be called within a Streamlit app context
```
Save this as `app.py` and run `streamlit run app.py` to see the interactive dashboard.
Full Quick Start Guide →
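If you want to see the dashboard flag something before pointing it at real data, you can generate a small CSV that plants a few deliberate ghosts. This is a hypothetical, standard-library-only helper: the file name `ghosts_demo.csv`, the column names, and `write_demo_csv` are illustrative and not part of LavenderTown.

```python
import csv
from pathlib import Path

# Hypothetical demo rows; several rows plant a different kind of "ghost".
ROWS = [
    {"id": "1", "age": "34", "status": "active", "email": "a@example.com"},
    {"id": "2", "age": "", "status": "active", "email": "b@example.com"},  # null age
    {"id": "3", "age": "thirty", "status": "inactive", "email": "c@example.com"},  # mixed types
    {"id": "4", "age": "41", "status": "unknown", "email": "d@example.com"},  # enum violation
    {"id": "5", "age": "999", "status": "active", "email": "not-an-email"},  # outlier + regex violation
]


def write_demo_csv(path: Path) -> Path:
    """Write the demo rows (with a header line) to `path` and return the path."""
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(ROWS[0]))
        writer.writeheader()
        writer.writerows(ROWS)
    return path


if __name__ == "__main__":
    print(write_demo_csv(Path("ghosts_demo.csv")))
```

Point the `pd.read_csv` call in `app.py` at `ghosts_demo.csv`. Which of the planted issues LavenderTown actually reports depends on its detectors and defaults.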
Documentation
- Getting Started - Installation and setup
- User Guide - Comprehensive usage documentation
- API Reference - Complete API documentation
- Examples - Code examples and tutorials
- Version Mapping - Feature version history
Ghost Categories
LavenderTown detects four main categories of data quality issues:
- Structural Ghosts - Mixed dtypes, schema drift, unexpected nullability
- Value Ghosts - Out-of-range values, regex violations, enum violations
- Completeness Ghosts - Null density thresholds, conditional nulls
- Statistical Ghosts - Outliers (IQR method), distribution shifts
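Each category corresponds to a fairly standard check. The sketch below illustrates one check per category on a plain list of column values, using only the standard library. It is not LavenderTown's implementation: the function names, the thresholds, and the 1.5 multiplier for the IQR fences (the conventional Tukey value; the README only says "IQR method") are assumptions chosen for illustration.

```python
from __future__ import annotations

import re
import statistics
from typing import Any, Collection, Optional, Sequence


def mixed_types(values: Sequence[Any]) -> set[str]:
    """Structural ghost: names of the distinct non-null Python types in a column."""
    return {type(v).__name__ for v in values if v is not None}


def null_density(values: Sequence[Any]) -> float:
    """Completeness ghost: fraction of entries that are None."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> list[float]:
    """Statistical ghost: values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    if len(values) < 2:
        return []  # quartiles need at least two data points
    # "inclusive" matches the linear interpolation NumPy/pandas use by default
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def value_violations(
    values: Sequence[Any],
    *,
    lo: Optional[float] = None,
    hi: Optional[float] = None,
    allowed: Optional[Collection[Any]] = None,
    pattern: Optional[str] = None,
) -> list[Any]:
    """Value ghost: non-null entries that break a range, enum, or regex rule.

    Range checks assume the non-null values are comparable with lo/hi.
    """
    regex = re.compile(pattern) if pattern is not None else None
    bad = []
    for v in values:
        if v is None:
            continue  # nulls are counted as completeness ghosts instead
        out_of_range = (lo is not None and v < lo) or (hi is not None and v > hi)
        not_allowed = allowed is not None and v not in allowed
        no_match = regex is not None and regex.fullmatch(str(v)) is None
        if out_of_range or not_allowed or no_match:
            bad.append(v)
    return bad


if __name__ == "__main__":
    ages = [34, 29, None, 41, 38, 999]
    print(mixed_types(["34", 41, None]))  # a str and an int in one column
    print(null_density(ages))
    print(iqr_outliers([a for a in ages if a is not None]))
    print(value_violations(["active", "unknown"], allowed={"active", "inactive"}))
```

In practice these checks would run per column of a DataFrame; LavenderTown's own detectors, thresholds, and finding format may differ.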
Learn more about ghost detection →
Usage Examples
Programmatic Usage
```python
from lavendertown import Inspector
import pandas as pd

df = pd.read_csv("data.csv")
inspector = Inspector(df)

# Collect findings programmatically and print one line per finding
findings = inspector.detect()
for finding in findings:
    print(f"{finding.column}: {finding.description}")
```
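Because the result of `detect()` is iterated like any other sequence in the loop above, findings can be post-processed with ordinary Python. The sketch below groups descriptions by column. It relies only on the `column` and `description` attributes used in that loop; `StubFinding`, `group_by_column`, and the demo descriptions are hypothetical stand-ins so the example runs without LavenderTown installed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class StubFinding:
    """Hypothetical stand-in exposing the two attributes used in the loop above."""

    column: str
    description: str


def group_by_column(findings: Iterable) -> Dict[str, List[str]]:
    """Collect finding descriptions under their column name, preserving order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for finding in findings:
        grouped[finding.column].append(finding.description)
    return dict(grouped)


if __name__ == "__main__":
    demo = [
        StubFinding("age", "non-numeric value"),
        StubFinding("email", "regex violation"),
        StubFinding("age", "outlier"),
    ]
    for column, descriptions in sorted(group_by_column(demo).items()):
        print(f"{column}: {len(descriptions)} finding(s)")
```

With real data, pass `inspector.detect()` in place of `demo`.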
CLI Usage
```bash
# Analyze a CSV file
lavendertown analyze data.csv --output-format json

# Compare datasets for drift
lavendertown compare baseline.csv current.csv
```
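At its simplest, schema drift means columns that were added, removed, or changed type between a baseline and a current dataset. The standard-library sketch below compares two column-to-type mappings to illustrate the idea; it does not reproduce what `lavendertown compare` actually reports, and `schema_drift` and its output keys are hypothetical. With pandas, a mapping like this can be built from a DataFrame via `{col: str(dtype) for col, dtype in df.dtypes.items()}`.

```python
from typing import Dict, List


def schema_drift(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List]:
    """Compare two {column: dtype} mappings and report added, removed, and retyped columns."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        # (column, baseline dtype, current dtype) for columns present in both
        "type_changed": sorted(
            (col, baseline[col], current[col])
            for col in shared
            if baseline[col] != current[col]
        ),
    }


if __name__ == "__main__":
    baseline = {"id": "int64", "age": "int64", "status": "object"}
    current = {"id": "int64", "age": "object", "signup_date": "object"}
    print(schema_drift(baseline, current))
```

Distribution drift (the other half of the feature) would additionally compare value statistics for shared columns, which this sketch leaves out.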
More examples →
Development
```bash
# Clone and install
git clone https://github.com/eddiethedean/lavendertown.git
cd lavendertown
pip install -e ".[dev]"

# Run tests
pytest tests/

# Code quality
ruff format . && ruff check . && mypy lavendertown
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Links
- Documentation: https://lavendertown.readthedocs.io/en/latest/
- PyPI Package: https://pypi.org/project/lavendertown/
- GitHub Repository: https://github.com/eddiethedean/lavendertown
- Issues: https://github.com/eddiethedean/lavendertown/issues
Made with ❤️ for the data quality community
Download files
- Source distribution: lavendertown-0.7.1.tar.gz
- Built distribution: lavendertown-0.7.1-py3-none-any.whl
File details
Details for the file lavendertown-0.7.1.tar.gz.
File metadata
- Download URL: lavendertown-0.7.1.tar.gz
- Upload date:
- Size: 203.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31fb071f6998917fa54dc9a3a970cda2bc22ed41c99814341f12a229b800200f |
| MD5 | be4215047b2f2809322e1beb496095b7 |
| BLAKE2b-256 | 64454ef08e0969f1fe09ef9ff141e1121be761cb165c8c84b079eae228b5ac7d |
File details
Details for the file lavendertown-0.7.1-py3-none-any.whl.
File metadata
- Download URL: lavendertown-0.7.1-py3-none-any.whl
- Upload date:
- Size: 115.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 72f8cca829760b87eeb60dea883c37afb9c444e340f4c991098f838172691dd0 |
| MD5 | 8d6e714494b4df9ad437138ee4cf2787 |
| BLAKE2b-256 | cf60572552e37358b5f5489da89f843ae8db14beeeb387f0f5555b85e8a79bb8 |