DataPrism
A Python library for exploratory data analysis with data profiling, quality assessment, and stability monitoring.
Interactive Viewer
DataPrism includes a built-in interactive dashboard to explore your analysis results in the browser.
from dataprism import DataPrism, DataLoader
# Load data from CSV or Parquet
df = DataLoader.load_csv("data.csv")
# df = DataLoader.load_parquet("data.parquet")
# Run analysis and launch viewer
prism = DataPrism()
prism.analyze(
    data=df,
    target_variable="target",
    exclude_columns=["id", "split", "onboarding_date"],
    output_path="eda_results.json",
)
prism.view()
Summary — Dataset overview, insights, top features by IV, data quality score, and provider match rates.
Catalog — Sortable feature table with type, provider, target correlation, IV, and PSI at a glance.
Deep Dive — Per-feature detail view with statistics, violin plots, distribution charts, PSI trend analysis, target associations, and correlations.
Associations — Mixed-method heatmap (Pearson, Theil's U, Eta) showing relationships across all features.
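The Associations tab names three measures. Pearson covers numeric pairs, Theil's U covers categorical pairs, and Eta (the correlation ratio) covers categorical-numeric pairs. The sketch below shows the textbook definitions of the latter two using pandas and NumPy. It is an illustration only: the function names `theils_u` and `correlation_ratio` are hypothetical and are not taken from DataPrism, whose internal implementation may differ.

```python
import numpy as np
import pandas as pd


def entropy(series: pd.Series) -> float:
    """Shannon entropy (natural log) of a categorical series."""
    p = series.value_counts(normalize=True).to_numpy()
    p = p[p > 0]  # skip empty categories so log(0) never occurs
    return float(-(p * np.log(p)).sum())


def theils_u(x: pd.Series, y: pd.Series) -> float:
    """Uncertainty coefficient U(x|y): share of x's entropy explained by y.

    Asymmetric and bounded in [0, 1]; 1 means y fully determines x.
    """
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # x is constant, so nothing is left to explain
    # Conditional entropy H(x|y) = sum over y-groups of P(y) * H(x | y = value)
    h_x_given_y = sum(
        (len(group) / len(x)) * entropy(group) for _, group in x.groupby(y)
    )
    return float((h_x - h_x_given_y) / h_x)


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Eta: sqrt(between-group sum of squares / total sum of squares)."""
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    if ss_total == 0:
        return 0.0  # a constant numeric column carries no association
    grouped = values.groupby(categories)
    ss_between = (grouped.size() * (grouped.mean() - grand_mean) ** 2).sum()
    return float(np.sqrt(ss_between / ss_total))


# Hypothetical toy data: "region" determines "plan", and "plan" determines "spend".
toy = pd.DataFrame(
    {
        "plan": ["a", "a", "b", "b"],
        "region": ["x", "x", "y", "y"],
        "spend": [1.0, 1.0, 3.0, 3.0],
    }
)
u_plan_given_region = theils_u(toy["plan"], toy["region"])
eta_spend_by_plan = correlation_ratio(toy["plan"], toy["spend"])
```

Because Theil's U is asymmetric, a heatmap built from it is generally not symmetric across the diagonal, unlike one built from Pearson correlations.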
How DataPrism Compares
| Capability | DataPrism | ydata-profiling | Sweetviz | D-Tale | AutoViz | DataPrep |
|---|---|---|---|---|---|---|
| Predictive power (IV / WoE) | ✅ | ➖ | 🟡 | 🟡 | ➖ | ➖ |
| Drift detection (PSI) | ✅ | 🟡 | 🟡 | ➖ | ➖ | 🟡 |
| Data quality score | ✅ | ➖ | ➖ | ➖ | ➖ | ➖ |
| Multi-source match rates | ✅ | ➖ | ➖ | ➖ | ➖ | ➖ |
| Schema-aware profiling | ✅ | 🟡 | 🟡 | 🟡 | ➖ | 🟡 |
| Structured JSON output | ✅ | ✅ | ➖ | 🟡 | ➖ | 🟡 |
| Interactive explorer | ✅ | ✅ | 🟡 | ✅ | 🟡 | ✅ |
Legend: ✅ Supported, 🟡 Partial, ➖ Not supported
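For readers unfamiliar with the first two rows of the table, here is a minimal sketch of the standard formulas behind Information Value (IV) and the Population Stability Index (PSI). Both reduce to the same sum over bins, sum((p - q) * ln(p / q)), applied to different pairs of distributions. The function names, the decile binning, and the `eps` floor are assumptions made for illustration; this is not DataPrism's code and its binning strategy may differ.

```python
import numpy as np


def _divergence(p, q, eps: float = 1e-6) -> float:
    """Sum over bins of (p - q) * ln(p / q), with shares floored at eps."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    return float(np.sum((p - q) * np.log(p / q)))


def psi(expected, actual, bins: int = 10) -> float:
    """PSI of one numeric feature between a baseline and a newer sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior cut points come from baseline quantiles; values beyond the
    # baseline range fall into the first or last bin.
    cuts = np.unique(np.quantile(expected, np.linspace(0, 1, bins + 1)[1:-1]))
    n_bins = len(cuts) + 1
    e_counts = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n_bins)
    a_counts = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n_bins)
    return _divergence(a_counts / len(actual), e_counts / len(expected))


def information_value(feature_bins, target) -> float:
    """IV of an already-binned feature against a binary 0/1 target."""
    feature_bins = np.asarray(feature_bins)
    target = np.asarray(target)
    labels = np.unique(feature_bins)
    events = np.array([(target[feature_bins == b] == 1).sum() for b in labels])
    non_events = np.array([(target[feature_bins == b] == 0).sum() for b in labels])
    # Per-bin WoE is ln(non-event share / event share); IV weights it by the gap.
    return _divergence(non_events / non_events.sum(), events / events.sum())


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
baseline = rng.normal(size=5000)
psi_same = psi(baseline, baseline)
psi_shifted = psi(baseline, baseline + 1.0)
iv_toy = information_value(
    ["lo"] * 4 + ["hi"] * 4,
    [0, 0, 0, 1, 1, 1, 1, 0],
)
```

A PSI of zero means the two samples fill the bins in identical proportions; larger values indicate a larger shift. Any cut-off for "significant" drift is a convention rather than part of the formula.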
Installation
pip install dataprism
Quick Start
from dataprism import DataPrism, DataLoader
df = DataLoader.load_csv("data.csv")
prism = DataPrism()
results = prism.analyze(
    data=df,
    exclude_columns=["customer_id", "created_at"],
    target_variable="target",
    output_path="eda_results.json",
)
For schema-aware profiling, stability analysis, and advanced configuration, see the Usage Guide.
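Since the analysis is written to the file given as `output_path`, it can be inspected with the standard library alone. The layout of that JSON file is not described in this README, so the sketch below assumes nothing about its keys and only reports what it finds at the top level. The helper name `top_level_layout` is hypothetical.

```python
import json
from pathlib import Path


def top_level_layout(path: str) -> dict:
    """Map each top-level key of a JSON report to the type name of its value."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(report, dict):
        # The root is not an object; report its type instead of its keys.
        return {"<root>": type(report).__name__}
    return {key: type(value).__name__ for key, value in report.items()}


if __name__ == "__main__":
    results_file = Path("eda_results.json")  # same path as output_path above
    if results_file.exists():
        for key, kind in top_level_layout(str(results_file)).items():
            print(f"{key}: {kind}")
```

Listing the top-level keys first is a cheap way to find out which sections a downstream script or LLM agent can rely on before reading the Usage Guide.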
Roadmap
DataPrism is being built for the AI era, in which data analysis is increasingly driven by LLM agents, automated pipelines, and programmatic consumers rather than by humans clicking through dashboards.
AI-Native Analysis
- Natural language insights — Auto-generated plain-English summaries of each feature, anomalies, and recommendations that LLMs can directly incorporate into reports.
Closing the Gaps
- Dataset comparison — Side-by-side train/test/production profiling with automatic drift highlights.
- Scatter & pair plots — Interactive scatter matrices for continuous feature pairs with target coloring.
- Auto-visualization — One-line generation of per-feature visual summaries exportable as images.
- Spark/Dask support — Distributed computation for datasets that don't fit in memory.
- Streaming analysis — Incremental profiling for real-time data pipelines without re-analyzing the full dataset.
Deeper Intelligence
- Automated feature recommendations — Go beyond flagging issues to suggesting transformations (log, binning, encoding) based on distribution shape and target relationship.
- Anomaly explanations — When outliers or drift are detected, surface the likely cause (data pipeline issues, population shift, seasonality).
- Cross-dataset lineage — Track how feature distributions evolve across model versions and data refreshes.
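To make the "automated feature recommendations" item concrete, here is one way a distribution-shape rule could look. This is a rough sketch of the idea, not a planned or existing DataPrism API. The function name `suggest_transform`, the returned labels, and the skewness threshold of 1.0 are all assumptions chosen for illustration. It uses `scipy.stats.skew`, which is available through the SciPy dependency listed under Requirements.

```python
import numpy as np
from scipy.stats import skew


def suggest_transform(values, skew_threshold: float = 1.0) -> str:
    """Suggest a transformation from the sample skewness of a numeric feature."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing values
    if x.size < 3 or x.max() == x.min():
        return "none"  # too little data, or a constant column
    s = float(skew(x))
    if s > skew_threshold and x.min() > 0:
        return "log"  # strictly positive with a long right tail
    if s > skew_threshold and x.min() >= 0:
        return "log1p"  # right tail with zeros present, so use log(1 + x)
    if abs(s) > skew_threshold:
        return "binning"  # heavily skewed but not suited to a log transform
    return "none"


# Synthetic samples for illustration only.
rng = np.random.default_rng(42)
right_tailed = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
symmetric = rng.normal(size=2000)
suggestion_right_tailed = suggest_transform(right_tailed)
suggestion_symmetric = suggest_transform(symmetric)
```

A fuller version would also take the target relationship into account, as the roadmap item describes; this sketch looks at distribution shape only.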
Documentation
- Usage Guide — schema, stability analysis, advanced configuration, provider match rates
- Architecture — internals, module structure, data flow
- Decision Records — key design decisions and rationale
- Examples — usage examples and demos
Development
pip install -e . # Install for development
python -m build # Build package
python -m pytest tests/ # Run tests
Requirements
- Python 3.9+
- pandas >= 2.0.0
- numpy >= 1.24.0
- scipy >= 1.10.0
- pyarrow >= 10.0.0 (for Parquet support)
License
MIT License - see LICENSE file for details.
Contact
For questions or suggestions:
- Email: dev@lattiq.com
- GitHub: https://github.com/lattiq/dataprism
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.