The Ultimate Data Cleaning Engine for Python

These details have not been verified by PyPI

Project description

Tidely: The Operating System for Data Quality

Tidely is a production-grade Python package that acts as "The Operating System for Data Quality." Instead of introducing custom data wrappers, Tidely integrates seamlessly into existing pipelines by accepting and returning standard Pandas DataFrames, Polars DataFrames/LazyFrames, and PyArrow Tables.

Tidely is built around two primitives:

td.inspect(df): Generates a Dataset Intelligence Report covering Trust Scores, DNA signatures, and Semantics.
td.clean(df): Builds and executes an explainable, deterministic cleaning plan that handles missing data, duplicate rows, memory bloat, and semantically noisy strings (dates, emails, phone numbers).

🚀 Key Features

Zero Friction API: Call td.inspect() or td.clean() on any Polars, Pandas, or PyArrow dataframe.
Lighthouse Dataset Trust Scores: Computes multi-dimensional quality scores (Overall, Reliability, ML Readiness, Memory Efficiency, Schema Stability, and Semantic Quality).
Deep Semantic Engine: Heuristic regexes and checksum algorithms (Luhn for Credit Cards, Verhoeff for Aadhaar) to validate PAN, GSTIN, IP addresses, emails, phone numbers, and currencies.
Explainable Cleaning: Automatically converts types, normalizes PII formats, imputes missing values, and drops exact duplicates, reporting what changed, why it changed, and how much the change raised the Trust Score. By default, Tidely does not forward-fill missing values, because copying a neighboring row's value fabricates data in cross-sectional datasets; it uses constant or mode imputation instead.
Streaming Native: td.clean() is built on Polars and supports collect(streaming=True) for larger-than-memory datasets.
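
To make the checksum side of the Deep Semantic Engine concrete, here is a minimal sketch of the standard Luhn check used for credit card numbers. It is a textbook implementation in plain Python, not Tidely's own code, and the function name `luhn_is_valid` is an assumption made for illustration.

```python
def luhn_is_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    # Ignore the spaces and hyphens commonly used to group card digits.
    cleaned = candidate.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # widely used test card number
print(luhn_is_valid("79927398710"))          # last digit altered
```

A checksum like this only confirms that a value is well formed; it says nothing about whether the card exists.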

📦 Installation

To install Tidely in your project, use pip or uv:

pip install tidely

uv add tidely

(Note: On Windows systems, Tidely automatically includes the tzdata package to support timezone-aware datetime validation.)

⚡ Quick Start

1. Dataset Inspection

import tidely as td
import polars as pl

# 1. Load your standard dataframe
df = pl.read_csv("messy_sales.csv")

# 2. Inspect the dataset
profile = td.inspect(df)

# Retrieve metrics programmatically
print(f"Overall Trust Score: {profile.trust_score.overall}/100")
print(f"ML Readiness: {profile.trust_score.ml_readiness}/100")

# 3. Display the visual report in your terminal
profile.show()

2. Explainable Cleaning

import tidely as td
import polars as pl

df = pl.read_csv("messy_sales.csv")

# Generate the plan, show it in the terminal, and execute it
clean_df = td.clean(df)

# Alternatively, step through it manually:
plan = td.plan(df)
plan.show()

# Dry run to see exactly what rows will be affected before mutating
plan.execute(dry_run=True)

# Execute
clean_df = plan.execute()
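
For readers curious how a plan-then-execute workflow with a dry run can be structured, the following is a toy sketch in plain Python. It is not Tidely's implementation: `Step`, `execute`, `count_affected`, and the two sample transformations are hypothetical names, and rows are modeled as simple dictionaries rather than DataFrames.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One named, pure transformation over a list of row dicts (hypothetical)."""
    name: str
    apply: Callable[[list], list]


def strip_strings(rows: list) -> list:
    # Build new dicts so the caller's rows are never mutated.
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in rows]


def drop_exact_duplicates(rows: list) -> list:
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # assumes hashable cell values
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def count_affected(before: list, after: list) -> int:
    if len(before) != len(after):
        return abs(len(before) - len(after))
    return sum(1 for old, new in zip(before, after) if old != new)


def execute(steps: list, rows: list, dry_run: bool = False):
    """Run every step in order and return (rows, report).

    With dry_run=True the report is still computed, but the original
    rows are returned untouched.
    """
    current, report = rows, []
    for step in steps:
        result = step.apply(current)
        report.append(f"{step.name}: {count_affected(current, result)} row(s) affected")
        current = result
    return (rows if dry_run else current), report


PLAN = [Step("strip_strings", strip_strings),
        Step("drop_exact_duplicates", drop_exact_duplicates)]
rows = [{"id": 1, "city": " Pune"}, {"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]

preview, report = execute(PLAN, rows, dry_run=True)
print(report)
cleaned, _ = execute(PLAN, rows)
print(cleaned)
```

Because each step is a pure function applied in a fixed order, the same input always yields the same plan and the same result, which is what makes a dry run trustworthy.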

3. Command Line Interface (CLI)

Tidely exposes a Typer-based CLI for instant dataset diagnostics directly from your terminal:

# Print a visual diagnostic report
tidely inspect --input messy_sales.csv

🛠️ Benchmarks

Tidely is designed for speed. The benchmarking suite compares it against PyJanitor, Pandera, ydata-profiling, and Great Expectations.

100,000 Rows (19MB DataFrame)

Tool	Time (s)	Peak memory (MB)
Tidely	1.02	113.79
Pandera	1.18	14.38
PyJanitor	N/A*	N/A*
Great Expectations	N/A*	N/A*
ydata-profiling	N/A*	N/A*

*N/A means the tool could not be run in the benchmark environment. As of Pandas 2.x/3.x, pyjanitor and ydata-profiling have severe internal breaking changes that cause crashes. Great Expectations V1.0+ has completely removed its standard from_pandas API.

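Tidely performs heuristic semantic analysis and executes data transformations, yet in this benchmark it still ran slightly faster than Pandera, a pure schema-validation library (1.02 s versus 1.18 s). The trade-off is memory: Tidely's peak was 113.79 MB against Pandera's 14.38 MB.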
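
The README does not state how the time and memory figures were collected. As one possible approach, the sketch below times a callable with `time.perf_counter` and records peak Python-level allocations with `tracemalloc`. The helper name `measure` and the sample workload are assumptions for illustration.

```python
import time
import tracemalloc
from typing import Callable, TypeVar

T = TypeVar("T")


def measure(fn: Callable[[], T]):
    """Run `fn` once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload; replace the lambda with the call you want to benchmark.
result, seconds, peak_bytes = measure(lambda: sorted(range(100_000), reverse=True))
print(f"{seconds:.3f} s, peak {peak_bytes / 1e6:.2f} MB")
```

Note that `tracemalloc` only sees memory allocated through Python's allocators, so it under-reports libraries that allocate natively (as Rust-based engines such as Polars do), and tracing itself slows the code down. Process-level peak memory is a fairer basis for comparing tools.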

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on how to set up your development environment, run tests, and submit pull requests.

📚 API Reference

`tidely.inspect(df: Any) -> DatasetProfile`

Generates a comprehensive diagnostic profile.

df: The input data (Pandas DataFrame, Polars DataFrame/LazyFrame, PyArrow Table).
Returns: A DatasetProfile object. Call .show() to render it in the terminal.

`tidely.plan(df: Any) -> RepairPlan`

Generates a deterministic cleaning plan without mutating the data.

df: The input data.
Returns: A RepairPlan object. Call .show() to view the plan, and .execute() to run the transformations.

`tidely.clean(df: Any) -> pl.DataFrame`

Automatically plans and executes all recommended data cleaning transformations.

df: The input data.
Returns: A cleaned Polars DataFrame.
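
To illustrate why the default imputation strategy matters, here is a small plain-Python sketch contrasting mode and constant imputation with forward-fill on a single column. It is not Tidely's code; the function names are hypothetical and missing values are modeled as `None`.

```python
from collections import Counter


def impute_mode(values: list) -> list:
    """Fill missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn a mode from
    # On ties, Counter keeps the value that was seen first.
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_constant(values: list, fill) -> list:
    """Fill missing entries with one explicit placeholder."""
    return [fill if v is None else v for v in values]


def forward_fill(values: list) -> list:
    """Fill missing entries with the previous row's value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


cities = ["Pune", "Delhi", None, "Pune", "Pune", None]

print(impute_mode(cities))
print(impute_constant(cities, "UNKNOWN"))
print(forward_fill(cities))
```

Forward-fill assigns "Delhi" to the third row only because of where that row happens to sit. In cross-sectional data, row order carries no meaning, so the filled value is arbitrary; mode and constant imputation give the same answer however the rows are ordered.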

❓ FAQ

Q: Does Tidely overwrite my original data? No. Tidely always returns a new, sanitized DataFrame. It never mutates your data in place.

Q: Why does Tidely use Polars internally? Polars is written in Rust, uses lazy execution graphs, and is multi-threaded by design. This lets Tidely inspect and clean datasets much faster than equivalent Pandas code.

Q: Can I run this on huge datasets? Yes. You can pass a Polars LazyFrame to tidely.clean(), which uses streaming collection, collect(streaming=True), when the query plan supports out-of-core execution.

Q: How does it know a column is a GSTIN or PAN? Tidely uses a deep semantic engine combining specialized regex heuristics and checksum algorithms (like Luhn and Verhoeff) to deterministically validate PII/Financial tokens.
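
As a rough picture of the regex half of that answer, the sketch below labels a column as PAN-like when enough of its non-missing values match the PAN layout of five letters, four digits, and one letter. The threshold, the normalization step, and the names `match_ratio` and `looks_like_pan` are assumptions for illustration, not Tidely's actual rules.

```python
import re
from typing import Iterable, Optional

# PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def match_ratio(values: Iterable[Optional[str]], pattern: re.Pattern) -> float:
    """Share of non-missing values that fully match `pattern`."""
    # Normalize whitespace and case before matching; skip missing values.
    normalized = [v.strip().upper() for v in values if v is not None]
    if not normalized:
        return 0.0
    hits = sum(1 for v in normalized if pattern.fullmatch(v))
    return hits / len(normalized)


def looks_like_pan(values: Iterable[Optional[str]], threshold: float = 0.8) -> bool:
    """Heuristic: treat the column as PAN if enough values match."""
    return match_ratio(values, PAN_PATTERN) >= threshold


sample = ["ABCDE1234F", " abcde1234f ", None, "XYZAB9876K", "not-a-pan"]
print(match_ratio(sample, PAN_PATTERN))
print(looks_like_pan(sample))
```

A format match like this is only the first filter. For token types that define a check digit, a checksum pass such as Luhn or Verhoeff can then reject values that have the right shape but are not valid.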

Project details

Release history

1.4.2 (Jul 1, 2026)
1.4.1 (Jun 30, 2026)
1.4.0 (Jun 30, 2026)
1.3.0b2, pre-release (Jun 30, 2026)
1.3.0b1, pre-release (Jun 30, 2026)
1.0.0b2, pre-release (Jun 29, 2026)
1.0.0b1, pre-release (Jun 29, 2026)
0.3.0, this version (Jun 29, 2026)

Download files

Source Distribution

tidely-0.3.0.tar.gz (29.1 MB)

Uploaded Jun 29, 2026 Source

Built Distribution

tidely-0.3.0-py3-none-any.whl (27.2 kB)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file tidely-0.3.0.tar.gz.

File metadata

Download URL: tidely-0.3.0.tar.gz
Upload date: Jun 29, 2026
Size: 29.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for tidely-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`4b8e8a1b916981599e74e14f97a8e917f97064ec5ca264f072847e1e88b68df8`
MD5	`1fa965ac6bf78a96626078442d437c06`
BLAKE2b-256	`e9d73db5fa5d2c3983a96ba3be07f8a7a3b049b15f91a5df3c7cdaeb6e5461f7`

File details

Details for the file tidely-0.3.0-py3-none-any.whl.

File metadata

Download URL: tidely-0.3.0-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 27.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for tidely-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`36de060f3c42f2d2f5f7534e234014ec5f5046c364c7bba4495744ab633765f3`
MD5	`413f8c6bc644dc7dbeb9b2899afffce7`
BLAKE2b-256	`fe6327aa20ce796b88e549d762d9ed0a53873bb5f80e5d80fe3e0233d88831b7`
