The Ultimate Data Cleaning Engine for Python

These details have not been verified by PyPI

Project description

🌊 Tidely

The production-grade data cleaning engine for Python.

What is Tidely?

Tidely is a local-first, deterministic data cleaning library designed to replace hundreds of lines of fragile Pandas preprocessing code with a single, highly optimized command.

Tidely automatically profiles your dataset, infers semantic types (dates, emails, currency, IDs), safely downcasts numeric columns to cut memory footprint by up to 85%, and structures unstructured text, all without silently mutating your business logic or randomly dropping values.
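As a rough illustration of what rule-based semantic type inference can look like, here is a minimal sketch using only the standard library. The patterns, the `infer_semantic_type` name, and the 90% match threshold are assumptions made for this example; they are not Tidely's actual rules or API.

```python
import re

# Hypothetical patterns for illustration only; Tidely's real rules may differ.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date_us": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "currency": re.compile(r"^[$€£]\s?\d[\d,]*(\.\d+)?$"),
}


def infer_semantic_type(values, threshold=0.9):
    """Return the first pattern name that matches at least `threshold`
    of the non-empty values, "text" if none does, "unknown" if no data."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return name
    return "text"


print(infer_semantic_type(["a@b.com", "c@d.org"]))
print(infer_semantic_type(["1/2/2024", "12/31/2023"]))
print(infer_semantic_type(["$1,200.50", "$3"]))
```

Because the rules are plain regular expressions evaluated in a fixed order, the same input always produces the same label, which is the property the deterministic design depends on.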

Why Tidely Exists

Data scientists and engineers spend much of their time (80% by a commonly cited estimate) writing repetitive data cleaning boilerplate: fixing M/D/YYYY dates, trimming whitespace, downcasting 64-bit floats to save memory, parsing currency symbols, and dropping exact duplicate rows.

Tidely eliminates this boilerplate entirely. It is built on three core philosophies:

Never silently delete data. Every transformation is tracked, explained, and non-destructive.
Local-first and Secure. Tidely runs entirely on your CPU. No API keys, no LLMs, no cloud processing.
Deterministic. The same dirty DataFrame yields the exact same clean DataFrame, every single time.
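The first and third principles can be sketched in plain pandas: a transformation that works on a copy, records what it changed, and returns the same result for the same input. The `trim_whitespace` function and the shape of its log entries are hypothetical and only illustrate the idea; they are not taken from Tidely's source.

```python
import pandas as pd


def trim_whitespace(df):
    """Return a trimmed copy of `df` and a log of changed cells per column."""
    out = df.copy()  # never mutate the caller's DataFrame
    log = []
    for col in out.columns:
        original = out[col]
        # Only string cells are touched; other values pass through unchanged.
        trimmed = original.map(lambda v: v.strip() if isinstance(v, str) else v)
        changed = sum(
            1 for a, b in zip(original, trimmed) if isinstance(a, str) and a != b
        )
        if changed:
            out[col] = trimmed
            log.append(
                {"column": col, "step": "trim_whitespace", "cells_changed": changed}
            )
    return out, log


dirty = pd.DataFrame({"name": [" Ann", "Bob ", "Cy"], "age": [31, 45, 28]})
clean, log = trim_whitespace(dirty)
print(clean)
print(log)
```

Returning the log alongside the data keeps every change inspectable, and leaving the input untouched means the original values can always be compared against the cleaned ones.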

⚡ Quick Start

Installation

pip install tidely

The One-Minute Example

import pandas as pd
import tidely as td

# 1. Load your dirty data
df = pd.read_csv("dirty_data.csv")

# 2. Clean it automatically
result = td.clean(df)

# 3. Retrieve the clean, memory-optimized DataFrame
clean_df = result.df

# 4. View a detailed, explainable summary of what changed
print(result.summary())

🔍 Before vs After

Before Tidely:

import pandas as pd

df = pd.read_csv("data.csv")
df.drop_duplicates(inplace=True)
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# regex=False treats '$' as a literal character rather than a regex end-of-string anchor
df['price'] = df['price'].str.replace('$', '', regex=False).astype(float)
df['category'] = df['category'].astype('category')
# Values other than 'yes'/'no' become NaN
df['is_active'] = df['is_active'].map({'yes': True, 'no': False})
# ... 50 more lines of boilerplate ...

After Tidely:

import pandas as pd
import tidely as td

df = td.clean(pd.read_csv("data.csv")).df

🚀 Core Features

Semantic Intelligence: Natively infers and standardizes Emails, URLs, Currencies, Boolean permutations (yes/y/true/1), IPv4, SSNs, and Dates (including US formats like MM/DD/YYYY).
Memory Optimization: Automatically downcasts over-provisioned 64-bit integers and floats to 16- or 32-bit types, and converts low-cardinality string columns to the Categorical dtype. Safely reduces pandas memory footprint by 40-85%.
Zero-Corruption Duplicate Removal: Identifies and drops exact duplicate rows that skew statistical modeling.
Deep Explainability: Generates an exhaustive summary() explaining what was changed, why it was changed, and the impact of the change.
Business Logic Protection: Issues explicit warnings for missing financial or identifier data rather than blindly imputing zeros.
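The memory optimization described in the list above can be approximated with standard pandas calls. The sketch below is an assumption about the general technique, not Tidely's implementation: the `shrink` helper and its `max_category_ratio` parameter are hypothetical. Note that `pd.to_numeric(..., downcast="float")` never goes below `float32`, so this sketch does not produce 16-bit floats.

```python
import pandas as pd
from pandas.api import types as ptypes


def shrink(df, max_category_ratio=0.5):
    """Return a copy of `df` with smaller numeric dtypes and categorical strings."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if ptypes.is_integer_dtype(s):
            # Picks the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif ptypes.is_float_dtype(s):
            out[col] = pd.to_numeric(s, downcast="float")
        elif ptypes.is_object_dtype(s) or ptypes.is_string_dtype(s):
            # Convert only when the column has few distinct values.
            if s.nunique() / max(len(s), 1) <= max_category_ratio:
                out[col] = s.astype("category")
    return out


df = pd.DataFrame(
    {
        "qty": [1, 2, 3, 4] * 250,
        "price": [1.5, 2.25, 3.0, 4.75] * 250,
        "region": ["north", "south", "east", "west"] * 250,
    }
)
small = shrink(df)
print(small.dtypes)
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

The values are unchanged; only their storage type differs, which is why the reduction depends heavily on how over-provisioned the original columns were.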

Supported DataFrames

Tidely currently supports:

pandas.DataFrame
polars.DataFrame
polars.LazyFrame
pyarrow.Table

🏎️ Performance Philosophy

Tidely is designed for enterprise scale. It relies heavily on vectorized operations backed by pandas and polars.

During internal benchmarking, Tidely processed 10,000,000 rows of mixed-type data in under 26 seconds, safely shrinking the DataFrame from 591 MB to 85 MB without corrupting type definitions. Tidely relies purely on algorithmic inference, with no slow machine learning heuristics and no network latency.
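The quoted figures (591 MB to 85 MB, roughly an 86% reduction) can be checked on your own data with a small timing and memory harness. The sketch below is one way to do that under stated assumptions: `measure` is a hypothetical helper, and the lambda is a stand-in for whatever cleaning function you want to evaluate, not a Tidely call.

```python
import time

import pandas as pd


def measure(clean_fn, df):
    """Time `clean_fn(df)` and report memory before and after, in megabytes."""
    before = df.memory_usage(deep=True).sum()
    start = time.perf_counter()
    result = clean_fn(df)
    elapsed = time.perf_counter() - start
    after = result.memory_usage(deep=True).sum()
    return {
        "rows": len(df),
        "seconds": elapsed,
        "mb_before": before / 1e6,
        "mb_after": after / 1e6,
        "reduction_pct": 100 * (1 - after / before),
    }


# Stand-in cleaning step: store a 64-bit integer column as 32-bit.
data = pd.DataFrame({"n": pd.Series(range(100_000), dtype="int64")})
report = measure(lambda d: d.assign(n=d["n"].astype("int32")), data)
print(report)
```

`memory_usage(deep=True)` is used so that string columns are measured by their actual contents rather than by pointer size.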

🛡️ Validation Summary (Public Beta)

Tidely v1.0 has completed an extensive internal validation campaign covering more than twenty real-world datasets across healthcare, finance, retail, manufacturing, government, environmental science, e-commerce, and enterprise Excel workflows.

The library has also passed property-based testing (Hypothesis), fuzz testing, large-scale stress testing up to 10 million rows, API stability checks, and cross-version compatibility testing.

Based on these results, Tidely is now entering Public Beta, where broader community feedback will continue to strengthen its reliability.

📚 Documentation

Detailed documentation is available in the docs/ directory.

🛣️ Roadmap

Multi-threaded processing for CSV batch-cleaning.
Out-of-core chunked processing for data exceeding local RAM.
Geographic coordinate standardization (Lat/Lon).
Enhanced HTML extraction capabilities.
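Until out-of-core processing lands, data larger than RAM can be handled with the chunked reader that pandas already provides. The sketch below assumes a cleaning function that works on one chunk at a time; `clean_in_chunks` and `strip_city` are hypothetical names used only for this example.

```python
import io

import pandas as pd


def clean_in_chunks(source, clean_fn, chunksize=50_000):
    """Yield cleaned chunks so only `chunksize` rows are in memory at once."""
    for chunk in pd.read_csv(source, chunksize=chunksize):
        yield clean_fn(chunk)


def strip_city(chunk):
    # Stand-in cleaning step: trim whitespace in one text column.
    return chunk.assign(city=chunk["city"].str.strip())


csv_text = "id,city\n1, Oslo\n2,Rome \n3,Lima\n4, Kyiv\n5,Doha\n"
parts = list(clean_in_chunks(io.StringIO(csv_text), strip_city, chunksize=2))
combined = pd.concat(parts, ignore_index=True)
print(combined)
```

Steps that need a global view, such as duplicate removal across chunk boundaries or choosing one dtype for a whole column, need extra state carried between chunks and are not covered by this sketch.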

🤝 Contributing

Tidely is an open-source project, and community contributions are very welcome. Please review CONTRIBUTING.md and CODE_OF_CONDUCT.md before submitting pull requests.

License

Tidely is released under the MIT License.

Project details

Release history

1.4.2: Jul 1, 2026
1.4.1: Jun 30, 2026
1.4.0: Jun 30, 2026
1.3.0b2 (pre-release): Jun 30, 2026
1.3.0b1 (pre-release): Jun 30, 2026
1.0.0b2 (pre-release): Jun 29, 2026
1.0.0b1 (pre-release, this version): Jun 29, 2026
0.3.0: Jun 29, 2026

Download files

Source Distribution

tidely-1.0.0b1.tar.gz (29.4 MB view details)

Uploaded Jun 29, 2026 Source

Built Distribution

tidely-1.0.0b1-py3-none-any.whl (39.1 kB view details)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file tidely-1.0.0b1.tar.gz.

File metadata

Download URL: tidely-1.0.0b1.tar.gz
Upload date: Jun 29, 2026
Size: 29.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for tidely-1.0.0b1.tar.gz
Algorithm	Hash digest
SHA256	`5265d7c96d11fe052146b30e8ecc26fbb3f522c23107925768a65be5a241b124`
MD5	`f48791b6bef08d19a50d28b72342cf10`
BLAKE2b-256	`14f4119f6142928f5ff5e43e9a58f8a4b8f1aed945984b9ae7a3dccd38d8892e`

File details

Details for the file tidely-1.0.0b1-py3-none-any.whl.

File metadata

Download URL: tidely-1.0.0b1-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 39.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for tidely-1.0.0b1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`457cd4f8dc8b6ae9ac7bb573a279c19306164029f20cbaf293c442bf93791960`
MD5	`b3c2e2c780963fd75049df8b3fd889b3`
BLAKE2b-256	`6cb8d72f48ecde3a0113e9c07a9fb39c5e7b248c5ad14642268d82f417e0714f`
