The Ultimate Data Cleaning Engine for Python
Tidely
The Operating System for Data Quality
Zero-Configuration • Explainable • Deterministic • Fast
Install
pip install tidely
The Magic
import tidely as td
result = td.clean("sales.csv")
clean_df = result.df
print(result.summary())
Why Tidely?
Real-world datasets are messy.
Missing values.
Broken dates.
Mixed datatypes.
Duplicate records.
Memory waste.
Encoding issues.
Schema drift.
Normally you spend hours writing cleaning scripts.
Tidely turns all of that into a single function call.
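For contrast, here is a rough sketch of the kind of hand-written pandas script that sentence refers to. It is not Tidely's implementation. The sample data, the column names, and the median fill strategy are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical messy data standing in for sales.csv.
RAW_CSV = """order_id,order_date,amount
1,2024-01-05,100
2,not a date,250
2,not a date,250
3,2024-02-10,
"""


def clean_by_hand(df: pd.DataFrame) -> pd.DataFrame:
    """Manual cleaning steps; each one is a decision you write and maintain."""
    # Drop exact duplicate rows and renumber the index.
    out = df.drop_duplicates().reset_index(drop=True)
    # Unparseable dates become NaT instead of raising an error.
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")
    # Fill missing amounts with the column median (an arbitrary choice).
    out["amount"] = out["amount"].fillna(out["amount"].median())
    return out


manual_df = clean_by_hand(pd.read_csv(io.StringIO(RAW_CSV)))
print(manual_df)
```

Every dataset needs its own version of `clean_by_hand`, which is the repetitive work a single `td.clean(...)` call is meant to replace.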
Dataset Intelligence
profile = td.inspect("sales.csv")
profile.show()
Output
✔ Trust Score
✔ Dataset DNA
✔ Semantic Detection
✔ Missing Values
✔ Duplicate Analysis
✔ Memory Analysis
✔ ML Readiness
✔ Data Quality Score
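Some of these items have rough plain-pandas counterparts, which can help when sanity-checking a profile. The sketch below is an assumption-laden approximation: `basic_profile` is a hypothetical helper, and it does not reproduce Tidely's Trust Score, Dataset DNA, or any other score.

```python
import pandas as pd


def basic_profile(df: pd.DataFrame) -> dict:
    """Rough pandas counterparts to a few profile items (not Tidely's scoring)."""
    return {
        "rows": len(df),
        # Count of missing values in each column.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # deep=True also measures the contents of object/string columns.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


sample = pd.DataFrame({"city": ["Oslo", "Oslo", None], "sales": [10, 10, 7]})
report = basic_profile(sample)
print(report)
```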
Why Use Tidely?
| Feature | Pandas | Tidely |
|---|---|---|
| Read CSV | ✅ | ✅ |
| Auto Detect Dates | ❌ | ✅ |
| Auto Clean Dataset | ❌ | ✅ |
| Memory Optimization | Manual | Automatic |
| Duplicate Detection | Manual | Automatic |
| Missing Value Strategy | Manual | Automatic |
| Semantic Column Detection | ❌ | ✅ |
| Explain Every Change | ❌ | ✅ |
| Health Score | ❌ | ✅ |
| Trust Score | ❌ | ✅ |
| Production Summary | ❌ | ✅ |
Production Validation
Tidely has been validated on the following dataset types:
| Dataset Type | Status |
|---|---|
| CSV | ✅ |
| Excel (.xlsx) | ✅ |
| ARFF | ✅ |
| Government Open Data | ✅ |
| Educational Data | ✅ |
| ML Benchmark Datasets | ✅ |
| Large CSV (>3 Million Rows) | ✅ |
| Time Series | ✅ |
| Mixed Datatypes | ✅ |
| Corrupted Data | ✅ |
Validation Results
Version: v1.3.0b2
| Dataset | Rows | Health Before | Health After |
|---|---|---|---|
| Parking Meters | 52 | 94 | 96 |
| Credit-G | 1000 | 86 | 90 |
| Diabetes | 768 | 86 | 92 |
| Iris | 150 | 92 | 92 |
| Allegations | 57 | 95 | 92 |
| Mathematics | 59 | 97 | 94 |
Benchmarks
3,055,000 Row Dataset
| Metric | Result |
|---|---|
| Runtime | 2.37 sec |
| Original Memory | 148 MB |
| Final Memory | 58 MB |
| Memory Saved | 61% |
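The "Memory Saved" figure follows directly from the two memory rows, and the runtime implies a throughput. A short check using only the numbers in the table:

```python
original_mb = 148
final_mb = 58
rows = 3_055_000
runtime_sec = 2.37

# Fraction of the original footprint that was removed.
saved_fraction = (original_mb - final_mb) / original_mb
# Rows processed per second of runtime.
rows_per_second = rows / runtime_sec

print(f"Memory saved: {saved_fraction:.1%}")
print(f"Throughput: {rows_per_second:,.0f} rows/sec")
```

The saving works out to about 60.8%, which rounds to the 61% reported, and the throughput is roughly 1.29 million rows per second.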
Supported Formats
- CSV
- Excel
- Parquet
- JSON
- TSV
- Feather
- ARFF
More coming soon.
Explainable Cleaning
Tidely never silently changes your data.
Every transformation is documented.
Example
✓ Converted "Order Date" to datetime
Reason: Detected temporal values.
Impact: Allows time-series operations.
✓ Downcasted int64 → int16
Reason: Values fit inside Int16.
Impact: 61% lower memory.
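To make the idea concrete, here is a minimal sketch of one way to pair a transformation with a change/reason/impact record using plain pandas. It is an illustration only, not Tidely's internals: `downcast_with_log` and the log keys are hypothetical names.

```python
import numpy as np
import pandas as pd


def downcast_with_log(df: pd.DataFrame) -> tuple[pd.DataFrame, list[dict]]:
    """Downcast integer columns and record what changed and why."""
    out = df.copy()  # never modify the caller's frame
    log = []
    for col in out.columns:
        if not pd.api.types.is_integer_dtype(out[col]):
            continue
        before = out[col].dtype
        # pandas picks the smallest signed integer dtype that holds every value.
        converted = pd.to_numeric(out[col], downcast="integer")
        if converted.dtype != before:
            out[col] = converted
            saved = before.itemsize - converted.dtype.itemsize
            log.append({
                "change": f"Downcast {col!r}: {before} -> {converted.dtype}",
                "reason": f"Values fit inside {converted.dtype}.",
                "impact": f"{saved} fewer bytes per value.",
            })
    return out, log


orders = pd.DataFrame({"quantity": np.array([3, 1200, 15], dtype="int64")})
small, changes = downcast_with_log(orders)
for entry in changes:
    print(entry)
```

Returning the log alongside the data keeps each change auditable, and working on a copy leaves the input untouched.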
Philosophy
Tidely follows three principles.
- Never silently modify data. Every transformation is visible.
- Deterministic. Same input, same output, every time.
- Local first. Runs entirely on your machine. No cloud, no API keys, no LLMs.
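The determinism principle can be checked from the outside by fingerprinting the cleaned output across repeated runs. The sketch below is one possible approach, not part of Tidely: `frame_fingerprint`, `is_deterministic`, and `stable_cleaner` are hypothetical helpers. A stand-in cleaner is used because this README only shows `td.clean` being called with a file path.

```python
import hashlib

import pandas as pd


def frame_fingerprint(df: pd.DataFrame) -> str:
    """Hash the values, index, column names, and dtypes of a DataFrame."""
    digest = hashlib.sha256()
    # One stable uint64 hash per row, including the index.
    digest.update(pd.util.hash_pandas_object(df, index=True).values.tobytes())
    # Include the schema so a dtype or column-name change alters the fingerprint.
    digest.update(",".join(f"{c}:{t}" for c, t in df.dtypes.items()).encode())
    return digest.hexdigest()


def is_deterministic(clean_fn, df: pd.DataFrame, runs: int = 3) -> bool:
    """Run clean_fn on fresh copies of df and compare the fingerprints."""
    fingerprints = {frame_fingerprint(clean_fn(df.copy())) for _ in range(runs)}
    return len(fingerprints) == 1


def stable_cleaner(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for a real cleaning step (assumption, not Tidely)."""
    return df.drop_duplicates().reset_index(drop=True)


demo = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
print(is_deterministic(stable_cleaner, demo))
```

To test Tidely itself, `clean_fn` could be replaced with a wrapper that calls `td.clean` on the same file and returns `result.df`.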
Roadmap
- CSV Cleaning
- Explainable Reports
- Memory Optimization
- Semantic Detection
- ARFF Support
- Excel Support
- Intelligent Missing Value Imputation
- Fuzzy Duplicate Detection
- Streaming Engine
- DuckDB Integration
- Out-of-Core Cleaning
- Auto Feature Engineering
- SQL Dataset Support
- Distributed Processing
Contributing
PRs are welcome.
Bug reports are welcome.
Feature requests are welcome.
License
MIT