Fast data cleaning engine for ML projects
🧼 datacleanx
datacleanx is a fast, CLI-first data cleaning engine for tabular datasets. It's designed for machine learning practitioners and data engineers who want to automate cleaning workflows efficiently using a single command-line interface.
🚀 Why datacleanx?
- 🔁 Automates repetitive cleaning steps
- 📦 Works out of the box with CSV files
- 📁 Outputs timestamped cleaned files and reports
- 🐳 Docker-ready for CI/CD and containerized workflows
- 🧪 Includes tests and reports for reproducibility
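The timestamped cleaned files and reports mentioned above could be produced along the following lines. This is a minimal standard-library sketch, not datacleanx's own code. The helper name `save_outputs`, the `cleaned_<stamp>.csv` / `report_<stamp>.json` naming, and the `%Y%m%d_%H%M%S` stamp format are all assumptions. The `now` parameter exists only so the timestamp can be pinned in tests.

```python
import csv
import json
from datetime import datetime
from pathlib import Path


def save_outputs(rows, report, out_dir="outputs", now=None):
    """Write cleaned rows and a JSON report under out_dir with a shared timestamp.

    Hypothetical helper: the file names and stamp format are assumptions,
    not datacleanx's documented output layout.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    csv_path = out / f"cleaned_{stamp}.csv"
    json_path = out / f"report_{stamp}.json"

    # rows is a list of dicts sharing the same keys (one dict per record).
    fieldnames = list(rows[0]) if rows else []
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # The report is any JSON-serializable dict, e.g. counts of imputed values.
    json_path.write_text(json.dumps(report, indent=2))
    return csv_path, json_path
```

Sharing one stamp between the two file names keeps each cleaned dataset paired with the report that describes it, and repeated runs do not overwrite earlier results unless two runs start within the same second.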
🔧 Features
- ✅ Imputation: mean, median, mode
- ✅ Encoding: label, onehot
- ✅ Outlier removal using IQR
- ✅ Feature scaling: standard, minmax, robust
- ✅ Auto-saves cleaned data to outputs/
- ✅ Saves reports as structured JSON
- ✅ CLI-first design, easily scriptable
- ✅ Docker and Poetry integration
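The imputation, encoding, outlier and scaling options listed above correspond to standard textbook operations. The sketch below implements them over plain Python lists using only the standard library. It is illustrative, not datacleanx's implementation, and every function name in it is hypothetical. Assumptions worth flagging: missing values are represented as `None`, outliers use the conventional 1.5 * IQR fences with linearly interpolated quartiles (`method="inclusive"`), and standard scaling divides by the population standard deviation. The package itself may choose differently on any of these points.

```python
import statistics


def impute(values, strategy="mean"):
    """Fill None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)  # also works for categorical columns
    else:
        raise ValueError(f"unknown imputation strategy: {strategy}")
    return [fill if v is None else v for v in values]


def label_encode(values):
    """Replace each category with an integer code; codes follow sorted category order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def onehot_encode(values, prefix):
    """Expand one categorical column into 0/1 indicator columns named prefix_category."""
    return {
        f"{prefix}_{cat}": [int(v == cat) for v in values]
        for cat in sorted(set(values))
    }


def iqr_keep_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR], False for outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]


def scale(values, method="standard"):
    """Rescale a numeric column; a zero spread falls back to a divisor of 1."""
    if method == "standard":
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values) or 1.0  # population std (ddof=0)
        return [(v - mu) / sigma for v in values]
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]
    if method == "robust":
        med = statistics.median(values)
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = (q3 - q1) or 1.0
        return [(v - med) / iqr for v in values]
    raise ValueError(f"unknown scaling method: {method}")
```

For example, `iqr_keep_mask([10, 12, 11, 13, 12, 100])` computes Q1 = 11.25 and Q3 = 12.75, giving fences at 9.0 and 15.0, so only the value 100 is flagged for removal.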
📦 Installation
✅ Option 1: From PyPI
pip install datacleanx
Download files
Source Distribution
datacleanx-0.1.0.tar.gz (3.8 kB)
Built Distribution
datacleanx-0.1.0-py3-none-any.whl (6.2 kB)
File details
Details for the file datacleanx-0.1.0.tar.gz.
File metadata
- Download URL: datacleanx-0.1.0.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.6.87.1-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cede71689854b2524d1a338f0e61ae658e6ce901bd893728b9ad5e7226614870 |
| MD5 | 8cb6d7dde7ea0e8383fb107baa938600 |
| BLAKE2b-256 | 6f180a5c499eb0d4cfb43e7e480e755726cbd006cc37c13e8d1a6d7a3a907345 |
File details
Details for the file datacleanx-0.1.0-py3-none-any.whl.
File metadata
- Download URL: datacleanx-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.6.87.1-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77a3d14bbec58f04210c373a4b083c5288b321d7eab703981633ee0797b32688 |
| MD5 | 890040d4f89847d1558bff9a8b316b00 |
| BLAKE2b-256 | d82182da715674e64867c4179593b3b44d485d6467975983c4b78a09751e4fe0 |
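Before installing from a downloaded file, you can check it against the SHA256 digests published in the hash tables above. The sketch below uses only `hashlib`. The helper names `sha256_of` and `verify` are hypothetical, and the digest dictionary is copied from this page for release 0.1.0.

```python
import hashlib
from pathlib import Path

# SHA256 digests published on this page for datacleanx 0.1.0.
EXPECTED_SHA256 = {
    "datacleanx-0.1.0.tar.gz": "cede71689854b2524d1a338f0e61ae658e6ce901bd893728b9ad5e7226614870",
    "datacleanx-0.1.0-py3-none-any.whl": "77a3d14bbec58f04210c373a4b083c5288b321d7eab703981633ee0797b32688",
}


def sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

pip can enforce the same check on its own through hash-checking mode: pin `datacleanx==0.1.0` in a requirements file with `--hash=sha256:...` entries for each file and install with `--require-hashes`.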