Fast data cleaning engine for ML projects

Project description

🧼 datacleanx

datacleanx is a fast, CLI-first data cleaning engine for tabular datasets. It's designed for machine learning practitioners and data engineers who want to automate cleaning workflows efficiently using a single command-line interface.


🚀 Why datacleanx?

  • 🔁 Automates repetitive cleaning steps
  • 📦 Works out-of-the-box with CSV files
  • 📁 Outputs timestamped cleaned files and reports
  • 🐳 Docker-ready for CI/CD and containerized workflows
  • 🧪 Includes tests and reports for reproducibility

🔧 Features

  • ✅ Imputation: mean, median, mode
  • ✅ Encoding: label, onehot
  • ✅ Outlier removal using IQR
  • ✅ Feature scaling: standard, minmax, robust
  • ✅ Auto-saves cleaned data to outputs/
  • ✅ Saves reports as structured JSON
  • ✅ CLI-first design, easily scriptable
  • ✅ Docker and Poetry integration
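
The imputation, IQR outlier removal, and scaling steps listed above can be illustrated with a small standard-library sketch. This is not datacleanx's implementation or API. The function names, the 1.5 × IQR fence, and the quartile method are assumptions chosen for illustration, and the sample column is made up.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def scale_minmax(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric column with one missing value and one extreme value.
column = [10, 12, None, 11, 13, 300]

filled = impute_median(column)      # missing entry replaced by the median
kept = remove_outliers_iqr(filled)  # extreme value 300 is dropped
scaled = scale_minmax(kept)         # remaining values mapped onto [0, 1]
```

The order matters in this sketch: imputing first means the filled value takes part in the quartile computation, and scaling last keeps the extreme value from compressing the rest of the column toward zero.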

📦 Installation

✅ Option 1: From PyPI

pip install datacleanx

Download files

Source Distribution

datacleanx-0.1.0.tar.gz (3.8 kB view details)

Uploaded Source

Built Distribution

datacleanx-0.1.0-py3-none-any.whl (6.2 kB view details)

Uploaded Python 3

File details

Details for the file datacleanx-0.1.0.tar.gz.

File metadata

  • Download URL: datacleanx-0.1.0.tar.gz
  • Upload date:
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.6.87.1-microsoft-standard-WSL2

File hashes

Hashes for datacleanx-0.1.0.tar.gz
  • SHA256: cede71689854b2524d1a338f0e61ae658e6ce901bd893728b9ad5e7226614870
  • MD5: 8cb6d7dde7ea0e8383fb107baa938600
  • BLAKE2b-256: 6f180a5c499eb0d4cfb43e7e480e755726cbd006cc37c13e8d1a6d7a3a907345
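
A downloaded archive can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using only Python's standard library. The helper names and the local file path in the usage comment are assumptions, and the expected digest is the one listed above for datacleanx-0.1.0.tar.gz.

```python
import hashlib
import hmac

# SHA256 digest published for datacleanx-0.1.0.tar.gz.
EXPECTED_SHA256 = "cede71689854b2524d1a338f0e61ae658e6ce901bd893728b9ad5e7226614870"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Usage (the path is hypothetical; point it at the downloaded file):
# matches_digest("datacleanx-0.1.0.tar.gz", EXPECTED_SHA256)
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 3.8 kB archive but makes the helper reusable for larger downloads.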

File details

Details for the file datacleanx-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: datacleanx-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.6.87.1-microsoft-standard-WSL2

File hashes

Hashes for datacleanx-0.1.0-py3-none-any.whl
  • SHA256: 77a3d14bbec58f04210c373a4b083c5288b321d7eab703981633ee0797b32688
  • MD5: 890040d4f89847d1558bff9a8b316b00
  • BLAKE2b-256: d82182da715674e64867c4179593b3b44d485d6467975983c4b78a09751e4fe0