Powerful tools for data cleaning and preprocessing

These details have not been verified by PyPI

Project links

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Information Analysis

Project description


Clean-Data

Powerful tools for data cleaning and preprocessing in Python


English Documentation

📁 Clean-Data

Clean-Data is a Python library for data cleaning and preprocessing. It simplifies repetitive tasks such as handling missing values, removing duplicates, detecting outliers, and normalizing data.

✨ Key Features

Remove Duplicates: Eliminate duplicate records easily
Handle Missing Values: Fill with mean, median, mode, or custom values
Outlier Detection: Using IQR and Z-Score methods
Data Normalization: Min-Max, Standardization, and Robust Scaling
Auto Type Conversion: Convert columns to appropriate types
Quality Report: Get detailed statistics about your data

📦 Installation

pip install clean-data-tools

🚀 Quick Start

import pandas as pd
from cleandata import DataCleaner, OutlierDetector, Normalizer, get_data_quality_report

# Load data
df = pd.read_csv("data.csv")

# Clean data
cleaner = DataCleaner(df)
cleaner.remove_duplicates()
cleaner.fill_missing("mean")
cleaner.strip_strings()

# Detect and remove outliers
detector = OutlierDetector(cleaner.get_data())
outliers = detector.detect_iqr()
df_clean = detector.remove_outliers()

# Normalize
normalizer = Normalizer(df_clean)
df_scaled = normalizer.min_max_scale()

# Quality report
report = get_data_quality_report(df_clean)
print(report)

📚 API Reference

DataCleaner Class

Method	Description
remove_duplicates(subset, keep)	Remove duplicate rows
fill_missing(method, columns)	Fill missing values with the mean, median, mode, or a custom value
remove_missing(threshold, axis)	Remove rows/columns with too many missing values
convert_types(columns)	Auto-convert column data types
strip_strings(columns)	Remove extra whitespace from strings
rename_columns(mapping)	Rename columns
filter_rows(condition)	Filter rows based on condition
reset()	Revert to original data
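
To make the table above more concrete, here is a minimal sketch of what three of these operations amount to in plain pandas. It does not use Clean-Data and is not its implementation; the toy frame and variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): one duplicate row, one missing age, padded names.
df = pd.DataFrame({
    "name": [" Ann ", "Bob", "Bob", "Cy  "],
    "age": [30.0, 25.0, 25.0, np.nan],
})

# Comparable in spirit to remove_duplicates(subset, keep): drop repeated rows.
deduped = df.drop_duplicates(subset=None, keep="first")

# Comparable in spirit to fill_missing("mean"): fill NaN with the column mean.
filled = deduped.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())

# Comparable in spirit to strip_strings(): trim surrounding whitespace.
filled["name"] = filled["name"].str.strip()

print(filled)
```

The mean is computed after duplicates are dropped, so the repeated "Bob" row does not weigh twice on the fill value. The order of cleaning steps can change the result.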

OutlierDetector Class

Method	Description
detect_iqr(columns, multiplier)	Detect outliers using IQR method
detect_zscore(columns, threshold)	Detect outliers using Z-Score method
remove_outliers(columns, method, threshold)	Remove rows with outliers
replace_outliers(columns, method, multiplier)	Replace outliers with the mean, median, or a custom value
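
The two detection rules named above can be sketched with pandas alone. This is an illustration of the standard IQR and Z-Score rules, not the library's code; the function names `iqr_mask` and `zscore_mask` are hypothetical, and the defaults (1.5 and 3.0) are the conventional textbook values rather than values confirmed for Clean-Data.

```python
import pandas as pd

# Hypothetical sample with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 11, 95], dtype="float64")

def iqr_mask(s: pd.Series, multiplier: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - m*IQR, Q3 + m*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - multiplier * iqr) | (s > q3 + multiplier * iqr)

def zscore_mask(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |value - mean| / std exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold

iqr_outliers = iqr_mask(values)
z_outliers = zscore_mask(values, threshold=2.0)

print(values[iqr_outliers])
print(values[z_outliers])
```

A threshold of 2.0 is used for the Z-Score call because the sample is tiny: with only seven points, a single extreme value inflates the standard deviation enough that it never reaches a Z-Score of 3. The IQR rule is less sensitive to this effect.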

Normalizer Class

Method	Description
min_max_scale(columns, feature_range)	Scale to a range (default 0-1)
standardize(columns)	Standardize to mean=0, std=1
robust_scale(columns)	Scale using median and IQR (robust to outliers)
log_transform(columns)	Apply log transformation
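
For reference, the first three scalings in the table correspond to well-known formulas, sketched here for a single pandas Series. The function names are hypothetical and the code is independent of Clean-Data; how the library handles edge cases such as constant columns is not stated in this README.

```python
import pandas as pd

# Hypothetical sample with one large value.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

def min_max_sketch(s: pd.Series, feature_range=(0.0, 1.0)) -> pd.Series:
    """Map the minimum to the low end and the maximum to the high end of the range."""
    low, high = feature_range
    return low + (s - s.min()) * (high - low) / (s.max() - s.min())

def standardize_sketch(s: pd.Series) -> pd.Series:
    """Subtract the mean and divide by the (population) standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)

def robust_sketch(s: pd.Series) -> pd.Series:
    """Subtract the median and divide by the interquartile range."""
    iqr = s.quantile(0.75) - s.quantile(0.25)
    return (s - s.median()) / iqr

scaled = min_max_sketch(s)
standardized = standardize_sketch(s)
robust = robust_sketch(s)

print(scaled.tolist())
print(robust.tolist())
```

With min-max scaling the value 100.0 squeezes the other four points into the bottom few percent of the range, while the robust version keeps them evenly spaced around zero. None of these sketches guards against a zero denominator, which occurs when a column is constant.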

Utility Functions

Function	Description
get_data_quality_report(df)	Get comprehensive data quality report
get_column_info(df, column)	Get detailed info about a specific column
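
The README does not describe what the quality report contains. As a rough idea of the kind of summary such a report might include, the sketch below gathers a few basic statistics with pandas. The helper `quality_report_sketch` and its output keys are assumptions, not the output format of `get_data_quality_report`.

```python
import numpy as np
import pandas as pd

def quality_report_sketch(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarise size, duplicates, missing values, and dtypes."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }

# Hypothetical data: the second row repeats the first, and column "a" has one gap.
df = pd.DataFrame({"a": [1.0, 1.0, np.nan], "b": ["x", "x", "y"]})
report = quality_report_sketch(df)
print(report)
```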

🛠️ Requirements

Python 3.7 or higher

pandas>=1.0.0

numpy>=1.18.0

scipy>=1.4.0

🤝 Contributing

We welcome contributions! Please:

1. Fork the repository
2. Create a new branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📄 License

This project is licensed under the MIT License.

📧 Contact

Email: hasan111bagher@gmail.com

GitHub: 0hasanbagheri0

✨ If you find this library useful, please give it a ⭐ on GitHub!


Release history

1.0.0: Jun 22, 2026
0.1.1: Jun 22, 2026
0.1.0 (this version): Jun 22, 2026

Download files


Source Distribution

clean_data_tools-0.1.0.tar.gz (9.0 kB)

Uploaded Jun 22, 2026 Source

Built Distribution


clean_data_tools-0.1.0-py3-none-any.whl (11.2 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file clean_data_tools-0.1.0.tar.gz.

File metadata

Download URL: clean_data_tools-0.1.0.tar.gz
Upload date: Jun 22, 2026
Size: 9.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for clean_data_tools-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a8669b50d8e18313d501ced4baf13af95be23f6976475bb790278cd5633a134e`
MD5	`6e5a5d75e361d23aa5a44e1282d67402`
BLAKE2b-256	`ea61f1bd4b66a8dbf6b1476174d4e15a9163857e47259f58150ce2ce866bddbe`


File details

Details for the file clean_data_tools-0.1.0-py3-none-any.whl.

File metadata

Download URL: clean_data_tools-0.1.0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 11.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for clean_data_tools-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a26087a2aba488a976848654a9d0f112ab0e2351c882b67ed20576b979335b92`
MD5	`05dfadc3d1800e788ef4561baea915b8`
BLAKE2b-256	`3b2f425597c8f57cff628a769cf181f3b0569f2fa1d228f8c345cd9cab7898f0`
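
The digests listed above can be compared against a downloaded file using Python's standard `hashlib` module. The sketch below is one way to do that; the helper names `sha256_of` and `matches_digest` are hypothetical, and the demo runs on a temporary file because the real archive may not be on disk.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()

# Self-contained demo: the SHA256 of the bytes b"abc" is a well-known test vector.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_ok = matches_digest(
    demo_path,
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
)
os.remove(demo_path)
print(demo_ok)

# For the real source archive, the call would look like:
# matches_digest(
#     "clean_data_tools-0.1.0.tar.gz",
#     "a8669b50d8e18313d501ced4baf13af95be23f6976475bb790278cd5633a134e",
# )
```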


clean-data-tools 0.1.0
