A comprehensive data processing library.
Data Tools Package
A comprehensive library for data preprocessing in AI development, focusing on scalability, usability, and modular design.
Features
- Data Loading: Efficiently load datasets in various formats.
- Data Cleaning: Handle missing values, outliers, and duplicates.
- Feature Engineering: Create new features using advanced techniques.
- Categorical Processing: One-hot and label encoding for categorical variables.
- Scaling: Normalize and standardize numerical features.
- Outlier Handling: Detect and remove outliers using IQR.
- Text Processing: Clean, tokenize, and vectorize text data.
- Time Series Processing: Create time-based features and resample data.
- Image Processing: Load, resize, normalize, and convert images.
- Image Augmentation: Apply transformations to increase the diversity of your training dataset.
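To make the IQR rule named under Outlier Handling concrete, here is a minimal standalone sketch using only the Python standard library. It is not the package's implementation: the function names `iqr_bounds` and `remove_outliers` and the conventional multiplier `k=1.5` are assumptions chosen for illustration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper for illustration; not part of the package's API.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 100]
    print(iqr_bounds(sample))
    print(remove_outliers(sample))
```

A larger `k` widens the fences and removes fewer points; 1.5 is only the common default, and the package may use a different threshold or quartile method.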
Usage
from dataprocessor import DataLoader, DataCleaner, FeatureEngineer, ImageProcessor, ImageAugmenter
# Example usage of the package
loader = DataLoader()
data = loader.load_csv("data.csv")
cleaner = DataCleaner()
cleaned_data = cleaner.clean(data)
# Image processing example
image = ImageProcessor.load_image("path/to/image.jpg")
resized_image = ImageProcessor.resize_image(image, (224, 224))
normalized_image = ImageProcessor.normalize_image(resized_image)
# Image augmentation example
augmented_image = ImageAugmenter.augment_image(normalized_image)
Testing
poetry run pytest
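For orientation, pytest collects functions named `test_*` from files named `test_*.py`. The sketch below shows what such a file might look like; the file name `test_scaling.py` and the helper `min_max_normalize` are hypothetical and defined inline so the example is self-contained, rather than imported from the package.

```python
# test_scaling.py (hypothetical file name, for illustration only)


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1].

    Stand-in helper defined here; not the package's own scaler.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_normalize_range():
    # The smallest value maps to 0.0 and the largest to 1.0.
    assert min_max_normalize([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_min_max_normalize_constant_input():
    # Guard against division by zero when all values are equal.
    assert min_max_normalize([3, 3, 3]) == [0.0, 0.0, 0.0]
```

In a real test suite the helper would be imported from the package under test instead of being defined in the test file.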
TODO:
- Fix file structure
Package
- Configure PyPI credentials if not already done:
poetry config pypi-token.pypi <your-api-token>
- Publish the package:
poetry publish --build
- Also make sure you add the token to the secrets under your repository settings on GitHub.
I think the version should be updated manually, because currently the patch version is bumped on every commit.
File details
Details for the file dataprocessor_vb-0.1.1.tar.gz.
File metadata
- Download URL: dataprocessor_vb-0.1.1.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.12.8 Linux/6.5.0-1025-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71dd0a4153563127babbd5438a25470b0826a66673ec6861270a8212eab93da4 |
| MD5 | d55a694e12dc57f70a775e3960fce06b |
| BLAKE2b-256 | d604df2534725b5491e62ce09ca4b1b8fc6a0a3ee3ec461e327db7093943ba8b |
File details
Details for the file dataprocessor_vb-0.1.1-py3-none-any.whl.
File metadata
- Download URL: dataprocessor_vb-0.1.1-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.12.8 Linux/6.5.0-1025-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12b311487a80c0f71547d53acb46be8eb86440beb4a6cb34ed3a527921e57cec |
| MD5 | f8891061614444128f4a4bf75989e901 |
| BLAKE2b-256 | 838e52e682e0d0c10ca971efea3503c80034c30cbcbad9bebc6d8011e265a86a |