A comprehensive data processing library.
Data Tools Package
A comprehensive library for data preprocessing in AI development, focusing on scalability, usability, and modular design.
Features
- Data Loading: Efficiently load datasets in various formats.
- Data Cleaning: Handle missing values, outliers, and duplicates.
- Feature Engineering: Create new features using advanced techniques.
- Categorical Processing: One-hot and label encoding for categorical variables.
- Scaling: Normalize and standardize numerical features.
- Outlier Handling: Detect and remove outliers using the interquartile range (IQR).
- Text Processing: Clean, tokenize, and vectorize text data.
- Time Series Processing: Create time-based features and resample data.
- Image Processing: Load, resize, normalize, and convert images.
- Image Augmentation: Apply transformations to increase the diversity of your training dataset.
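To make a few of the listed techniques concrete, here is a minimal standard-library sketch of IQR outlier removal, min-max and z-score scaling, and label and one-hot encoding. It is not the package's own implementation: every function name below is hypothetical, and the 1.5 multiplier on the IQR is the conventional default rather than a documented setting of this library.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the IQR rule (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def min_max_scale(values):
    """Normalize values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def label_encode(categories):
    """Map each category to an integer code, assigned in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


def one_hot_encode(categories):
    """Return one 0/1 indicator vector per item, with columns in sorted category order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]
```

For example, `remove_outliers([10, 12, 11, 13, 12, 11, 100])` drops the value 100, because the fences computed from the quartiles 11 and 12.5 are 8.75 and 14.75.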
Usage
from dataprocessor import DataLoader, DataCleaner, FeatureEngineer, ImageProcessor, ImageAugmenter
# Example usage of the package
loader = DataLoader()
data = loader.load_csv("data.csv")
cleaner = DataCleaner()
cleaned_data = cleaner.clean(data)
# Image processing example
image = ImageProcessor.load_image("path/to/image.jpg")
resized_image = ImageProcessor.resize_image(image, (224, 224))
normalized_image = ImageProcessor.normalize_image(resized_image)
# Image augmentation example
augmented_image = ImageAugmenter.augment_image(normalized_image)
Testing
poetry run pytest
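As a rough illustration of the kind of test file that pytest would collect with the command above, the sketch below tests a duplicate-removal step. Because the package's real API is not documented here, `drop_duplicate_rows` is a hypothetical stand-in defined inside the test file itself; an actual test would import the function under test from `dataprocessor` instead.

```python
# Hypothetical test module, e.g. tests/test_example.py (file name is an assumption).


def drop_duplicate_rows(rows):
    """Stand-in for a cleaning step: keep the first occurrence of each row."""
    seen = set()
    unique = []
    for row in rows:
        # Dicts are not hashable, so build a hashable key from the sorted items.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def test_drop_duplicate_rows_keeps_first_occurrence():
    rows = [{"id": 1, "x": 2.0}, {"id": 1, "x": 2.0}, {"id": 2, "x": 3.5}]
    assert drop_duplicate_rows(rows) == [{"id": 1, "x": 2.0}, {"id": 2, "x": 3.5}]


def test_drop_duplicate_rows_handles_empty_input():
    assert drop_duplicate_rows([]) == []
```

pytest discovers functions whose names start with `test_` in files named `test_*.py`, so no extra registration is needed.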
TODO: restructure
package/
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── cd.yml
├── src/
│   └── dataprocessor/
│       ├── __init__.py
│       ├── loaders/  # Data loading modules
│       │   └── data_loader.py  # Load various data formats (CSV, JSON, etc.)
│       ├── cleaners/  # Data cleaning modules
│       │   ├── data_cleaner.py  # Clean and preprocess data
│       │   ├── outlier_handler.py  # Outlier detection and handling
│       │   ├── scaling.py  # Scaling/normalization techniques
│       │   └── categorical_processor.py  # Handling categorical data
│       ├── transformers/  # Data transformation modules
│       │   ├── feature_engineer.py  # Feature engineering tools
│       │   ├── text_processor.py  # Text data processing (tokenization, cleaning)
│       │   ├── time_series_processor.py  # Time series specific tools (windowing, etc.)
│       │   ├── image_processor.py  # Image preprocessing (resizing, normalization)
│       │   └── image_augmenter.py  # Data augmentation techniques for images
│       ├── evaluators/  # Evaluation modules
│       │   └── evaluator.py  # Evaluation metrics and tools
│       ├── visualizers/  # Visualization modules
│       │   └── visualizer.py  # Visualization tools (plots, charts)
│       ├── pipelines/  # Pipeline modules
│       │   ├── pipeline.py  # Pipelines for chaining transformations
│       │   └── config.py  # Configuration management for reproducibility
│       └── utils.py  # Utility functions (logging, file handling)
├── tests/
│   ├── test_loaders/
│   │   └── test_data_loader.py
│   ├── test_cleaners/
│   │   ├── test_data_cleaner.py
│   │   ├── test_outlier_handler.py
│   │   ├── test_scaling.py
│   │   └── test_categorical_processor.py
│   ├── test_transformers/
│   │   ├── test_feature_engineer.py
│   │   ├── test_text_processor.py
│   │   ├── test_time_series_processor.py
│   │   ├── test_image_processor.py
│   │   └── test_image_augmenter.py
│   ├── test_evaluators/
│   │   └── test_evaluator.py
│   ├── test_visualizers/
│   │   └── test_visualizer.py
│   ├── test_pipelines/
│   │   └── test_pipeline.py
│   ├── test_audio_processor.py
│   └── test_tabular_processor.py
├── README.md
├── CONTRIBUTING.md  # Guidelines for contributing to the package
├── CHANGELOG.md  # Changelog for tracking updates and changes
├── examples/  # Directory for example notebooks or scripts
│   ├── example_data_loading.py
│   ├── example_feature_engineering.py
│   └── example_visualization.py
├── requirements.txt  # List of dependencies for the package
└── pyproject.toml
File details
Details for the file dataprocessor_vb-0.1.0.tar.gz.
File metadata
- Download URL: dataprocessor_vb-0.1.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/23.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c5bb9579b8f3c6fccc1afcb2e5499585966cbda703655eb4183b63b44699236 |
| MD5 | 58de2d0fa4179c243c6290cebc46de04 |
| BLAKE2b-256 | ae1b6ead9b38328d11fffe33a8626939faeea8aab341e96699ebd890f65bfcc2 |
File details
Details for the file dataprocessor_vb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dataprocessor_vb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/23.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2aa8ceb1e581cc95b4c3aef867f218b213cbdea037dc47434275bca89183577e |
| MD5 | 189a20e6e0c46f79570a959858b41e8a |
| BLAKE2b-256 | 92b84940751f273e9c6f6415460d66cca4cb8e32b64790eaa76af2a0d2e227cd |
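The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `verify_download` are hypothetical, and the sketch assumes the file sits in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the source distribution, this would be called as `verify_download("dataprocessor_vb-0.1.0.tar.gz", "7c5bb9579b8f3c6fccc1afcb2e5499585966cbda703655eb4183b63b44699236")`; a `False` result suggests the download is corrupted or is not the published file.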