A comprehensive data processing library.

Data Tools Package

A comprehensive library for data preprocessing in AI development, focusing on scalability, usability, and modular design.

Features

  • Data Loading: Efficiently load datasets in various formats.
  • Data Cleaning: Handle missing values, outliers, and duplicates.
  • Feature Engineering: Create new features using advanced techniques.
  • Categorical Processing: One-hot and label encoding for categorical variables.
  • Scaling: Normalize and standardize numerical features.
  • Outlier Handling: Detect and remove outliers using the interquartile range (IQR).
  • Text Processing: Clean, tokenize, and vectorize text data.
  • Time Series Processing: Create time-based features and resample data.
  • Image Processing: Load, resize, normalize, and convert images.
  • Image Augmentation: Apply transformations to increase the diversity of your training dataset.
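To make the IQR-based outlier handling mentioned in the list concrete, here is a minimal standard-library sketch. It is not the package's own implementation: the function names `iqr_bounds` and `remove_outliers`, the fence multiplier `k=1.5`, and the choice of the "inclusive" quantile method are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Hypothetical helper for illustration; k=1.5 is the conventional
    Tukey multiplier, not a value taken from the package.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


sample = [10, 12, 11, 13, 12, 11, 95]
filtered = remove_outliers(sample)
print(filtered)  # 95 lies above the upper fence and is dropped
```

A larger `k` widens the fences and keeps more points; the right value depends on the data, so treat 1.5 as a starting point only.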

Usage

from dataprocessor import DataLoader, DataCleaner, FeatureEngineer, ImageProcessor, ImageAugmenter

# Example usage of the package
loader = DataLoader()
data = loader.load_csv("data.csv")

cleaner = DataCleaner()
cleaned_data = cleaner.clean(data)

# Image processing example
image = ImageProcessor.load_image("path/to/image.jpg")
resized_image = ImageProcessor.resize_image(image, (224, 224))
normalized_image = ImageProcessor.normalize_image(resized_image)

# Image augmentation example
augmented_image = ImageAugmenter.augment_image(normalized_image)
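The README does not say what `normalize_image` computes. One common convention is to scale 8-bit pixel values into the range [0, 1], and the sketch below shows that idea on a plain nested list. The function `normalize_pixels` and its `max_value` parameter are hypothetical and are not part of the package's API.

```python
def normalize_pixels(image, max_value=255.0):
    """Scale every channel value of a rows x columns x channels image to [0.0, 1.0].

    Assumption: pixels are 8-bit integers (0..255). This is an
    illustrative stand-in, not the behaviour of ImageProcessor.normalize_image.
    """
    return [
        [[channel / max_value for channel in pixel] for pixel in row]
        for row in image
    ]


# A 2x2 RGB image stored as nested lists.
tiny_image = [
    [[0, 128, 255], [255, 255, 255]],
    [[64, 64, 64], [0, 0, 0]],
]
normalized = normalize_pixels(tiny_image)
print(normalized[0][0])
```

Other pipelines instead subtract a per-channel mean and divide by a standard deviation; check the package source to see which convention it actually follows.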

Testing

poetry run pytest

TODO:

  • Fix file structure

Package

dataprocessor_vb on PyPI

  1. Configure PyPI credentials if not already done:
poetry config pypi-token.pypi <your-api-token>
  2. Publish the package:
poetry publish --build
  3. Also make sure you add the token to the secrets under your repository settings on GitHub.

I think the version should be updated manually, because right now the patch version is bumped on every commit.
