veda_lib is a Python library designed to streamline the data preprocessing and cleaning workflow for machine learning projects. It offers a comprehensive set of tools to handle common data preparation tasks.

veda_lib

A Python library designed to streamline the transition from raw data to machine learning models.
veda_lib automates and simplifies data preprocessing, cleaning, and balancing, addressing the time-consuming and complex aspects of these tasks to provide clean, ready-to-use data for your models.


Installation

First, install veda_lib using pip:

pip install veda_lib

How to use

After installing veda_lib, import it into your project and start utilizing its modules to prepare your data. Below is a summary of the key functionalities provided by each module:

1. Preprocessor Module

  • Functions:
    • Removing null values
    • Handling duplicates
    • Imputing missing values with appropriate methods
  • Usage: Ideal for initial data cleaning and preprocessing steps.
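
The cleaning steps listed above can be illustrated with a minimal, dependency-free sketch. This is not veda_lib's implementation: the helper names `drop_duplicate_rows` and `impute_column_means` are hypothetical, and mean imputation is only one of several possible methods, chosen here as an assumption.

```python
from statistics import mean


def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    return unique


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values cannot be imputed and stays None.
        col_means.append(mean(observed) if observed else None)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


raw = [
    [1.0, 10.0],
    [1.0, 10.0],  # exact duplicate of the first row
    [3.0, None],  # missing value in the second column
    [5.0, 30.0],
]
cleaned = impute_column_means(drop_duplicate_rows(raw))
print(cleaned)  # [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
```

Duplicates are dropped before imputing so that repeated rows do not bias the column means.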

2. OutlierHandler Module

  • Functions:
    • Handling outliers by either removing or capping them
    • Customizable based on the nature of your data
  • Usage: Useful for managing data skewness and ensuring robust model performance.
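
To make the difference between removing and capping concrete, here is a small sketch that assumes the common 1.5 × IQR rule. The rule, the function names, and the `k` parameter are illustrative assumptions; the source does not say which criterion OutlierHandler applies.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Clip every value into the [lower, upper] range; the length is unchanged."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


def remove_outliers(values, k=1.5):
    """Drop every value that falls outside the [lower, upper] range."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


data = [10.0, 12.0, 11.0, 13.0, 12.0, 100.0]
print(iqr_bounds(data))       # (9.0, 15.0)
print(cap_outliers(data))     # [10.0, 12.0, 11.0, 13.0, 12.0, 15.0]
print(remove_outliers(data))  # [10.0, 12.0, 11.0, 13.0, 12.0]
```

Capping keeps the row count intact, which matters when the target values must stay aligned with the features; removing discards the affected samples entirely.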

3. FeatureSelector Module

  • Functions:
    • Selecting important features from the dataset
    • Tailored selection based on the nature of the data
  • Usage: Helps in reducing dimensionality and focusing on the most impactful features.
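
As one simple illustration of feature selection, the sketch below drops features whose variance does not exceed a threshold. This is an assumed example technique, not a description of what FeatureSelector does internally; `select_by_variance` and the sample column names are hypothetical.

```python
from statistics import pvariance


def select_by_variance(columns, threshold=0.0):
    """Keep the columns whose population variance is strictly above threshold."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


features = {
    "age": [23.0, 35.0, 41.0, 29.0],
    "country_code": [1.0, 1.0, 1.0, 1.0],  # constant, so it cannot separate samples
    "income": [40.0, 52.0, 61.0, 48.0],
}
selected = select_by_variance(features)
print(list(selected))  # ['age', 'income']
```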

4. DimensionReducer Module

  • Functions:
    • Reducing data dimensionality using appropriate techniques
  • Usage: Crucial for addressing the curse of dimensionality and improving model efficiency.
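
One widely used reduction technique is principal component analysis. The sketch below estimates only the first principal direction with power iteration and projects the data onto it, in plain Python. It is an assumption-laden illustration: the source does not state which techniques DimensionReducer uses, and both function names are hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Estimate the leading principal direction of rows via power iteration."""
    n, d = len(rows), len(rows[0])
    # Centre each column so the covariance is measured around the mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d); requires at least two rows.
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: multiply by cov and renormalise until the vector settles.
    # A start vector exactly orthogonal to the leading direction would yield a
    # zero vector here; a random start is the usual remedy.
    vec = [1.0] * d
    for _ in range(iterations):
        vec = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return means, vec


def project(rows, means, direction):
    """Project each centred row onto direction, giving one coordinate per row."""
    return [
        sum((row[j] - means[j]) * direction[j] for j in range(len(direction)))
        for row in rows
    ]


# Four 2-D points that lie exactly on the line y = 2x.
points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, direction = first_principal_component(points)
reduced = project(points, means, direction)
print(direction)
print(reduced)
```

Because the sample points are perfectly collinear, the single projected coordinate preserves all of their variance; real data would lose whatever variance lies off the leading direction.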

5. BalanceData Module

  • Functions:
    • Balancing class distribution in imbalanced datasets
    • Methods chosen based on data characteristics
  • Usage: Essential for improving model fairness and performance on imbalanced datasets.
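
The idea of balancing can be shown with random oversampling, sketched below in plain Python. This is an assumed example method; the source does not list the strategies AdaptiveBalancer chooses between, and `random_oversample` is a hypothetical helper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of this class that are available for duplication.
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({0: 4, 1: 4})
```

Oversampling should be applied to the training split only; duplicating rows before a train/test split leaks copies of the same sample into both sets.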

6. Veda Module

  • Functions:
    • Integrates all the above functionalities into a single pipeline
  • Usage: Pass your raw data through this module to perform comprehensive EDA and get fully preprocessed, cleaned, and balanced data ready for model training.

Importing

  • Here is an example of importing Veda from veda_lib.Veda. Set classification to True if the problem is a classification problem; otherwise set it to False.
from veda_lib import Veda
eda = Veda.Veda(classification=True)
eda.fit_transform(X, y)
  • Here is an example of importing DataPreprocessor from veda_lib.Preprocessor, using default values of parameters.
from veda_lib import Preprocessor
preprocessor = Preprocessor.DataPreprocessor()
X, y = preprocessor.fit_transform(X, y)
  • Here is an example of importing OutlierPreprocessor from veda_lib.OutlierHandler, using default values of parameters.
from veda_lib import OutlierHandler
outlier_preprocessor = OutlierHandler.OutlierPreprocessor()
X, y = outlier_preprocessor.fit_transform(X, y)
  • Here is an example of importing FeatureSelection from veda_lib.FeatureSelector, using default values of parameters.
from veda_lib import FeatureSelector
selector = FeatureSelector.FeatureSelection()
X, y = selector.fit_transform(X, y)
  • Here is an example of importing DimensionReducer from veda_lib.DimensionReducer, using default values of parameters.
from veda_lib import DimensionReducer
reducer = DimensionReducer.DimensionReducer()
X, y = reducer.fit_transform(X, y)
  • Here is an example of importing AdaptiveBalancer from veda_lib.BalanceData, setting classification to True and using default values for the other parameters.
from veda_lib import BalanceData
balancer = BalanceData.AdaptiveBalancer(classification=True)
X, y, strategy, model = balancer.fit_transform(X, y)

Contributing

I welcome contributions to veda_lib! If you have a bug report, feature suggestion, or want to contribute code, please open an issue or pull request on GitHub.


License

veda_lib is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
