A Deep Learning Data Analysis Package
DataPrep and Visualization Toolkit
This is a Python package designed to streamline the process of preparing datasets for machine learning workflows and visualizing time-series data. This package provides essential functionality for splitting datasets, applying data scaling techniques, and visualizing feature trends, making it easier to prepare data for modeling. This is version 0.3.0.1 of the package, and we plan to add more features in future updates!
Key Features
Exponential Weighted Mean Smoothing:
Smooths input features using an exponential weighted mean (EWM) to help reduce noise in the data before training.
Train-Test Split with Optional Validation Split:
The data_prep() function splits the data into training, testing, and (optionally) validation sets, with parameters such as test_ratio and validation that control how the split is made.
Scaling Options:
Choose between two widely used scaling methods, MinMaxScaler (rescales each feature to a fixed range, [0, 1] by default) and StandardScaler (rescales each feature to zero mean and unit variance), so that your data is well prepared for machine learning models.
Support for Oversampling (SMOTE):
The package can optionally oversample the minority class with SMOTE (Synthetic Minority Over-sampling Technique) to handle imbalanced datasets.
Dataset Visualization:
The dataset_visualize() function allows you to easily visualize time-series data for selected features, providing insights into trends and patterns in the dataset.
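Exponentially weighted mean smoothing is available directly in pandas through `DataFrame.ewm`. The minimal sketch below shows the idea; the helper name `ewm_smooth` and the smoothing factor `alpha=0.5` are illustrative assumptions, and the package may use a different span or apply the smoothing at a different point in its pipeline.

```python
import pandas as pd


def ewm_smooth(df: pd.DataFrame, alpha: float = 0.5) -> pd.DataFrame:
    """Smooth every column with an exponentially weighted mean.

    Hypothetical helper for illustration, not part of the package.
    With adjust=False each output value is
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1],  starting from y[0] = x[0].
    """
    return df.ewm(alpha=alpha, adjust=False).mean()


# A short series with a noisy spike at 10.0
raw = pd.DataFrame({"feature_1": [1.0, 2.0, 3.0, 10.0, 3.0]})
smoothed = ewm_smooth(raw, alpha=0.5)
print(smoothed["feature_1"].tolist())  # [1.0, 1.5, 2.25, 6.125, 4.5625]
```

A smaller `alpha` gives older values more weight and smooths more heavily; with `alpha=0.5` the spike at 10.0 is damped to 6.125.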
Installation
You can install the package using pip:
```bash
pip install dl-data-analysis
```
Data Preparation
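```python
import pandas as pd
from your_package_name import data_prep  # replace with the package's import name

# Example usage
# my_data: a pandas DataFrame of input features
# labels: the matching target values
X_train, X_test, y_train, y_test = data_prep(
    x_dataframe=my_data,
    y_data=labels,
    test_ratio=0.3,         # hold out 30% of the rows for testing
    validation=True,        # also create a validation split
    scaler_type="min_max",  # scale with MinMaxScaler
    oversample=True         # balance the classes with SMOTE
)
```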
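The split, scale, and oversample steps that `data_prep()` performs can be sketched with scikit-learn and NumPy. Everything here is an assumption rather than the package's own code: the helper names `prep_sketch` and `smote_sketch`, the `val_ratio` parameter, the `"standard"` scaler key, stratified splitting, carving the validation set out of the training portion, the six-value return (with `None` for the validation split when `validation=False`), and the binary-class SMOTE. For real projects the `imbalanced-learn` library's `SMOTE` class is the usual choice; the hand-written version only illustrates the interpolation idea.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def smote_sketch(X, y, k=3, seed=0):
    """Hypothetical helper: SMOTE-style oversampling for a binary problem."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    X_min = X[y == minority]
    n_needed = counts.max() - counts.min()
    if n_needed == 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few samples to interpolate
    # Distances between minority samples; ignore each sample's distance to itself.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(len(X_min), size=n_needed)            # random minority samples
    picks = neighbours[base, rng.integers(k, size=n_needed)]  # one nearest neighbour each
    gaps = rng.random((n_needed, 1))                          # interpolation factors in [0, 1)
    synthetic = X_min[base] + gaps * (X_min[picks] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_needed, minority)])


def prep_sketch(X, y, test_ratio=0.3, validation=False, val_ratio=0.2,
                scaler_type="min_max", oversample=False, seed=0):
    """Hypothetical helper mirroring data_prep()'s parameters (not its real code)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_ratio, random_state=seed, stratify=y)
    X_val = y_val = None
    if validation:
        # Assumption: the validation set is carved out of the training portion.
        X_train, X_val, y_train, y_val = train_test_split(
            X_train, y_train, test_size=val_ratio, random_state=seed, stratify=y_train)
    # Fit the scaler on the training split only so test statistics don't leak in.
    scaler = {"min_max": MinMaxScaler, "standard": StandardScaler}[scaler_type]()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    if validation:
        X_val = scaler.transform(X_val)
    if oversample:
        # Oversample after splitting so synthetic points never reach the test set.
        X_train, y_train = smote_sketch(X_train, y_train, seed=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Usage on an imbalanced toy dataset (30 samples of class 0, 10 of class 1)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(40, 3))
y_demo = np.array([0] * 30 + [1] * 10)
X_tr, X_va, X_te, y_tr, y_va, y_te = prep_sketch(
    X_demo, y_demo, test_ratio=0.3, validation=True,
    scaler_type="min_max", oversample=True)
print(X_tr.shape, X_va.shape, X_te.shape, np.bincount(y_tr))
```

The two ordering choices are deliberate: the scaler is fitted on the training split only, and oversampling runs after splitting, so neither test-set statistics nor synthetic points leak into evaluation.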
Visualization
```python
from your_package_name import dataset_visualize  # replace with the package's import name

# Example visualization: plot the selected features of my_data
dataset_visualize(
    pd_dataframe=my_data,
    feature_list=['feature_1', 'feature_2'],
    Name='Sensor',
    list=[1, 2, 3]
)
```
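Plotting selected features as time series can be sketched with pandas and matplotlib. The helper name `plot_features` is hypothetical, and the sketch does not model the `Name` and `list` arguments of `dataset_visualize()`, since their exact behaviour is not described here.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; remove this line for interactive windows
import matplotlib.pyplot as plt
import pandas as pd


def plot_features(df: pd.DataFrame, feature_list):
    """Hypothetical helper: draw one line per selected feature against the index."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for feature in feature_list:
        ax.plot(df.index, df[feature], label=feature)
    ax.set_xlabel(df.index.name or "index")
    ax.set_ylabel("value")
    ax.legend()
    fig.tight_layout()
    return fig, ax


data = pd.DataFrame(
    {"feature_1": [1, 3, 2, 4], "feature_2": [2, 2, 3, 1]},
    index=pd.RangeIndex(4, name="time"),
)
fig, ax = plot_features(data, ["feature_1", "feature_2"])
# fig.savefig("features.png")  # uncomment to write the plot to disk
```

Returning the figure and axes, rather than calling `plt.show()` inside the helper, leaves the caller free to save, display, or further customise the plot.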
Planned Updates
This is an early version of the package. Planned additions include:
- More scaling and normalization techniques.
- Advanced data preprocessing capabilities.
- Enhanced visualization functions.
- Support for more types of datasets and tasks.
Stay tuned for more!
Contributing
Contributions are welcome! If you have any ideas or would like to contribute to the project, please open an issue or submit a pull request.
License
This project is licensed under the MIT License.
File details
Details for the file dl_data_analysis-0.3.0.1.tar.gz.
File metadata
- Download URL: dl_data_analysis-0.3.0.1.tar.gz
- Size: 3.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 85ab2f422c2efa01b499ffbb1cad6ed308c852474b240560e43e49a5debbcb91
MD5 | d5b95cb545871abd0f8ba6195fa7e300
BLAKE2b-256 | 02b1c5514338ed348acbb0bf864e233b889a7c817c04ac3343e817843796b2bc
File details
Details for the file dl_data_analysis-0.3.0.1-py3-none-any.whl.
File metadata
- Download URL: dl_data_analysis-0.3.0.1-py3-none-any.whl
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | ae3fa78ab006904f77600b9bdc19cc067987c60f8305eeed9487ea1b6c2bb07c
MD5 | 46ef9aa9d229e0ac0ece11db69058ebe
BLAKE2b-256 | b8b6fe531890ea3e358f959d703f82c9693965e9633d4e30d8bb8c1444d8e2bb