Data Preprocessing Library
A comprehensive library for data preprocessing tasks, including data cleaning, transformation, and visualization.
Installation
To install the library, use pip:
pip install data_preprocessing_library
Usage
Outlier Handling
import pandas as pd
from data_preprocessing_library.outlier_handler import OutlierHandler
data = pd.DataFrame({'A': [1, 2, 3, 4, 100]})
outlier_handler = OutlierHandler()
# Remove outliers using the interquartile range (IQR) method
cleaned_data = outlier_handler.iqr_outliers(data, 'A')
print(cleaned_data)
# Replace outliers with median
data = outlier_handler.replace_outliers_with_median(data, 'A')
print(data)
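The IQR rule behind `iqr_outliers` can be illustrated in plain Python. This is a sketch of the technique, not the library's code. It assumes the common Tukey fences Q1 - 1.5·IQR and Q3 + 1.5·IQR, quartiles interpolated linearly (pandas' default), and a median taken over the whole column. The multiplier `k` and all function names here are assumptions.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    # method="inclusive" interpolates linearly between data points,
    # which matches pandas' default Series.quantile behaviour.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(values, k=1.5):
    """Keep only the values that lie inside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside the fences with the median of the whole column."""
    lower, upper = iqr_bounds(values, k)
    median = statistics.median(values)
    return [v if lower <= v <= upper else median for v in values]


values = [1, 2, 3, 4, 100]
print(iqr_bounds(values))
print(remove_iqr_outliers(values))
print(replace_outliers_with_median(values))
```

Under these assumptions the sample column has quartiles 2 and 4, giving fences of -1.0 and 7.0, so only 100 is treated as an outlier and is replaced by the median, 3.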
Scaling
import pandas as pd
from data_preprocessing_library.scaler import Scaler
data = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
scaler = Scaler()
# Min-Max scaling
scaled_data = scaler.min_max_scale(data)
print(scaled_data)
# Standard scaling
standard_scaled_data = scaler.standard_scale(data)
print(standard_scaled_data)
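Min-max scaling maps each value x to (x - min) / (max - min), and standard scaling maps it to (x - mean) / std. The plain-Python sketch below shows both formulas on a list. Whether `Scaler` uses the population standard deviation (as scikit-learn's `StandardScaler` does) or the sample one (the pandas `Series.std` default) is not documented; this sketch assumes the population version, and the handling of constant columns is also an assumption.

```python
import statistics


def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Assumption: a constant column (zero span) is mapped to 0.0 rather than raising.
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # ddof=0, as in scikit-learn's StandardScaler
    return [0.0 if std == 0 else (v - mean) / std for v in values]


values = [1, 2, 3, 4, 5]
print(min_max_scale(values))
print(standard_scale(values))
```

Min-max output always spans exactly [0, 1]; standard-scaled output has mean 0 and unit (population) standard deviation.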
Handling Missing Values
import pandas as pd
from data_preprocessing_library.missing_value_handler import MissingValueHandler
data = pd.DataFrame({'A': [1, 2, None, 4, 5]})
missing_value_handler = MissingValueHandler()
# Fill missing values with mean
filled_data = missing_value_handler.fill_mean(data, ['A'])
print(filled_data)
# Drop rows with missing values from the original data
dropped_data = missing_value_handler.drop_missing(data)
print(dropped_data)
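Mean imputation replaces each missing entry with the mean of the values that are present, while dropping removes any row that has a missing entry in any column. This plain-Python sketch treats both `None` and `NaN` as missing, which mirrors how pandas stores `None` in a numeric column. The helper names are hypothetical, and rows are modelled as dictionaries.

```python
import math
import statistics


def is_missing(value):
    """Treat both None and float NaN as missing, as pandas does."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_mean(values):
    """Replace missing entries with the mean of the entries that are present."""
    mean = statistics.fmean([v for v in values if not is_missing(v)])
    return [mean if is_missing(v) else v for v in values]


def drop_missing(rows):
    """Drop every row (a dict of column -> value) with a missing value in any column."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


column = [1, 2, None, 4, 5]
print(fill_mean(column))
print(drop_missing([{"A": v} for v in column]))
```

For the example column, the mean of 1, 2, 4 and 5 is 3.0, so the gap is filled with 3.0; dropping instead leaves four rows.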
Visualization
import pandas as pd
from data_preprocessing_library.visualizer import Visualizer
data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
visualizer = Visualizer()
# Plot histogram
visualizer.plot_histogram(data, 'A')
# Plot boxplot
visualizer.plot_boxplot(data, 'A')
# Plot a scatter plot of A against B
visualizer.plot_scatter(data, 'A', 'B')
# Plot correlation matrix
visualizer.plot_correlation_matrix(data)
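A correlation-matrix plot usually colours the pairwise Pearson coefficients between numeric columns, the same values `pandas.DataFrame.corr()` returns by default. Assuming `plot_correlation_matrix` displays Pearson coefficients (the README does not say), the sketch below computes the numbers such a plot would show for the example data. The helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined for a constant column (pandas also returns NaN)
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of column name -> values."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


data = {"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]}
for name, row in correlation_matrix(data).items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Here A and B move in exactly opposite directions, so their coefficient is -1, and each column correlates perfectly (1) with itself.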
Filtering Data
import pandas as pd
from data_preprocessing_library.data_filter import DataFilter
data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
data_filter = DataFilter()
# Filter data by condition
filtered_data = data_filter.filter_by_condition(data, 'A > 2')
print(filtered_data)
# Filter specific columns
filtered_columns = data_filter.filter_by_columns(data, ['A'])
print(filtered_columns)
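The condition string `'A > 2'` looks like the expression syntax accepted by pandas' `DataFrame.query`, but the README does not say how `filter_by_condition` parses it. The plain-Python sketch below shows the two operations themselves, row selection and column projection, using a predicate function instead of a string so that nothing has to be evaluated. The function names are hypothetical stand-ins, and rows are modelled as dictionaries.

```python
def filter_by_condition(rows, predicate):
    """Keep the rows (dicts) for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def filter_by_columns(rows, columns):
    """Keep only the listed columns in every row."""
    return [{name: row[name] for name in columns} for row in rows]


rows = [{"A": a, "B": b} for a, b in zip([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])]
print(filter_by_condition(rows, lambda row: row["A"] > 2))
print(filter_by_columns(rows, ["A"]))
```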
Encoding Categorical Data
import pandas as pd
from data_preprocessing_library.categorical_encoder import CategoricalEncoder
data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
encoder = CategoricalEncoder()
# One-hot encoding
one_hot_encoded_data = encoder.one_hot_encode(data, 'Category')
print(one_hot_encoded_data)
# Label encoding
label_encoded_data = encoder.label_encode(data, 'Category')
print(label_encoded_data)
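One-hot encoding turns each category into its own 0/1 indicator column, while label encoding maps each category to a single integer code. The sketch below assumes categories are ordered alphabetically (as `pandas.get_dummies` and scikit-learn's `LabelEncoder` both do) and that indicator columns are named `<column>_<category>`, following the `get_dummies` convention. The library's own ordering and naming are not documented, and the function names are hypothetical.

```python
def one_hot_encode(values, prefix):
    """Return one dict of 0/1 indicator columns per value, named '<prefix>_<category>'."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


def label_encode(values):
    """Map each category to an integer code in alphabetical order."""
    mapping = {c: code for code, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


categories = ["A", "B", "A", "C"]
for row in one_hot_encode(categories, "Category"):
    print(row)
codes, mapping = label_encode(categories)
print(codes, mapping)
```

Label encoding implies an order (A < B < C) that unordered categories do not have, which is why one-hot encoding is often preferred for nominal data.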
Budget Categorization
import pandas as pd
from data_preprocessing_library.budget_handler import BudgetHandler
data = pd.DataFrame({'Budget': [500000, 20000000, 300000000]})
budget_handler = BudgetHandler()
# Categorize budget
categorized_data = budget_handler.categorize_budget(data, 'Budget')
print(categorized_data)
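The README does not state which thresholds or labels `categorize_budget` uses. The sketch below shows the general binning technique with hypothetical cut points of 1,000,000 and 100,000,000 and hypothetical labels `Low`, `Medium` and `High`, chosen only so that the three example budgets land in different bins.

```python
import bisect

# Hypothetical cut points and labels; the library's real categories are not documented.
THRESHOLDS = [1_000_000, 100_000_000]
LABELS = ["Low", "Medium", "High"]


def categorize_budget(budgets, thresholds=THRESHOLDS, labels=LABELS):
    """Assign each budget the label of the bin it falls into."""
    # bisect_right returns how many thresholds are <= the budget, i.e. the bin index;
    # a budget exactly equal to a threshold therefore goes into the higher bin.
    return [labels[bisect.bisect_right(thresholds, b)] for b in budgets]


print(categorize_budget([500_000, 20_000_000, 300_000_000]))
```

In pandas the same idea is usually expressed with `pandas.cut`. Note that `pd.cut` closes intervals on the right by default, so a budget exactly equal to a threshold would fall in the lower bin there, whereas `bisect_right` puts it in the higher one.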
Data Type Conversion
import pandas as pd
from data_preprocessing_library.data_type_converter import DataTypeConverter
data = pd.DataFrame({'A': ['1', '2', '3'], 'B': [1, 2, 3]})
converter = DataTypeConverter()
# Convert to numeric
numeric_data = converter.convert_to_numeric(data, ['A'])
print(numeric_data)
# Convert to categorical
categorical_data = converter.convert_to_categorical(data, ['B'])
print(categorical_data)
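Numeric conversion parses text such as `'1'` into numbers, and categorical conversion stores a column as a sorted list of distinct categories plus one integer code per row, the representation pandas uses for its `category` dtype. How the library treats strings it cannot parse is not documented; this sketch assumes they become missing (`None`), similar to `pandas.to_numeric(..., errors='coerce')`. The function names are hypothetical.

```python
def to_number(value):
    """Parse a string as int, then float; non-strings are returned unchanged."""
    if not isinstance(value, str):
        return value
    for parse in (int, float):
        try:
            return parse(value)
        except ValueError:
            pass
    return None  # assumption: unparseable text becomes missing, like errors="coerce"


def convert_to_numeric(values):
    return [to_number(v) for v in values]


def convert_to_categorical(values):
    """Return (sorted distinct categories, integer code per value)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return categories, [index[v] for v in values]


print(convert_to_numeric(["1", "2", "3"]))
print(convert_to_categorical([1, 2, 3]))
```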
Date and Time Handling
import pandas as pd
from data_preprocessing_library.date_time_handler import DateTimeHandler
data = pd.DataFrame({'Date': ['01/01/2020', '02/01/2020', '03/01/2020']})
date_time_handler = DateTimeHandler()
# Convert to datetime
datetime_data = date_time_handler.convert_to_datetime(data, 'Date')
print(datetime_data)
# Extract date parts
date_parts = date_time_handler.extract_date_parts(datetime_data, 'Date')
print(date_parts)
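The example dates are ambiguous: `'02/01/2020'` is 1 February in month-first notation but 2 January in day-first notation, and `pandas.to_datetime` assumes month-first unless `dayfirst=True` is passed. The sketch below parses with an explicit format string to remove that ambiguity and then extracts common date parts. The default format, the set of extracted parts, and the function names are assumptions; the README does not say which parts `extract_date_parts` returns.

```python
from datetime import datetime


def convert_to_datetime(values, fmt="%m/%d/%Y"):
    """Parse date strings with an explicit format (month-first by default)."""
    return [datetime.strptime(v, fmt) for v in values]


def extract_date_parts(dates):
    """Split each datetime into year, month, day and weekday (Monday == 0)."""
    return [
        {"year": d.year, "month": d.month, "day": d.day, "weekday": d.weekday()}
        for d in dates
    ]


raw = ["01/01/2020", "02/01/2020", "03/01/2020"]
month_first = convert_to_datetime(raw)
day_first = convert_to_datetime(raw, "%d/%m/%Y")
print([d.date() for d in month_first])  # 1 Jan, 1 Feb, 1 Mar 2020
print([d.date() for d in day_first])    # 1 Jan, 2 Jan, 3 Jan 2020
print(extract_date_parts(month_first))
```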
Download files
File details
Details for the file programmingForData-0.1.0.tar.gz
File metadata
- Download URL: programmingForData-0.1.0.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea2d5622b4d1b9c0457cb63d28ec1075a1c97d6c078ff1498362e872c2f4d12f
MD5 | 015030b1b7b0f0b673ec6805f7e3d66b
BLAKE2b-256 | 422c8859a2f8e8f4ccc9aa7bc95abe3c2b7fa69cb6df3c8ec85b5e7503f93ac6
File details
Details for the file programmingForData-0.1.0-py3-none-any.whl
File metadata
- Download URL: programmingForData-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2de67fab7565df18a4f815a763a95132f166d486b694574fee1790401dc15030
MD5 | ff10768109917335606b991ccd6456cb
BLAKE2b-256 | 0c27db26eb4eceb0e03ee21899886ae397bdf816409070c9739ee15c12ac999c