Data Preprocessing Library

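A library for data preprocessing tasks, including data cleaning (outlier handling, missing values, filtering), transformation (scaling, categorical encoding, budget categorization, data type conversion, date and time handling), and visualization.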

Installation

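To install the library, use pip. The published distribution files are named programmingForData-0.1.0, while the code examples import the package as data_preprocessing_library:

pip install programmingForData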

Usage

Outlier Handling

import pandas as pd
from data_preprocessing_library.outlier_handler import OutlierHandler

data = pd.DataFrame({'A': [1, 2, 3, 4, 100]})
outlier_handler = OutlierHandler()

# Remove outliers using IQR method
cleaned_data = outlier_handler.iqr_outliers(data, 'A')
print(cleaned_data)

# Replace outliers with median
data = outlier_handler.replace_outliers_with_median(data, 'A')
print(data)
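To show what the IQR method does, here is a minimal plain-pandas sketch. It is not OutlierHandler's implementation: it assumes the common Tukey rule (values outside Q1 - 1.5·IQR and Q3 + 1.5·IQR are outliers) and, for replacement, the median of the whole column. The library may use a different multiplier or median, and the helper names below are hypothetical.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) Tukey fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(df, column, k=1.5):
    """Hypothetical helper: keep only rows whose value lies inside the fences."""
    lower, upper = iqr_bounds(df[column], k)
    return df[df[column].between(lower, upper)]


def replace_iqr_outliers_with_median(df, column, k=1.5):
    """Hypothetical helper: replace out-of-fence values with the column median."""
    lower, upper = iqr_bounds(df[column], k)
    median = df[column].median()  # assumption: median of the full column, outliers included
    out = df.copy()  # leave the caller's DataFrame untouched
    out[column] = out[column].where(out[column].between(lower, upper), median)
    return out


data = pd.DataFrame({'A': [1, 2, 3, 4, 100]})
print(remove_iqr_outliers(data, 'A'))
print(replace_iqr_outliers_with_median(data, 'A'))
```

With this data, Q1 = 2 and Q3 = 4, so the fences are -1 and 7 and only 100 is flagged; the replacement median is 3.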

Scaling

import pandas as pd
from data_preprocessing_library.scaler import Scaler

data = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
scaler = Scaler()

# Min-Max scaling
scaled_data = scaler.min_max_scale(data)
print(scaled_data)

# Standard scaling
standard_scaled_data = scaler.standard_scale(data)
print(standard_scaled_data)
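Both scalers correspond to standard formulas: min-max scaling computes (x - min) / (max - min) and standard scaling computes (x - mean) / std. A minimal plain-pandas sketch follows. It assumes column-wise scaling and, for standard scaling, the population standard deviation (ddof=0, as scikit-learn's StandardScaler uses); whether Scaler uses ddof=0 or pandas' default ddof=1 is not stated, so treat that as an assumption. The function names are hypothetical.

```python
import pandas as pd


def min_max_scale(df):
    """Rescale each column to [0, 1]: (x - min) / (max - min)."""
    return (df - df.min()) / (df.max() - df.min())


def standard_scale(df):
    """Center each column on 0 with unit variance: (x - mean) / std.

    Uses the population standard deviation (ddof=0); this is an assumption,
    not necessarily what Scaler does.
    """
    return (df - df.mean()) / df.std(ddof=0)


data = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(min_max_scale(data))
print(standard_scale(data))
```

A constant column makes both denominators zero, and pandas then yields NaN for that column instead of raising an error.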

Handling Missing Values

import pandas as pd
from data_preprocessing_library.missing_value_handler import MissingValueHandler

data = pd.DataFrame({'A': [1, 2, None, 4, 5]})
missing_value_handler = MissingValueHandler()

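# Fill missing values in column 'A' with the column mean
filled_data = missing_value_handler.fill_mean(data, ['A'])
print(filled_data)

# Drop rows with missing values (applied to the original data, which still contains None)
dropped_data = missing_value_handler.drop_missing(data)
print(dropped_data)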
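Mean imputation and row dropping map directly onto pandas' fillna and dropna. The sketch below is plain pandas, not the library's code; the helper name fill_mean_sketch is hypothetical. It works on a copy, so the original data keeps its missing value for comparison.

```python
import pandas as pd


def fill_mean_sketch(df, columns):
    """Hypothetical helper: fill NaN in the given columns with each column's mean."""
    out = df.copy()
    # mean() skips NaN by default, so the fill value comes from observed values only
    out[columns] = out[columns].fillna(out[columns].mean())
    return out


data = pd.DataFrame({'A': [1, 2, None, 4, 5]})
filled = fill_mean_sketch(data, ['A'])
dropped = data.dropna()  # drop every row that has at least one missing value
print(filled)
print(dropped)
```

Here the mean of the observed values (1, 2, 4, 5) is 3.0, which fills the gap.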

Visualization

import pandas as pd
from data_preprocessing_library.visualizer import Visualizer

data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
visualizer = Visualizer()

# Plot histogram
visualizer.plot_histogram(data, 'A')

# Plot boxplot
visualizer.plot_boxplot(data, 'A')

# Plot scatter plot
visualizer.plot_scatter(data, 'A', 'B')

# Plot correlation matrix
visualizer.plot_correlation_matrix(data)
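A correlation-matrix plot typically displays the pairwise Pearson coefficients that DataFrame.corr() returns. The sketch below computes that matrix numerically so it can be inspected without a plot; whether Visualizer uses Pearson correlation (pandas' default method) is an assumption.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# Pearson correlation between every pair of numeric columns
corr = data.corr()
print(corr)
```

Because B falls by exactly one whenever A rises by one, the off-diagonal entries are -1 (up to floating-point rounding) and the diagonal entries are 1.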

Filtering Data

import pandas as pd
from data_preprocessing_library.data_filter import DataFilter

data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
data_filter = DataFilter()

# Filter data by condition
filtered_data = data_filter.filter_by_condition(data, 'A > 2')
print(filtered_data)

# Filter specific columns
filtered_columns = data_filter.filter_by_columns(data, ['A'])
print(filtered_columns)
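The condition string 'A > 2' reads like pandas query syntax. Whether filter_by_condition passes it to DataFrame.query is an assumption; the plain-pandas equivalents of both filters are shown below.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# Row filter: evaluate a boolean expression written in terms of column names
filtered_rows = data.query('A > 2')

# The same row filter written as a boolean mask
filtered_mask = data[data['A'] > 2]

# Column filter: select a subset of columns by name (the result is still a DataFrame)
filtered_columns = data[['A']]

print(filtered_rows)
print(filtered_columns)
```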

Encoding Categorical Data

import pandas as pd
from data_preprocessing_library.categorical_encoder import CategoricalEncoder

data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
encoder = CategoricalEncoder()

# One-hot encoding
one_hot_encoded_data = encoder.one_hot_encode(data, 'Category')
print(one_hot_encoded_data)

# Label encoding
label_encoded_data = encoder.label_encode(data, 'Category')
print(label_encoded_data)
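One-hot encoding creates one indicator column per category, while label encoding maps each category to an integer. The plain-pandas equivalents are pd.get_dummies and categorical codes. The integer assignment below (categories in sorted order, starting at 0) is pandas' behaviour and may differ from what CategoricalEncoder produces.

```python
import pandas as pd

data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})

# One-hot: one 0/1 column per category, named '<column>_<value>'
one_hot = pd.get_dummies(data, columns=['Category'], dtype=int)

# Label encoding: categories are sorted ('A', 'B', 'C') and mapped to 0, 1, 2
labels = data.copy()
labels['Category'] = labels['Category'].astype('category').cat.codes

print(one_hot)
print(labels)
```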

Budget Categorization

import pandas as pd
from data_preprocessing_library.budget_handler import BudgetHandler

data = pd.DataFrame({'Budget': [500000, 20000000, 300000000]})
budget_handler = BudgetHandler()

# Categorize budget
categorized_data = budget_handler.categorize_budget(data, 'Budget')
print(categorized_data)
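The thresholds used by categorize_budget are not documented here. As an illustration only, binning a numeric column into labelled ranges can be done with pd.cut; the cut points (1 million and 100 million), the labels 'Low', 'Medium', 'High', and the output column name Budget_Category are all hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'Budget': [500000, 20000000, 300000000]})

# Hypothetical cut points; intervals are right-closed:
# (-inf, 1e6] -> Low, (1e6, 1e8] -> Medium, (1e8, inf] -> High
bins = [float('-inf'), 1_000_000, 100_000_000, float('inf')]
labels = ['Low', 'Medium', 'High']

categorized = data.copy()
categorized['Budget_Category'] = pd.cut(categorized['Budget'], bins=bins, labels=labels)
print(categorized)
```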

Data Type Conversion

import pandas as pd
from data_preprocessing_library.data_type_converter import DataTypeConverter

data = pd.DataFrame({'A': ['1', '2', '3'], 'B': [1, 2, 3]})
converter = DataTypeConverter()

# Convert to numeric
numeric_data = converter.convert_to_numeric(data, ['A'])
print(numeric_data)

# Convert to categorical
categorical_data = converter.convert_to_categorical(data, ['B'])
print(categorical_data)
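pandas' own converters for these two cases are pd.to_numeric and astype('category'). Whether DataTypeConverter raises on unparseable strings or turns them into NaN is not stated; the sketch below chooses coercion (errors='coerce') as an assumption.

```python
import pandas as pd

data = pd.DataFrame({'A': ['1', '2', '3'], 'B': [1, 2, 3]})

converted = data.copy()
# Strings -> numbers; errors='coerce' turns unparseable values into NaN instead of raising
converted['A'] = pd.to_numeric(converted['A'], errors='coerce')
# Integers -> categorical dtype (the values are kept, stored as categories)
converted['B'] = converted['B'].astype('category')

print(converted.dtypes)
```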

Date and Time Handling

import pandas as pd
from data_preprocessing_library.date_time_handler import DateTimeHandler

data = pd.DataFrame({'Date': ['01/01/2020', '02/01/2020', '03/01/2020']})
date_time_handler = DateTimeHandler()

# Convert to datetime
datetime_data = date_time_handler.convert_to_datetime(data, 'Date')
print(datetime_data)

# Extract date parts
date_parts = date_time_handler.extract_date_parts(datetime_data, 'Date')
print(date_parts)
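Strings such as '02/01/2020' are ambiguous: read month-first they mean 1 February, read day-first they mean 2 January. Passing an explicit format removes the guess. The sketch below assumes month-first dates (%m/%d/%Y), which this README does not confirm, and extracts date parts through the .dt accessor; the output column names are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({'Date': ['01/01/2020', '02/01/2020', '03/01/2020']})

out = data.copy()
# Explicit format: assumes MM/DD/YYYY; use '%d/%m/%Y' if the data is day-first
out['Date'] = pd.to_datetime(out['Date'], format='%m/%d/%Y')

# Extract components through the .dt accessor
out['Year'] = out['Date'].dt.year
out['Month'] = out['Date'].dt.month
out['Day'] = out['Date'].dt.day
out['Weekday'] = out['Date'].dt.day_name()
print(out)
```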

Download files


Source distribution: programmingForData-0.1.0.tar.gz (3.2 kB, tag: Source)

Built distribution: programmingForData-0.1.0-py3-none-any.whl (3.3 kB, tag: Python 3)

File details

Details for the file programmingForData-0.1.0.tar.gz.

File metadata

  • Download URL: programmingForData-0.1.0.tar.gz
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.13

File hashes

Hashes for programmingForData-0.1.0.tar.gz

  • SHA256: ea2d5622b4d1b9c0457cb63d28ec1075a1c97d6c078ff1498362e872c2f4d12f
  • MD5: 015030b1b7b0f0b673ec6805f7e3d66b
  • BLAKE2b-256: 422c8859a2f8e8f4ccc9aa7bc95abe3c2b7fa69cb6df3c8ec85b5e7503f93ac6


File details

Details for the file programmingForData-0.1.0-py3-none-any.whl.


File hashes

Hashes for programmingForData-0.1.0-py3-none-any.whl

  • SHA256: 2de67fab7565df18a4f815a763a95132f166d486b694574fee1790401dc15030
  • MD5: ff10768109917335606b991ccd6456cb
  • BLAKE2b-256: 0c27db26eb4eceb0e03ee21899886ae397bdf816409070c9739ee15c12ac999c
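To check a downloaded file against the published SHA256 digests, hash it with Python's hashlib and compare. This is a minimal sketch; the function names and the local file path are hypothetical. pip can also enforce hashes itself in hash-checking mode (--require-hashes).

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 equals the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Hypothetical local path to the downloaded wheel
wheel = 'programmingForData-0.1.0-py3-none-any.whl'
expected = '2de67fab7565df18a4f815a763a95132f166d486b694574fee1790401dc15030'
if os.path.exists(wheel):
    print(matches(wheel, expected))
else:
    print(f'{wheel} not found; download it first')
```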

