A Python package for efficient data cleaning and preprocessing with Pandas.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

pyspan

`pyspan` is a Python package for data cleaning and preprocessing with Pandas. It provides functions to handle missing values, detect outliers, spell-check text data, and more. It also includes a logging utility that keeps track of function calls and their parameters.

Installation

To use pyspan, simply install the package using pip:

```bash
pip install pyspan
```

Functions

`handle_nulls(data: pd.DataFrame, columns: List[str], method: str, value: Optional[Union[str, float]] = None) -> pd.DataFrame`

Handles missing values in the specified columns of a DataFrame.

Parameters:
- `data`: DataFrame with missing values.
- `columns`: List of column names to apply the fill operation to.
- `method`: Strategy to use for imputing missing values ('mean', 'median', 'mode', 'interpolate', 'forward_fill', 'backward_fill').
- `value`: Custom value to fill NaNs with (optional).

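
The strategies listed for `method` correspond to ordinary pandas operations. The sketch below shows what each strategy does using plain pandas calls. It illustrates the strategies only and is not pyspan's implementation; the helper name `fill_column` and the sample data are hypothetical.

```python
import pandas as pd


def fill_column(series: pd.Series, method: str) -> pd.Series:
    """Hypothetical helper: fill NaNs in one numeric Series with a named strategy."""
    if method == "mean":
        return series.fillna(series.mean())
    if method == "median":
        return series.fillna(series.median())
    if method == "mode":
        # mode() may return several values; use the first one.
        return series.fillna(series.mode().iloc[0])
    if method == "interpolate":
        # Linear interpolation between neighbouring non-null values.
        return series.interpolate()
    if method == "forward_fill":
        return series.ffill()
    if method == "backward_fill":
        return series.bfill()
    raise ValueError(f"unknown method: {method!r}")


# Made-up sample data with two missing values.
s = pd.Series([1.0, None, 3.0, 3.0, None, 8.0])
mean_filled = fill_column(s, "mean")
ffilled = fill_column(s, "forward_fill")
interpolated = fill_column(s, "interpolate")
```

Each call returns a new Series and leaves `s` unchanged, which matches the signature above in returning a result rather than modifying its input.
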
`remove_duplicates(df: pd.DataFrame) -> pd.DataFrame`

Removes duplicate rows from a DataFrame.

Parameters:
- `df`: DataFrame to remove duplicates from.

`remove_columns(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame`

Removes specified columns from a DataFrame.

Parameters:
- `df`: DataFrame to remove columns from.
- `columns`: List of column names to remove.

`auto_rename_columns(df: pd.DataFrame) -> pd.DataFrame`

Automatically renames columns to remove spaces and special characters.

Parameters:
- `df`: DataFrame to rename columns in.

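
The exact renaming rules are not documented here. One plausible normalization, assumed purely for illustration, replaces each run of non-alphanumeric characters with an underscore and trims leading and trailing underscores. The function name `sanitize_column_name` and the sample column names are hypothetical and not part of pyspan.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Hypothetical rule: collapse runs of non-alphanumeric characters into '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop underscores left over at either end, e.g. from a trailing ')'.
    return cleaned.strip("_")


# Made-up column names containing spaces and special characters.
columns = ["First Name", "Total ($)", " Zip-Code "]
renamed = [sanitize_column_name(c) for c in columns]
print(renamed)
```

With pandas, a function like this could be applied to a whole DataFrame via `df.rename(columns=sanitize_column_name)`.
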
`rename_dataframe_columns(df: pd.DataFrame, rename_dict: dict) -> pd.DataFrame`

Renames columns in a DataFrame using a provided dictionary mapping.

Parameters:
- `df`: DataFrame to rename columns in.
- `rename_dict`: Dictionary mapping current column names to new column names.

`change_dt(df: pd.DataFrame, columns: Union[str, List[str]], date_format: str = "%Y-%m-%d", time_format: str = "%H:%M:%S", from_timezone: Optional[str] = None, to_timezone: Optional[str] = None) -> pd.DataFrame`

Changes the format of date and time columns and handles timezone conversion.

Parameters:
- `df`: DataFrame containing date/time columns to reformat.
- `columns`: Name(s) of the column(s) to be reformatted.
- `date_format`: Desired date format.
- `time_format`: Desired time format.
- `from_timezone`: Original timezone of the datetime column(s).
- `to_timezone`: Desired timezone for the datetime column(s).

`detect_delimiter(series: pd.Series) -> str`

Detects the most common delimiter in a Series of strings.

Parameters:
- `series`: Series containing strings to analyze.

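
How pyspan decides which delimiter is "most common" is not specified here. A minimal sketch of one approach, assuming a fixed set of candidate delimiters and a simple occurrence count, is shown below. The candidate list and the name `most_common_delimiter` are assumptions made for illustration.

```python
from collections import Counter

# Assumed candidate set; a real implementation may consider other characters.
CANDIDATES = [",", ";", "|", "\t", ":"]


def most_common_delimiter(values):
    """Count candidate delimiters across all strings; return the most frequent or None."""
    counts = Counter()
    for value in values:
        if isinstance(value, str):  # skip missing or non-string entries
            for delim in CANDIDATES:
                counts[delim] += value.count(delim)
    if not counts or max(counts.values()) == 0:
        return None
    return counts.most_common(1)[0][0]


# Made-up sample: ';' appears four times, ',' once.
sample = ["a;b;c", "d;e", "f,g;h", None]
print(most_common_delimiter(sample))
```

The function accepts any iterable of values, so a pandas Series of strings can be passed directly.
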
`split_column(df: pd.DataFrame, column_name: str, delimiter: str = None) -> pd.DataFrame`

Splits a single column into multiple columns based on a delimiter.

Parameters:
- `df`: DataFrame containing the column to split.
- `column_name`: Name of the column to be split.
- `delimiter`: Delimiter to use for splitting (optional).

`impute(data, by=None, value=None, columns=None) -> pd.DataFrame`

Handles missing values using a specified strategy or a custom value.

Parameters:
- `data`: DataFrame or Series with missing values.
- `by`: Strategy for imputing missing values ('mean', 'median', 'mode', 'interpolate', 'forward_fill', 'backward_fill').
- `value`: Custom value to fill NaNs with.
- `columns`: List of column names to apply the fill operation to.

`spell_check_dataframe(data: pd.DataFrame, dictionary='en_US', columns=None) -> dict`

Spell-checks the specified columns of a DataFrame.

Parameters:
- `data`: DataFrame containing columns to spell check.
- `dictionary`: Dictionary to use for spell checking ('en_US', 'en_GB', 'en_AU', 'en_IE').
- `columns`: List of column names to perform spell check on.

`detect_invalid_dates(series: pd.Series) -> pd.Series`

Detects invalid date values in a Series.

Parameters:
- `series`: Series to check for invalid dates.

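
The contents of the returned Series are not described beyond its type. As an illustration of the general technique, the sketch below flags values that fail to parse under one explicit format, using `pd.to_datetime` with `errors="coerce"`. The boolean-mask return value, the fixed format, and the name `flag_invalid_dates` are assumptions, not pyspan's documented behavior.

```python
import pandas as pd


def flag_invalid_dates(series: pd.Series, fmt: str = "%Y-%m-%d") -> pd.Series:
    """Return a boolean Series that is True where a non-null value fails to parse."""
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    # NaT after coercion means parsing failed; values that were already missing are not flagged.
    return parsed.isna() & series.notna()


# Made-up sample: one valid date, one impossible date, one non-date string, one missing value.
dates = pd.Series(["2024-08-20", "2024-02-30", "not a date", None])
invalid = flag_invalid_dates(dates)
print(dates[invalid])
```

Separating "missing" from "present but unparseable" keeps null handling and date validation as independent steps.
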
`detect_data_entry_errors(data, spellcheck_dict='en_US', date_columns=None, numeric_columns=None, text_columns=None) -> pd.DataFrame`

Detects and flags data entry errors, including invalid dates and misspelled words.

Parameters:
- `data`: DataFrame to analyze.
- `spellcheck_dict`: Dictionary to use for spell checking.
- `date_columns`: List of columns to check for invalid dates.
- `numeric_columns`: List of columns to check for numeric format errors.
- `text_columns`: List of text columns to perform spell checking on.

`data_type_conversions(data, column=None) -> pd.DataFrame or pd.Series`

Recommends and applies data type conversions based on an analysis of each column's data.

Parameters:
- `data`: DataFrame or Series to analyze.
- `column`: Specific column to analyze (optional).

`detect_outliers(data, method='iqr', threshold=1.5, columns=None, handle_missing=True) -> pd.DataFrame`

Detects outliers in a dataset using the specified method and threshold.

Parameters:
- `data`: DataFrame or Series to analyze.
- `method`: Outlier detection method ('z-score', 'iqr').
- `threshold`: Threshold for outlier detection.
- `columns`: List of columns to apply the outlier detection to (optional).
- `handle_missing`: Whether to handle missing values by dropping them.

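
For the 'iqr' method, the conventional rule flags values that fall more than `threshold` times the interquartile range below the first quartile or above the third. The sketch below applies that rule to a single Series with plain pandas. It assumes pyspan follows the conventional definition, which is not confirmed here; the name `iqr_outlier_mask` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, threshold: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - threshold*IQR, Q3 + threshold*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - threshold * iqr
    upper = q3 + threshold * iqr
    # Comparisons with NaN evaluate to False, so missing values are never flagged.
    return (series < lower) | (series > upper)


# Made-up sample with one obvious extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 95])
mask = iqr_outlier_mask(values)
outliers = values[mask]
print(outliers)
```

A larger `threshold` widens the accepted range and flags fewer values.
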

Logging Functions

`display_logs()`

Prints stored log entries.

Example Usage

Here are some examples to illustrate the usage of the functions provided in pyspan:

```python
import pandas as pd
from pyspan import handle_nulls, remove_duplicates, remove_columns, auto_rename_columns, rename_dataframe_columns
from pyspan import change_dt, detect_delimiter, split_column, impute, spell_check_dataframe
from pyspan import detect_invalid_dates, detect_data_entry_errors, data_type_conversions, detect_outliers
from pyspan import log_function_call, display_logs

# Load a dataset
df = pd.read_csv('/content/GlobalSharkAttacks.csv')

# Example usage of handle_nulls
df_filled = handle_nulls(df, columns=['Column1', 'Column2'], method='mean')

# Example usage of remove_duplicates
df_unique = remove_duplicates(df)

# Example usage of remove_columns
df_reduced = remove_columns(df, columns=['ColumnToRemove'])

# Example usage of auto_rename_columns (returns a DataFrame, so keep the result)
df_auto_renamed = auto_rename_columns(df)

# Example usage of rename_dataframe_columns
rename_dict = {'OldName': 'NewName'}
df_renamed_dict = rename_dataframe_columns(df, rename_dict)

# Example usage of change_dt
df_formatted = change_dt(df, columns=['Date'], date_format='%d-%m-%Y', time_format='%I:%M %p')

# Example usage of detect_delimiter
delimiter = detect_delimiter(df['ColumnWithDelimiters'])

# Example usage of split_column
df_split = split_column(df, column_name='ColumnWithDelimiters')

# Example usage of impute
df_imputed = impute(df, by='mean', columns=['Column1'])

# Example usage of spell_check_dataframe
misspelled = spell_check_dataframe(df, dictionary='en_US', columns=['TextColumn'])

# Example usage of detect_invalid_dates
invalid_dates = detect_invalid_dates(df['DateColumn'])

# Example usage of detect_data_entry_errors
errors = detect_data_entry_errors(df, spellcheck_dict='en_US', date_columns=['DateColumn'], numeric_columns=['NumericColumn'])

# Example usage of data_type_conversions
df_converted = data_type_conversions(df)

# Example usage of detect_outliers
outliers = detect_outliers(df, method='iqr', threshold=1.5)

# Example usage of display_logs
display_logs()
```

License

This package is licensed under the MIT License. See the LICENSE file for more details.

Contact

For issues or questions, please contact amynahreimoo@gmail.com.


Release history

- 1.0.0 (Apr 16, 2025)
- 0.5.4 (Apr 16, 2025)
- 0.5.3 (Apr 16, 2025)
- 0.5.2 (Apr 16, 2025)
- 0.5.1 (Mar 11, 2025)
- 0.5.0 (Mar 11, 2025)
- 0.4.6 (Jan 8, 2025)
- 0.4.5 (Nov 30, 2024)
- 0.4.4 (Nov 25, 2024)
- 0.4.3 (Nov 21, 2024)
- 0.4.2 (Nov 20, 2024)
- 0.4.1 (Nov 20, 2024)
- 0.4.0 (Nov 19, 2024)
- 0.3.5 (Nov 11, 2024)
- 0.3.4 (Nov 11, 2024)
- 0.3.3 (Nov 11, 2024)
- 0.3.2 (Nov 11, 2024)
- 0.3.1 (Nov 11, 2024)
- 0.3.0 (Nov 1, 2024)
- 0.2.7 (Oct 8, 2024)
- 0.2.6 (Oct 8, 2024)
- 0.2.5 (Oct 8, 2024)
- 0.2.4 (Oct 8, 2024)
- 0.2.3 (Oct 2, 2024)
- 0.2.2 (Sep 30, 2024)
- 0.2.1 (Sep 9, 2024)
- 0.2.0 (Sep 9, 2024)
- 0.1.4 (Aug 29, 2024)
- 0.1.3 (Aug 26, 2024)
- 0.1.2 (Aug 26, 2024)
- 0.1.1 (Aug 20, 2024), this version
- 0.1.0 (Aug 19, 2024)

Download files

Download the file for your platform.

Source Distribution

pyspan-0.1.1.tar.gz (11.8 kB)

Uploaded Aug 20, 2024 Source

Built Distribution

pyspan-0.1.1-py3-none-any.whl (11.4 kB)

Uploaded Aug 20, 2024 Python 3

File details

Details for the file pyspan-0.1.1.tar.gz.

File metadata

Download URL: pyspan-0.1.1.tar.gz
Upload date: Aug 20, 2024
Size: 11.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for pyspan-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3d7d8c0998649d317a3721c30ad3f3d3d910e72ce949449c93a3dbfd7b7ed680`
MD5	`938eded3e2ea43063cf6febb8a30fb09`
BLAKE2b-256	`6782138be2a54878e4cc8c7d10b2210a81597ac14fc6822a0d1e662e4e9f6974`

File details

Details for the file pyspan-0.1.1-py3-none-any.whl.

File metadata

Download URL: pyspan-0.1.1-py3-none-any.whl
Upload date: Aug 20, 2024
Size: 11.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for pyspan-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc62c0dce4d42da4630fc55c23353e964c2e63350f986d7433b02b993e540d60`
MD5	`6565d2686f1edbe422740ad4a81b73b1`
BLAKE2b-256	`2e3ff9929d5b6ffe2a93391c1e7a0f33519b550d4072282427b85661687a2a8a`
