A Python package for exploring and cleaning Pandas DataFrames

These details have not been verified by PyPI

Project description

PandasExplorer

Overview

The pandasdataexplorer.py module is part of the PandasExplorer package. It provides a class, PandasDataExplorer, that bundles data preprocessing, exploration, and visualization utilities for Pandas DataFrames. Its methods cover common cleaning and analysis tasks such as renaming columns, handling missing values, and finding outliers, as well as generating profile reports and plotting data distributions.

Methods

Column Operations

clean_columns():
- Cleans column names by making them lowercase and replacing spaces with underscores.
rename_columns(cols: list, new_names: list):
- Renames specified columns by their indices.
- Parameters:
  - cols: A list of column indices to rename.
  - new_names: A list of new column names.
remove_columns(col_indices):
- Removes columns from the DataFrame by their indices.
- Parameters:
  - col_indices: A list of column indices to remove.
change_column_dtype(col_number, type='int64'):
- Changes the data type of a specified column by its index.
- Parameters:
  - col_number: The index of the column.
  - type: The target data type (default is int64).
copy():
- Creates a copy of the DataFrame.
save_copy(filename: str):
- Saves the DataFrame copy to a CSV file.
- Parameters:
  - filename: The path to the CSV file where the DataFrame will be saved.
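
The column operations above address columns by position rather than by name. The package's source is not shown here, so the following is only a minimal sketch of the equivalent steps in plain pandas; the sample DataFrame and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the package.
df = pd.DataFrame(
    {"First Name": ["Ada", "Alan"], "Age": ["36", "41"], "Temp Col": [1, 2]}
)

# clean_columns-style step: lowercase names, spaces become underscores.
df.columns = df.columns.str.lower().str.replace(" ", "_")

# rename_columns-style step: translate positional indices to current labels.
cols, new_names = [0], ["name"]
df = df.rename(columns={df.columns[i]: n for i, n in zip(cols, new_names)})

# remove_columns-style step: drop the column at position 2.
df = df.drop(columns=df.columns[[2]])

# change_column_dtype-style step: cast the column at position 1 to int64.
label = df.columns[1]
df[label] = df[label].astype("int64")

print(df.dtypes)
```

Translating each index to a label first keeps the remaining calls ordinary label-based pandas operations.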

Data Cleaning

clean_string_columns():
- Trims and converts all string (object) columns to lowercase.
clean_float_columns():
- Rounds all float columns to two decimal places.
parse_date_columns():
- Attempts to convert string columns to datetime based on several common formats.
parse_int_columns():
- Attempts to convert string columns to integers or floats based on their contents.
drop_duplicate_rows():
- Removes duplicate rows, keeping only the first occurrence.
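
As a rough illustration of what these cleaning steps amount to, here is a sketch in plain pandas. It is not the package's implementation: the sample data, the candidate date formats, and the order of the steps are assumptions.

```python
import pandas as pd

# Hypothetical sample data, not taken from the package.
df = pd.DataFrame(
    {
        "city": ["  Paris ", "LONDON", "  Paris "],
        "price": [1.23456, 2.5, 1.23456],
        "joined": ["2024-10-07", "2024-01-15", "2024-10-07"],
        "units": ["3", "7", "3"],
    }
)

# Trim whitespace and lowercase a string column.
df["city"] = df["city"].str.strip().str.lower()

# Round every float column to two decimal places.
float_cols = df.select_dtypes(include="float").columns
df[float_cols] = df[float_cols].round(2)

# Try a few candidate formats; keep the first that parses every value.
for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
    parsed = pd.to_datetime(df["joined"], format=fmt, errors="coerce")
    if parsed.notna().all():
        df["joined"] = parsed
        break

# Convert numeric-looking strings to an integer or float dtype.
df["units"] = pd.to_numeric(df["units"])

# Drop duplicate rows, keeping the first occurrence.
df = df.drop_duplicates(keep="first")

print(df)
```

Cleaning the strings before dropping duplicates matters in this sketch: the first and third rows only become identical once whitespace and case are normalized.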

Data Exploration

show(rows=5):
- Displays the first N rows of the DataFrame, where N is the value of rows.
- Parameters:
  - rows: Number of rows to display (default is 5).
get_info():
- Returns basic information about the DataFrame, including column types and non-null counts.
find_outliers(column_number):
- Finds outliers in the specified column using the IQR (Interquartile Range) method.
- Parameters:
  - column_number: The index of the column to check for outliers.
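
The IQR method flags values that fall far outside the middle half of the data. The sketch below shows the usual calculation in plain pandas; the 1.5 multiplier is the conventional choice and is an assumption here, since the description does not state which multiplier the package uses. The sample data is also hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one obviously extreme value.
df = pd.DataFrame({"id": range(8), "value": [10, 12, 11, 13, 12, 11, 10, 95]})

column_number = 1
col = df.iloc[:, column_number]

# Quartiles and interquartile range of the selected column.
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond each quartile (assumed multiplier).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Rows whose value lies outside the fences.
outliers = df[(col < lower) | (col > upper)]
print(outliers)
```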

Outlier Handling

drop_outliers(column_number):
- Removes outliers in a specified column using the IQR method.
- Parameters:
  - column_number: The index of the column where outliers should be dropped.
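
Dropping outliers is the inverse of finding them: keep only the rows inside the IQR fences. The helper below, drop_iqr_outliers, is a hypothetical function written for illustration, not the package's method, and its multiplier k=1.5 is an assumed default.

```python
import pandas as pd


def drop_iqr_outliers(df, column_number, k=1.5):
    """Return the rows whose value in the given column lies within the IQR fences."""
    col = df.iloc[:, column_number]
    q1, q3 = col.quantile([0.25, 0.75])
    iqr = q3 - q1
    # between() is inclusive at both ends; rows with NaN in the column are also dropped.
    return df[col.between(q1 - k * iqr, q3 + k * iqr)]


# Hypothetical sample data.
df = pd.DataFrame({"id": range(8), "value": [10, 12, 11, 13, 12, 11, 10, 95]})
trimmed = drop_iqr_outliers(df, 1)
print(len(df), len(trimmed))
```

Because the mask comes from between(), rows with a missing value in the chosen column are removed as well, which may or may not match the package's behavior.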

Missing Values

find_missing_values(pct=False):
- Returns the count (or percentage) of missing values in each column.
- Parameters:
  - pct: If True, returns missing values as percentages; otherwise returns counts (default is False).
drop_missing_values(cols=None):
- Drops rows with missing values. Can drop rows with missing values only in specified columns.
- Parameters:
  - cols: A list of column indices. If None, rows with any missing values are dropped.
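
The two missing-value operations can be sketched in plain pandas as follows. The sample data is hypothetical, and expressing the percentage on a 0 to 100 scale is an assumption, since the description does not say whether the package returns percentages or fractions.

```python
import pandas as pd

# Hypothetical sample data with gaps in both columns.
df = pd.DataFrame({"a": [1.0, None, 3.0, 4.0], "b": ["x", "y", None, None]})

# Count of missing values per column.
counts = df.isna().sum()

# Share of missing values per column, scaled to 0-100 (assumed scale).
percent = df.isna().mean() * 100

# Drop rows that have a missing value in any column.
dropped_any = df.dropna()

# Drop rows with a missing value only in the columns at the given positions.
cols = [0]
dropped_subset = df.dropna(subset=list(df.columns[cols]))

print(counts, percent, sep="\n")
```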

Grouping and Aggregation

groupby_categorical(groupby, col, func='sum', sort_descending=True):
- Groups the DataFrame by a specified column and applies an aggregation function to another column.
- Parameters:
  - groupby: Index of the column to group by.
  - col: Index of the column to aggregate.
  - func: Aggregation function, one of sum, min, max, count, or avg (default is sum).
  - sort_descending: Whether to sort the result in descending order (default is True).
count_distinct(groupby, col):
- Counts distinct values of a column within each group.
- Parameters:
  - groupby: Index of the column to group by.
  - col: Index of the column for which distinct values will be counted.
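
Both grouping methods take column positions, so a plain-pandas equivalent first resolves those positions to labels. The sketch below is an illustration under assumptions, not the package's code: the sample data is hypothetical, and the mapping from avg to pandas' mean is a guess at how the avg option could be handled.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "east"],
        "product": ["a", "a", "b", "a", "c"],
        "sales": [10, 5, 20, 7, 1],
    }
)

# Resolve positional indices to column labels.
groupby, col = 0, 2
key, target = df.columns[groupby], df.columns[col]

# pandas has no aggregation named "avg", so translate it to "mean" (assumed mapping).
func = "avg"
agg_name = {"avg": "mean"}.get(func, func)

# Aggregate one column per group and sort in descending order.
totals = df.groupby(key)[target].agg(agg_name).sort_values(ascending=False)

# Count distinct products within each region.
distinct = df.groupby(key)[df.columns[1]].nunique()

print(totals, distinct, sep="\n")
```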

Visualization

show_numerical_distribution():
- Plots histograms for all numerical columns using Plotly.

Reports

generate_profile_report():
- Generates a profile report of the DataFrame using the pandas_profiling library (since renamed ydata-profiling) and saves it as profile-report.html.

Project details

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history

This version

0.1.0

Oct 7, 2024

Download files

Source Distribution

pandasdataexplorer-0.1.0.tar.gz (6.0 kB)

Uploaded Oct 7, 2024 Source

Built Distribution

PandasDataExplorer-0.1.0-py3-none-any.whl (6.2 kB)

Uploaded Oct 7, 2024 Python 3

File details

Details for the file pandasdataexplorer-0.1.0.tar.gz.

File metadata

Download URL: pandasdataexplorer-0.1.0.tar.gz
Upload date: Oct 7, 2024
Size: 6.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for pandasdataexplorer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`84e19f0a8544b9b46c2a35755063dc161f59a247809853038a2ff7816350341c`
MD5	`7facc1b3221693be3b400f3b3197c914`
BLAKE2b-256	`f22a31a5d66a00795bd791b4acb589231dfc53b1455024c075890619343b11ee`

File details

Details for the file PandasDataExplorer-0.1.0-py3-none-any.whl.

File metadata

Download URL: PandasDataExplorer-0.1.0-py3-none-any.whl
Upload date: Oct 7, 2024
Size: 6.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.0

File hashes

Hashes for PandasDataExplorer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d7292dfd73a5e20a0b976a0b73fdc4156beefe4852964b4769d749e78adaad7`
MD5	`041b6337bc2e0887c087e18b1c57ccc0`
BLAKE2b-256	`3990bdf2bd01a253ae81126503111dd1fe2d8f54d9f3f39357b7366a42a3cba9`
