A library to provide quick and insightful data profiling for pandas DataFrames

DataProfilerKit

A Python library that provides quick and insightful data profiling for pandas DataFrames. It generates detailed reports including missing values analysis, data type information, correlations, outliers, and column statistics in a clear, organized format.

Installation

pip install data-profiler-kit

Usage

from dataprofilerkit import DataProfiler
import pandas as pd

# Create or load your DataFrame
df = pd.read_csv('your_data.csv')

# Create a DataProfiler instance
profiler = DataProfiler(df)

# Generate the profile
profile = profiler.generate_profile()

# Access different aspects of the profile
print("Basic Information:")
print(profile['basic_info'])

print("\nMissing Values Analysis:")
print(profile['missing_values'])

print("\nColumn Statistics:")
print(profile['column_stats'])

print("\nDuplicates Analysis:")
print(profile['duplicates'])

print("\nOutliers Analysis:")
print(profile['outliers'])

Core Functionality

  • Basic DataFrame Information:

    • Number of rows, columns, and total cells.
    • Memory usage of the DataFrame.
    • Data types and their counts.
  • Missing Value Analysis:

    • Total missing values across the DataFrame.
    • Missing values by column.
    • Percentage of missing values for each column.
  • Column-wise Analysis:

    • Numeric Columns:

      • Descriptive statistics (mean, median, standard deviation, etc.).
      • Skewness and kurtosis.
    • Categorical Columns:

      • Count of unique values.
      • Top 5 most frequent values with their percentages.
    • Datetime Columns:

      • Minimum and maximum values.
      • Range in days.
  • Duplicate Detection:

    • Duplicate rows (count and percentage).
    • Duplicate columns (count and list of column names).
  • Outlier Detection:

    • For numeric columns, detects outliers using:
      • Z-score method (with indices and percentages).
      • Interquartile Range (IQR) method (with indices and percentages).
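
The two outlier methods named above can be sketched with plain pandas. This is a minimal illustration, not DataProfilerKit's own implementation: the function names, the z-score threshold of 3.0, and the IQR multiplier of 1.5 are assumptions (common defaults), and the library may use different values or handle edge cases differently.

```python
import pandas as pd


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> list:
    """Return index labels whose absolute z-score exceeds `threshold`.

    Hypothetical helper; the threshold of 3.0 is an assumed default.
    """
    values = series.dropna()
    std = values.std(ddof=0)  # population standard deviation
    if pd.isna(std) or std == 0:
        return []  # empty or constant column: z-scores are undefined
    z = (values - values.mean()) / std
    return z[z.abs() > threshold].index.tolist()


def iqr_outliers(series: pd.Series, k: float = 1.5) -> list:
    """Return index labels outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper; the multiplier of 1.5 is an assumed default.
    """
    values = series.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask].index.tolist()


def outlier_percentage(indices: list, series: pd.Series) -> float:
    """Share of non-missing values flagged as outliers, in percent."""
    n = series.notna().sum()
    return float(len(indices) / n * 100) if n else 0.0


# Illustrative data: nineteen values between 10 and 13, plus one extreme value.
sample = pd.Series(
    [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
     12, 10, 11, 13, 12, 11, 10, 12, 11, 100]
)

z_idx = zscore_outliers(sample)
iqr_idx = iqr_outliers(sample)
print("Z-score outlier indices:", z_idx)
print("IQR outlier indices:", iqr_idx)
print("IQR outlier percentage:", outlier_percentage(iqr_idx, sample))
```

The z-score method assumes roughly symmetric data and is itself distorted by extreme values, while the IQR method relies only on quartiles, so the two can disagree on skewed columns. Reporting both lets you compare them.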

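The missing-value and duplicate checks listed under Core Functionality map onto a few standard pandas calls. The sketch below is an approximation under stated assumptions: the function names and dictionary keys are hypothetical and are not taken from the structure that `generate_profile()` returns.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> dict:
    """Total, per-column, and percentage counts of missing values.

    Hypothetical helper; the key names are illustrative only.
    """
    per_column = df.isna().sum()
    return {
        "total_missing": int(per_column.sum()),
        "missing_by_column": per_column.to_dict(),
        "missing_percentage": (df.isna().mean() * 100).round(2).to_dict(),
    }


def duplicate_summary(df: pd.DataFrame) -> dict:
    """Count duplicate rows and list columns that repeat an earlier column.

    Hypothetical helper; the key names are illustrative only.
    """
    dup_rows = int(df.duplicated().sum())
    # Transposing turns columns into rows, so duplicated() flags every
    # column whose values exactly repeat those of an earlier column.
    dup_col_mask = df.T.duplicated().to_numpy()
    dup_cols = df.columns[dup_col_mask].tolist()
    return {
        "duplicate_rows": dup_rows,
        "duplicate_rows_percentage": round(dup_rows / len(df) * 100, 2) if len(df) else 0.0,
        "duplicate_columns_count": len(dup_cols),
        "duplicate_columns": dup_cols,
    }


# Illustrative data: one missing value, one repeated row, one repeated column.
demo = pd.DataFrame(
    {
        "a": [1, 2, 2, 4],
        "b": ["x", "y", "y", None],
        "a_copy": [1, 2, 2, 4],
    }
)

missing = missing_summary(demo)
duplicates = duplicate_summary(demo)
print(missing)
print(duplicates)
```

Transposing a wide or very long DataFrame can be expensive, so on large data it may be cheaper to compare columns pairwise or by hash.
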
Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
