QPX Tabular

QPX Tabular is a tabular data preprocessing and visualization library built to speed up production data science workflows. Its auto_preprocess function turns a raw, messy pandas DataFrame into a machine-learning-ready dataset in a single call.

Features

  • Automated Preprocessing (auto_preprocess): Handles missing values, drops constant columns, drops high-cardinality nominal columns, encodes categorical columns, and downcasts dtypes to reduce memory usage.
  • Fail-Loud Architecture: Built for production. Instead of failing silently, QPX raises an exception immediately (KeyError, ValueError) when you pass an invalid data configuration.
  • Comprehensive Data Health Diagnostics: Inspect your dataset's overall health with dataset_health and statistical_snapshot.
  • Beautiful Visualizations: One-line correlation heatmaps, distribution plots, and hierarchical feature clustering matrices.
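
To illustrate what "fail loud" means in practice, here is a minimal sketch of up-front validation that raises KeyError or ValueError instead of continuing with bad input. The helpers require_columns and require_positive_int are hypothetical names invented for this example; they are not part of the QPX API, and QPX's own checks may differ.

```python
import pandas as pd


def require_columns(df: pd.DataFrame, columns: list[str]) -> None:
    """Raise KeyError if any requested column is absent (hypothetical helper)."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")


def require_positive_int(name: str, value: int) -> None:
    """Raise ValueError for a setting that is not a positive integer (hypothetical helper)."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")


df = pd.DataFrame({"age": [31, 45], "city": ["Pune", "Delhi"]})

require_columns(df, ["age", "city"])     # valid: returns without raising
require_positive_int("max_onehot", 10)   # valid: returns without raising

try:
    require_columns(df, ["salary"])      # invalid: column does not exist
except KeyError as exc:
    print(f"Rejected early: {exc}")
```

Checking inputs before any transformation runs means a typo in a column name stops the pipeline at once rather than producing a silently incomplete dataset.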

Installation

To install qpx, clone this repository and install it locally in editable mode with pip:

git clone https://github.com/punitxdev/QPX.git
cd QPX
pip install -e .

Dependencies

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scipy

Quickstart

Clean an entire dataset with one function:

import pandas as pd
from qpx.tabular import preprocessing

# Load your raw data
df = pd.read_csv("my_messy_data.csv")

# Clean, encode, impute, and downcast in one go!
clean_df, report = preprocessing.auto_preprocess(
    df,
    max_onehot=10,
    return_report=True
)

print(report)
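
For readers who want to see what steps of this kind involve, the following plain-pandas sketch walks through dropping constant columns, imputing, encoding, and downcasting. It is not QPX's implementation: the function sketch_preprocess, the median/mode imputation rule, and the reading of max_onehot as a cardinality cutoff for one-hot encoding are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def sketch_preprocess(df: pd.DataFrame, max_onehot: int = 10) -> pd.DataFrame:
    """Illustrative pipeline only; the rules below are assumptions, not QPX's."""
    out = df.copy()

    # 1. Drop columns with at most one distinct non-null value.
    constant = [c for c in out.columns if out[c].nunique(dropna=True) <= 1]
    out = out.drop(columns=constant)

    # 2. Impute: median for numeric columns, most frequent value otherwise.
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])

    # 3. Non-numeric columns: drop if too many categories, else one-hot encode.
    nominal = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    too_wide = [c for c in nominal if out[c].nunique() > max_onehot]
    out = out.drop(columns=too_wide)
    to_encode = [c for c in nominal if c not in too_wide]
    if to_encode:
        out = pd.get_dummies(out, columns=to_encode)

    # 4. Downcast numeric columns to the smallest dtype that holds their values.
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")

    return out


raw = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "plan": ["basic", "pro", None, "basic"],
    "country": ["IN", "IN", "IN", "IN"],    # constant, so it is dropped
    "user_id": ["a1", "b2", "c3", "d4"],    # 4 categories > max_onehot=3, dropped
    "visits": [3, 10, 7, 1],
})

clean = sketch_preprocess(raw, max_onehot=3)
print(clean.dtypes)
```

The order matters in this sketch: constant columns are removed before imputation so that an all-null column never reaches the mode lookup.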

Generate a deep-dive correlation map:

from qpx.tabular import visuals

visuals.corr_map(clean_df, target="my_target_column")
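
The numbers behind a target-focused correlation view can be computed with pandas alone. The sketch below ranks features by the absolute value of their Pearson correlation with a target column. The toy data and column names are invented, and this is only an assumption about the kind of information such a map conveys, not a description of what corr_map computes.

```python
import pandas as pd

# Toy dataset; "price" plays the role of the target column.
df = pd.DataFrame({
    "rooms": [2, 3, 4, 5, 6],
    "age_years": [40, 30, 20, 10, 5],
    "price": [100, 150, 200, 250, 300],
})

# Pairwise Pearson correlations between numeric columns.
corr = df.corr(numeric_only=True)

# Correlation of each feature with the target, strongest first.
ranked = corr["price"].drop("price").sort_values(key=abs, ascending=False)
print(ranked)
```

The full corr matrix can be passed to a plotting call such as seaborn.heatmap(corr) to draw it as a heatmap.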

Documentation

The complete API reference and user guide are hosted online at: https://punitxdev.github.io/QPX/

If you want to build the documentation locally for development:

pip install -e ".[dev]"
mkdocs serve

To publish the documentation to GitHub Pages, run:

mkdocs gh-deploy

License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with love by Punit
