A beginner-friendly Python package to clean, classify, and visualize CSV data through a simple GUI.

cleanclassify

cleanclassify is a beginner-friendly Python package for cleaning, classifying, and visualizing CSV data, all from a graphical interface. It is aimed at students, data science enthusiasts, and anyone exploring machine learning for the first time.

What It Does

  • Cleans your dataset automatically

    • Handles missing values
    • Drops problematic or high-cardinality columns
    • Scales numeric features
    • Encodes categorical variables
  • Runs machine learning models

    • Trains and evaluates Logistic Regression, Random Forest, and Support Vector Classifier using scikit-learn
  • Visualizes model performance

    • Shows accuracy, precision, recall, and F1-score
    • Highlights the best-performing model
    • Plots a clean bar chart using matplotlib
  • Requires zero coding

    • Just load your CSV, pick the target column, and click Run Cleaning, then Run Classification.
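
The cleaning steps listed above can be sketched with pandas. This is not the package's actual code: the function name `clean`, the `max_categories` cutoff, and the choice of median/mode imputation and one-hot encoding are all assumptions made for illustration.

```python
import pandas as pd


def clean(df: pd.DataFrame, target: str, max_categories: int = 20) -> pd.DataFrame:
    """Return a model-ready copy of df; the target column is left untouched.

    Hypothetical sketch: max_categories and the imputation rules are assumptions.
    """
    features = df.drop(columns=[target])

    numeric = list(features.select_dtypes(include="number").columns)
    categorical = [c for c in features.columns if c not in numeric]

    # Drop non-numeric columns with too many distinct values (IDs, free text).
    too_wide = [c for c in categorical if features[c].nunique() > max_categories]
    features = features.drop(columns=too_wide)
    categorical = [c for c in categorical if c not in too_wide]

    # Fill missing values: median for numbers, most frequent value for categories.
    for col in numeric:
        features[col] = features[col].fillna(features[col].median())
    for col in categorical:
        mode = features[col].mode()
        if not mode.empty:
            features[col] = features[col].fillna(mode.iloc[0])

    # Standardize numeric columns to zero mean and unit variance.
    for col in numeric:
        std = features[col].std(ddof=0)
        features[col] = (features[col] - features[col].mean()) / (std if std > 0 else 1.0)

    # One-hot encode the remaining categorical columns.
    if categorical:
        features = pd.get_dummies(features, columns=categorical)

    # Re-attach the untouched target column.
    return features.assign(**{target: df[target]})
```

Leaving the target column out of scaling and encoding keeps the labels in their original form for the classification step.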

Installation

Install it directly from PyPI:

pip install cleanclassify

This will automatically install required dependencies: pandas, numpy, scikit-learn, and matplotlib.


How to Use

Launch the GUI with:

python -m cleanclassify

Or, if the console entry point was set up during installation, run the command directly:

cleanclassify
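
If the launch command fails, a quick check from Python can confirm whether the package is importable in the current environment. The helper names `is_installed` and `launch_gui` below are hypothetical; the sketch uses only the standard library and the `python -m cleanclassify` command shown above.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


def launch_gui() -> None:
    """Start the GUI in a child process using the current interpreter."""
    if not is_installed("cleanclassify"):
        raise RuntimeError("cleanclassify is not installed; run: pip install cleanclassify")
    # Equivalent to typing `python -m cleanclassify` in a terminal.
    subprocess.run([sys.executable, "-m", "cleanclassify"], check=True)
```

Using `sys.executable` ensures the GUI starts with the same interpreter (and virtual environment) that the package was installed into.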

💻 Example Workflow

  1. Launch the app.
  2. Browse and load your CSV file.
  3. Select the target column you want to predict.
  4. Click Run Cleaning to clean and prepare your dataset.
  5. Click Run Classification to train and evaluate models.
  6. View detailed metrics and a comparison chart of model performance.
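
For readers new to the metrics shown in step 6, the following sketch computes accuracy, precision, recall, and F1-score by hand for a two-class problem. It is only an illustration of the definitions; the function name `binary_metrics` is hypothetical, and the source does not state how the package averages these metrics for targets with more than two classes.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision answers "of the rows predicted positive, how many were right?", while recall answers "of the rows that are truly positive, how many were found?".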

What Your Data Should Look Like

  • Must contain a target column (the label you're predicting).
  • Can include both numeric and categorical features.
  • Should not include long text or extremely high-cardinality columns (they’ll be automatically dropped for performance).
  • If the dataset has more than 2000 rows, it will be automatically downsampled for memory efficiency.
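
Before loading a file into the GUI, the requirements above can be checked with a short standard-library script. This is a sketch under stated assumptions: the function name `inspect_csv`, the `max_unique` cutoff of 50, and the use of uniform random sampling are hypothetical, since the source does not say how the package decides what counts as high-cardinality or how it downsamples. Only the 2000-row limit comes from the text.

```python
import csv
import random


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def inspect_csv(handle, target, max_rows=2000, max_unique=50, seed=0):
    """Check a CSV (open file or iterable of lines) against the rules above."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = reader.fieldnames or []
    if target not in columns:
        raise ValueError(f"target column {target!r} not found in {columns}")

    # Flag text columns with many distinct values (assumed cutoff: max_unique).
    flagged = []
    for col in columns:
        if col == target:
            continue
        values = {row[col] for row in rows if row[col]}
        is_text = any(not _is_number(v) for v in values)
        if is_text and len(values) > max_unique:
            flagged.append(col)

    # Downsample when the file exceeds the row limit stated above.
    sample = rows
    if len(rows) > max_rows:
        sample = random.Random(seed).sample(rows, max_rows)

    return {"total_rows": len(rows), "kept_rows": len(sample), "high_cardinality": flagged}
```

To check a file on disk, open it with `open("data.csv", newline="")` and pass the handle along with the name of the target column.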

Under the Hood

  • cleaner.py — Preprocesses data: cleans, encodes, scales, and downsamples.
  • classify.py — Trains and evaluates three ML models.
  • gui.py — A simple but powerful GUI built with tkinter.

👤 Author

Crafted with ❤️ by Safa Mahveen

Version

0.1
