A Python package that automates data preprocessing.

Project description

DataFit: Automated Data Preprocessing in Python

Note: This package is actively under development and is open source.

Overview

DataFit is a Python package developed by Syed Syab and Hamza Rustam for automating data preprocessing tasks. It began as a Final Year Project at the University of Swat and aims to streamline the data preprocessing pipeline for machine learning engineers and data scientists.

  • Project Initialization Date: 01/Oct/2023
  • Expected Project Finalization Date: 01/Dec/2023 (initial release; still under development)

Team Members

  1. Professor Naeem Ullah (Supervisor)

  2. Syed Syab (Student)

  3. Hamza Rustam (Student)

Package Functionality

The DataFit package aims to be easy to use. Its current functionality includes:

  • Displaying information about the dataset
  • Handling null values
  • Deleting multiple columns
  • Handling categorical values
  • Normalization
  • Standardization
  • Extracting numeric values
  • Tokenization
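
Normalization and standardization are easy to confuse, so here is a minimal sketch of the two ideas in plain Python. It illustrates the general techniques only and is not DataFit's implementation; the function names `min_max_normalize` and `standardize` and the sample `ages` list are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has zero deviation; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample column.
ages = [20, 30, 40, 50, 60]
normalized = min_max_normalize(ages)
standardized = standardize(ages)
```

Normalization keeps the values inside a fixed range, while standardization centers them around zero, which tends to matter for models that are sensitive to feature scale.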

Usage

To use the package, install it using:

pip install datafit

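Once installed, import the module under a short alias, much as you would with pandas, and call its functions on your dataset:

import datafit.datafit as df

# Display information about the data.
# `data` is your already-loaded dataset (assumed here to be a pandas DataFrame).
df.information(data)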

To handle categorical values:

import datafit.datafit as df

# Specify columns to handle or use None for all columns
df.handleCategoricalValues(data, ["column1", "column2"])
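
This page does not say how the categorical values are encoded. One common approach is label encoding, where each distinct category is replaced by an integer code. The sketch below shows that idea in plain Python; it is an assumption for illustration, not DataFit's documented behavior, and `label_encode` and the sample `colors` list are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            # Assign the next unused integer to a category seen for the first time.
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical sample column.
colors = ["red", "green", "red", "blue"]
encoded, mapping = label_encode(colors)
```

Returning the mapping alongside the encoded values makes it possible to translate the codes back to the original categories later.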

To extract numerical values from columns:

import datafit.datafit as df

# Specify columns for extraction
df.extractValues(data, ["column1", "column2"])
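
To make the idea of numeric extraction concrete, here is a minimal sketch that pulls the first number out of a free-text cell with a regular expression. It is an illustration under stated assumptions, not DataFit's implementation: the helper `extract_number`, the pattern, and the sample `prices` list are all hypothetical.

```python
import re

# Optional minus sign, digits, and an optional decimal part.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = _NUMBER.search(str(text))
    return float(match.group()) if match else None


# Hypothetical sample column mixing numbers with units and symbols.
prices = ["$12.50", "approx. 40 kg", "n/a", "-3 degrees"]
extracted = [extract_number(p) for p in prices]
```

Cells without any digits come back as None, so they can be treated as missing values in a later null-handling step.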

New updates in version 0.2023.2.13:

Description updated

Note: This package is actively under development. Feel free to share and follow on GitHub and LinkedIn for updates.

Your support is appreciated!

Download files

Source Distribution

datafit-0.2023.3.0.tar.gz (12.5 kB)

Uploaded Source

Built Distribution

datafit-0.2023.3.0-py3-none-any.whl (10.8 kB)

Uploaded Python 3

File details

Details for the file datafit-0.2023.3.0.tar.gz.

File metadata

  • Download URL: datafit-0.2023.3.0.tar.gz
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for datafit-0.2023.3.0.tar.gz:

  • SHA256: e5e897f925998154d055491cbe9849ef7b5f664eb201b5680d6f5e4295afdbf2
  • MD5: c8dfeef5c0a7615a066661093e6cb77d
  • BLAKE2b-256: f3ac64dc490a95ea116bed394d10eb290df55312e32697e27241bc74c645e912

File details

Details for the file datafit-0.2023.3.0-py3-none-any.whl.

File metadata

  • Download URL: datafit-0.2023.3.0-py3-none-any.whl
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for datafit-0.2023.3.0-py3-none-any.whl:

  • SHA256: df6f3c9be8ac31f3afe2d202a780af0df3001ece63777489c071d6761d384f58
  • MD5: 98cd0c9120eacc20c11f60655272b4fa
  • BLAKE2b-256: 70c82cbe4b2988fa2cac3242b6d329eaec18a602924396f91bef61b9a6c90288
