
This is a Python package that automates data preprocessing.


DataFit: Automated Data Preprocessing in Python

Note: This package is actively under development and is open source.

Overview

DataFit is a Python package developed by Syed Syab and Hamza Rustam to automate common data preprocessing tasks. It began as our Final Year Project at the University of Swat and aims to make the preprocessing pipeline simpler and more approachable for machine learning engineers and data scientists.

  • Project Initialization Date: 01/OCT/2023
  • Expected Project Finalization Date: 01/DEC/2023 (initial release; still under development)

Team Members

  1. Professor Naeem Ullah (Supervisor)

  2. Syed Syab (Student)

  3. Hamza Rustam (Student)

Package Functionality

DataFit exposes its features as plain functions that take your dataset as an argument, so it is easy to use regardless of experience level. Its current functionality includes:

  • Displaying information about the dataset
  • Handling null values
  • Deleting multiple columns
  • Handling categorical values
  • Normalization
  • Standardization
  • Extracting numeric values
  • Tokenization
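The list above does not say which formulas DataFit uses for normalization and standardization. Assuming the common definitions, min-max normalization rescales a column to [0, 1] with (x - min) / (max - min), and standardization (z-score) computes (x - mean) / std. The plain-Python sketch below shows only that arithmetic on a single column; `min_max_normalize` and `standardize` are hypothetical helpers, not part of DataFit's API.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Hypothetical helper: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Hypothetical helper: subtract the mean and divide by the population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Avoid division by zero for a constant column
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50]
print([round(x, 3) for x in min_max_normalize(ages)])  # [0.0, 0.333, 0.667, 1.0]
print([round(x, 3) for x in standardize(ages)])  # [-1.342, -0.447, 0.447, 1.342]
```

Normalization keeps values bounded, which suits distance-based models; standardization keeps the shape of the distribution but centers it at 0 with unit variance.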

Usage

To use the package, install it with pip:

pip install datafit

Once installed, import it (much as you would import pandas) and call its functions on your dataset:

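import pandas as pd
import datafit.datafit as df

# Assumption: DataFit functions take a pandas DataFrame; the file name is a placeholder
data = pd.read_csv("your_dataset.csv")

# Display information about the data
df.information(data)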
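The exact output of `df.information` is not documented here. As an assumption, a dataset summary of this kind usually reports the number of rows and columns and how many values are missing in each column, which is also the first thing to check before handling null values. The plain-Python sketch below computes that on a list of row dictionaries; `summarize` is a hypothetical helper, not DataFit's implementation.

```python
def summarize(records):
    """Hypothetical helper: print row/column counts and missing values per column.

    `records` is a list of dicts, one per row; a None value or an absent key
    counts as missing.
    """
    columns = sorted({key for row in records for key in row})
    missing = {col: sum(1 for row in records if row.get(col) is None) for col in columns}
    print(f"Rows: {len(records)}, Columns: {len(columns)}")
    for col in columns:
        print(f"  {col}: {missing[col]} missing")
    return missing


rows = [
    {"age": 25, "city": "Swat"},
    {"age": None, "city": "Peshawar"},
    {"age": 31, "city": None},
]
summarize(rows)
# Rows: 3, Columns: 2
#   age: 1 missing
#   city: 1 missing
```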

To handle categorical values:

import datafit.datafit as df

# Specify columns to handle or use None for all columns
df.handleCategoricalValues(data, ["column1", "column2"])
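The encoding that `handleCategoricalValues` applies is not described here; label encoding and one-hot encoding are the two usual choices. The sketch below shows one-hot encoding in plain Python and mirrors the "None means all columns" convention from the snippet above, treating every string-valued column as categorical in that case. `one_hot_encode` is a hypothetical helper, not DataFit's API.

```python
def one_hot_encode(records, columns=None):
    """Hypothetical helper: replace categorical columns with 0/1 indicator columns.

    records: list of dicts, one per row.
    columns: names to encode; None means every column that holds string values.
    """
    if columns is None:
        columns = sorted({k for row in records for k, v in row.items() if isinstance(v, str)})
    # Distinct categories per column, sorted so the output column order is stable
    categories = {c: sorted({row[c] for row in records if row.get(c) is not None}) for c in columns}
    encoded = []
    for row in records:
        # Keep non-encoded columns unchanged
        new_row = {k: v for k, v in row.items() if k not in columns}
        for c in columns:
            for cat in categories[c]:
                new_row[f"{c}_{cat}"] = 1 if row.get(c) == cat else 0
        encoded.append(new_row)
    return encoded


data = [{"color": "red", "size": 1}, {"color": "blue", "size": 2}]
print(one_hot_encode(data, ["color"]))
# [{'size': 1, 'color_blue': 0, 'color_red': 1}, {'size': 2, 'color_blue': 1, 'color_red': 0}]
```

One-hot encoding avoids implying an order between categories, at the cost of one new column per distinct value.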

To extract numerical values from columns:

import datafit.datafit as df

# Specify columns for extraction
df.extractValues(data, ["column1", "column2"])
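Assuming "extracting numeric values" means pulling the number out of text such as "5 kg" or "$1,200", the step can be sketched with a regular expression. The helpers `extract_number` and `extract_numbers` below are hypothetical and only illustrate the idea; they take the first number in each value, drop thousands separators, and return None when no number is found.

```python
import re

# Optional sign, a digit followed by digits or thousands separators, optional decimal part
NUMBER = re.compile(r"[-+]?\d[\d,]*(?:\.\d+)?")


def extract_number(value):
    """Hypothetical helper: return the first number in `value` as a float, or None."""
    if value is None:
        return None
    match = NUMBER.search(str(value))
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def extract_numbers(records, columns):
    """Hypothetical helper: apply extract_number to the given columns of each row dict."""
    return [
        {k: (extract_number(v) if k in columns else v) for k, v in row.items()}
        for row in records
    ]


print([extract_number(s) for s in ["5 kg", "$1,200", "3.5 GB", "n/a"]])
# [5.0, 1200.0, 3.5, None]
```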

New updates in version 0.2023.2.13:

Description updated

Note: This package is actively under development. Feel free to share it, and follow the project on GitHub and LinkedIn for updates.

Your support is appreciated!



Download files

Source distribution: datafit-0.2023.2.13.tar.gz (7.5 kB)

Built distribution: datafit-0.2023.2.13-py3-none-any.whl (7.2 kB, Python 3)
