A Python package that automates data preprocessing.

Project description

DataFit: Automated Data Preprocessing in Python

Note: This package is actively under development and is open source.

Overview

DataFit is a Python package, developed by Syed Syab and Hamza Rustam, that automates common data preprocessing tasks. It began as the authors' Final Year Project at the University of Swat and aims to streamline the preprocessing pipeline for machine learning engineers and data scientists.

  • Project initialization date: 1 October 2023
  • Expected initial release date: 1 December 2023 (still under development)

Team Members

  1. Professor Naeem Ullah (Supervisor)

  2. Syed Syab (Student)

  3. Hamza Rustam (Student)

Package Functionality

DataFit exposes its features as plain function calls on a dataset, so it can be used without extensive setup. Its current functionality includes:

  • Displaying information about the dataset
  • Handling null values
  • Deleting multiple columns
  • Handling categorical values
  • Normalization
  • Standardization
  • Extracting numeric values
  • Tokenization
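To make three of the listed steps concrete, here is a minimal sketch of null handling, normalization, and standardization written directly in pandas. It is not DataFit's implementation, and the function names `fill_nulls`, `min_max_normalize`, and `standardize` are hypothetical; the sketch assumes mean imputation, min-max scaling to the range 0 to 1, and z-score standardization, which are common choices but are not confirmed by the package description.

```python
import pandas as pd


def fill_nulls(frame, columns):
    """Replace missing values in numeric columns with the column mean (assumed strategy)."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mean())
    return out


def min_max_normalize(frame, columns):
    """Rescale each column linearly so its minimum maps to 0 and its maximum to 1."""
    out = frame.copy()
    for col in columns:
        low, high = out[col].min(), out[col].max()
        span = high - low
        # A constant column has no spread, so map it to 0.0 instead of dividing by zero.
        out[col] = (out[col] - low) / span if span != 0 else 0.0
    return out


def standardize(frame, columns):
    """Shift each column to mean 0 and scale it to unit (population) standard deviation."""
    out = frame.copy()
    for col in columns:
        std = out[col].std(ddof=0)
        out[col] = (out[col] - out[col].mean()) / std if std != 0 else 0.0
    return out


# Hypothetical sample data with one missing value.
data = pd.DataFrame({"height": [150.0, 160.0, None, 170.0]})

filled = fill_nulls(data, ["height"])
normalized = min_max_normalize(filled, ["height"])
standardized = standardize(filled, ["height"])

print(filled["height"].tolist())
print(normalized["height"].tolist())
print(standardized["height"].round(3).tolist())
```

Normalization is usually preferred when features must share a bounded range, while standardization suits models that assume roughly centered inputs.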

Usage

To use the package, install it using:

pip install datafit

Once installed, import the module under a short alias, as is conventional with pandas, and call its functions:

import datafit.datafit as df

# Display information about the data.
# `data` is a dataset you have already loaded (for example, a pandas DataFrame).
df.information(data)
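The output of `df.information` is not described here. As a rough illustration of the kind of summary such a step typically provides, the sketch below builds a per-column overview with plain pandas. The helper `summarize` and the sample data are hypothetical and are not part of DataFit.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, null count, and distinct-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "nulls": frame.isna().sum(),
        "unique": frame.nunique(),  # missing values are not counted as distinct
    })


# Hypothetical sample data with one missing value in each column.
data = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["Swat", "Lahore", "Swat", None],
})

summary = summarize(data)
print(summary)
```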

To handle categorical values:

import datafit.datafit as df

# Specify columns to handle or use None for all columns
df.handleCategoricalValues(data, ["column1", "column2"])
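The encoding scheme used by `handleCategoricalValues` is not stated. One common approach is label encoding, where each distinct value is replaced by an integer code. The sketch below shows that approach with `pandas.factorize`. The function `encode_categorical` is a hypothetical stand-in, and its treatment of `None` (encode every non-numeric column) is an assumption modeled on the comment above, not a description of DataFit's behavior.

```python
import pandas as pd


def encode_categorical(frame, columns=None):
    """Label-encode the given columns and return the encoded frame plus the code mappings."""
    out = frame.copy()
    if columns is None:
        # Assumption: with no columns given, treat every non-numeric column as categorical.
        columns = out.select_dtypes(exclude="number").columns
    mappings = {}
    for col in columns:
        # factorize assigns codes in order of first appearance; missing values get -1.
        codes, categories = pd.factorize(out[col])
        out[col] = codes
        mappings[col] = list(categories)
    return out, mappings


# Hypothetical sample data.
data = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [1.0, 2.0, 3.0],
})

encoded, mappings = encode_categorical(data, ["color", "size"])
print(encoded)
print(mappings)
```

Returning the mappings alongside the encoded frame makes it possible to decode the integers back to their original labels later.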

To extract numerical values from columns:

import datafit.datafit as df

# Specify columns for extraction
df.extractValues(data, ["column1", "column2"])
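For readers who want to see what numeric extraction can look like, here is a minimal pandas sketch that pulls the first number out of each text cell. It is an illustration under assumptions, not DataFit's code: `extract_numbers` is a hypothetical name, and taking only the first match per cell is one possible rule among several.

```python
import pandas as pd


def extract_numbers(frame, columns):
    """Replace each listed column with the first number found in its text, or NaN if none."""
    out = frame.copy()
    for col in columns:
        # Capture an optional sign, digits, and an optional decimal part.
        matched = out[col].astype(str).str.extract(r"(-?\d+(?:\.\d+)?)", expand=False)
        out[col] = pd.to_numeric(matched, errors="coerce")
    return out


# Hypothetical sample data mixing numbers with units and currency symbols.
data = pd.DataFrame({
    "weight": ["72 kg", "65.5kg", "unknown"],
    "price": ["$10", "USD 7.25", "12"],
})

numeric = extract_numbers(data, ["weight", "price"])
print(numeric)
```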

Updates in version 0.2023.2.13:

  • Description updated

Note: This package is actively under development. Feel free to share and follow on GitHub and LinkedIn for updates.

Your support is appreciated!

Download files

Source Distribution

datafit-0.2023.2.14.tar.gz (12.5 kB)

Uploaded Source

Built Distribution

datafit-0.2023.2.14-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file datafit-0.2023.2.14.tar.gz.

File metadata

  • Download URL: datafit-0.2023.2.14.tar.gz
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for datafit-0.2023.2.14.tar.gz
Algorithm Hash digest
SHA256 ab53eaa81cde124a6eeb8cdca574b28e0088a369f0bb722ea4a2c1f15cf8426f
MD5 6b933ab2f02a48108561955262fe99c4
BLAKE2b-256 5d78ada7d0ed8744cb09a934007d632bcb49652fa6a046ae83d351668e61c71e
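A downloaded archive can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. The sketch below is one way to do that; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 digest equals the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed location of the downloaded source distribution.
archive = Path("datafit-0.2023.2.14.tar.gz")
expected = "ab53eaa81cde124a6eeb8cdca574b28e0088a369f0bb722ea4a2c1f15cf8426f"

if archive.exists():
    print(matches_published_hash(archive, expected))
else:
    print(f"{archive} not found; download it before verifying.")
```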

File details

Details for the file datafit-0.2023.2.14-py3-none-any.whl.

File hashes

Hashes for datafit-0.2023.2.14-py3-none-any.whl
Algorithm Hash digest
SHA256 89530776c6e60227d3b456296f20752eb396383b8cbccc8ba72d84c7fbe068c4
MD5 a5372aab6ab218c37e711a8a4c7ddecf
BLAKE2b-256 169dfbf898ba0c9e8cfcaa1f47b1faed570830b3ed46f4b0fd70a57c54c0a301
