A Python package that automates data preprocessing.
Project description
DataFit: Automated Data Preprocessing in Python
Note: This package is actively under development and is open source.
Overview
DataFit is a Python package developed by Syed Syab and Hamza Rustam for automating data preprocessing tasks. It began as our Final Year Project at the University of Swat and aims to streamline the data preprocessing pipeline for machine learning engineers and data scientists.
- Project Initialization Date: 01/Oct/2023
- Expected Project Finalization Date: 01/Dec/2023 (initial release; still under development)
Team Members
- Professor Naeem Ullah (Supervisor)
  - Email: naeem@uswat.edu.pk
- Syed Syab (Student)
  - GitHub
  - Email: syab.se@hotmail.com
- Hamza Rustam (Student)
  - GitHub
  - Email: hs4647213@gmail.com
Package Functionality
The DataFit package aims for a simple, user-friendly interface. Its current functionality includes:
- Displaying information about the dataset
- Handling null values
- Deleting multiple columns
- Handling categorical values
- Normalization
- Standardization
- Extracting numeric values
- Tokenization
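To make the difference between normalization and standardization in the list above concrete, here is a minimal sketch in plain Python. It is not DataFit's implementation: the function names `min_max_normalize` and `standardize` are hypothetical, and it assumes the common definitions (min-max scaling to [0, 1], and z-scores using the population standard deviation).

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column cannot be scaled; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40]
normalized = min_max_normalize(ages)   # values now lie between 0 and 1
standardized = standardize(ages)       # values now have mean 0
print(normalized)
print(standardized)
```

Normalization is usually preferred when a bounded range matters, while standardization is the usual choice when features should be compared in units of their own spread.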
Usage
To use the package, install it using:
pip install datafit
Once installed, import it as you would pandas and start using it:
import datafit.datafit as df
# Display information about the dataset (`data` is your already-loaded dataset)
df.information(data)
To handle categorical values:
import datafit.datafit as df
# Specify columns to handle or use None for all columns
df.handleCategoricalValues(data, ["column1", "column2"])
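One common way to handle categorical values is label encoding, where each distinct category is replaced by an integer code. The sketch below illustrates that idea only. Whether `handleCategoricalValues` uses label encoding, one-hot encoding, or something else is not stated here, and `label_encode` is a hypothetical helper, not part of DataFit.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            # Assign the next unused integer to a category seen for the first time.
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


colors = ["red", "green", "red", "blue"]
encoded, mapping = label_encode(colors)
print(encoded)
print(mapping)
```

Returning the mapping alongside the encoded values lets you decode the column later or apply the same codes to new data.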
To extract numerical values from columns:
import datafit.datafit as df
# Specify columns for extraction
df.extractValues(data, ["column1", "column2"])
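As an illustration of what extracting numeric values can mean, the following sketch pulls the first number out of free-text entries such as "72 kg" using a regular expression. It is an assumption about the general technique, not a description of how `extractValues` works internally; `extract_number` and the sample data are hypothetical.

```python
import re

# Optional minus sign, digits, and an optional decimal part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER.search(str(text))
    return float(match.group()) if match else None


weights = ["72 kg", "approx. 68.5kg", "unknown"]
extracted = [extract_number(w) for w in weights]
print(extracted)
```

Entries without any digits come back as `None`, so they can be treated as missing values in a later null-handling step.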
New in version 0.2023.2.13:
- Description updated
Note: This package is actively under development. Feel free to share and follow on GitHub and LinkedIn for updates.
Your support is appreciated!
Download files
File details
Details for the file datafit-0.2023.3.0.tar.gz.
File metadata
- Download URL: datafit-0.2023.3.0.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | e5e897f925998154d055491cbe9849ef7b5f664eb201b5680d6f5e4295afdbf2
MD5 | c8dfeef5c0a7615a066661093e6cb77d
BLAKE2b-256 | f3ac64dc490a95ea116bed394d10eb290df55312e32697e27241bc74c645e912
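A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names are hypothetical, and the path passed in is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("datafit-0.2023.3.0.tar.gz", "e5e897f9...")` with the full digest from the table should return `True` for an intact download. pip can also enforce digests itself through its hash-checking mode.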
File details
Details for the file datafit-0.2023.3.0-py3-none-any.whl.
File metadata
- Download URL: datafit-0.2023.3.0-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | df6f3c9be8ac31f3afe2d202a780af0df3001ece63777489c071d6761d384f58
MD5 | 98cd0c9120eacc20c11f60655272b4fa
BLAKE2b-256 | 70c82cbe4b2988fa2cac3242b6d329eaec18a602924396f91bef61b9a6c90288