preprocessy·PyPI

Data Preprocessing framework that provides customizable pipelines.

Project description

Preprocessy is a framework that provides data preprocessing pipelines for machine learning. It bundles all the common preprocessing steps that are performed on the data to prepare it for machine learning models. It aims to do so in a manner that is independent of the source and type of dataset. Hence, it provides a set of functions that have been generalised to different types of data.

The pipelines are composed of these functions and are flexible: users can customise them by adding their own processing functions or by removing built-in pipeline functions as needed. The pipelines thus provide an abstract, high-level interface to the users.
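
The idea of a pipeline as an editable sequence of functions can be sketched in plain Python. This is a minimal illustration only: `SketchPipeline`, its `add`, `remove`, and `process` methods, and the shared `params` dictionary are hypothetical names chosen for this example and are not taken from Preprocessy's API.

```python
from typing import Callable, Dict, List, Optional

# A step receives the shared parameter dictionary and updates it in place.
Step = Callable[[Dict], None]


class SketchPipeline:
    """Hypothetical stand-in for a customisable pipeline (not Preprocessy's class)."""

    def __init__(self, steps: List[Step]):
        self.steps = list(steps)

    def add(self, step: Step, index: Optional[int] = None) -> None:
        """Insert a user-defined step, by default at the end."""
        if index is None:
            self.steps.append(step)
        else:
            self.steps.insert(index, step)

    def remove(self, name: str) -> None:
        """Drop every step whose function name matches `name`."""
        self.steps = [s for s in self.steps if s.__name__ != name]

    def process(self, params: Dict) -> Dict:
        """Run the steps in order on the shared dictionary and return it."""
        for step in self.steps:
            step(params)
        return params


def drop_missing(params: Dict) -> None:
    """Built-in style step: discard rows that contain a missing value."""
    params["rows"] = [r for r in params["rows"] if None not in r.values()]


def double_age(params: Dict) -> None:
    """User-defined step added to the pipeline."""
    for row in params["rows"]:
        row["age"] *= 2


pipeline = SketchPipeline([drop_missing])
pipeline.add(double_age)
result = pipeline.process({"rows": [{"age": 20}, {"age": None}]})
print(result["rows"])  # [{'age': 40}]
```

Passing one dictionary through every step keeps each function's signature identical, which is what makes adding or removing a step a simple list operation in this sketch.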

Pipeline Structure

The pipelines are divided into 3 logical stages:

Stage 1 - Pipeline Input

Input datasets with the following extensions are supported: .csv, .tsv, .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt
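
Choosing a reader from the file extension might look like the sketch below. It uses only the Python standard library, so it reads the delimited formats and merely recognises the spreadsheet ones. The function name `read_rows` and the two lookup tables are assumptions made for this example, not part of Preprocessy.

```python
import csv
from pathlib import Path

# Delimiter for each delimited text format listed above.
DELIMITED = {".csv": ",", ".tsv": "\t"}
# Spreadsheet formats listed above; reading them needs a third-party library.
SPREADSHEET = {".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"}


def read_rows(path):
    """Return rows as a list of dicts for delimited files; reject anything else."""
    suffix = Path(path).suffix.lower()
    if suffix in DELIMITED:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle, delimiter=DELIMITED[suffix]))
    if suffix in SPREADSHEET:
        # Recognised, but deliberately left out of this standard-library sketch.
        raise NotImplementedError(f"{suffix} needs a spreadsheet reader")
    raise ValueError(f"Unsupported extension: {suffix!r}")
```

Calling `read_rows("data.tsv")` would return one dictionary per data row, keyed by the header line, with every value left as a string.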

Stage 2 - Processing

This is the major part of the pipeline and consists of processing functions. The following functions are provided out of the box, both as individual functions and as part of the pipelines:

Handling Null Values
Handling Outliers
Normalisation and Scaling
Label Encoding
Correlation and Feature Extraction
Training and Test set splitting
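
To make a few of the steps listed above concrete, here is a small pure-Python sketch of null handling, min-max scaling, label encoding, and train/test splitting. It illustrates the general techniques only: the function names, the mean-fill strategy, and the default `test_size` are assumptions for this example and do not describe how Preprocessy implements these steps. Outlier handling and correlation-based feature selection are omitted for brevity.

```python
import random
from statistics import mean


def fill_nulls(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def label_encode(labels):
    """Map each distinct label to an integer, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle a copy of the rows and split off a test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_size)))
    return shuffled[:cut], shuffled[cut:]


ages = fill_nulls([20, None, 40])               # [20, 30, 40]
scaled = min_max_scale(ages)                    # [0.0, 0.5, 1.0]
codes, mapping = label_encode(["b", "a", "b"])  # [1, 0, 1], {'a': 0, 'b': 1}
train, test = train_test_split(list(zip(scaled, codes)), test_size=1 / 3)
```

Each function takes and returns plain lists, so the steps can be applied individually or chained in any order that suits the dataset.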

Stage 3 - Pipeline Output

The output consists of the processed dataset and the pipeline parameters, depending on the verbosity required.

Contributing

Please read our Contributing Guide before submitting a Pull Request to the project.

Support

Feel free to contact any of the maintainers. We're happy to help!

Roadmap

Check out our roadmap to stay informed of the latest features released and the upcoming ones. Feel free to give us your insights!

Documentation

The documentation can be found here. Some parts of the documentation are currently under development. All contributions are welcome! Please see our Contributing Guide.

Research Paper and Citations

Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs was presented at the 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA) and is published in IEEE Xplore.

Link to full paper: https://ieeexplore.ieee.org/document/9736366

If you're using Preprocessy as part of scientific research, please use the citations below.

Plain Text Citation

S. Kazi et al., "Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs," 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), 2022, pp. 206-211, doi: 10.1109/CDMA54072.2022.00039.

BibTeX Citation

@INPROCEEDINGS{9736366,
  author={Kazi, Saif and Vakharia, Priyesh and Shah, Parth and Gupta, Riya and Tailor, Yash and Mantry, Palak and Rathod, Jash},
  booktitle={2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)},
  title={Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs},
  year={2022},
  volume={},
  number={},
  pages={206-211},
  doi={10.1109/CDMA54072.2022.00039}}

License

See the LICENSE file for licensing information.

Release history

1.0.4 (this version): May 28, 2022
1.0.3: Jan 25, 2022
1.0.2: Jan 10, 2022
1.0.1: Oct 27, 2021
1.0.0: Sep 10, 2021
1.0.0rc1 (pre-release): Aug 16, 2021
1.0.0a0 (pre-release): Jul 17, 2021

Download files

Source Distribution

preprocessy-1.0.4.tar.gz (26.1 kB)

Uploaded May 28, 2022 Source

Built Distribution

preprocessy-1.0.4-py3-none-any.whl (32.7 kB)

Uploaded May 28, 2022 Python 3

File details

Details for the file preprocessy-1.0.4.tar.gz.

File metadata

Download URL: preprocessy-1.0.4.tar.gz
Upload date: May 28, 2022
Size: 26.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.12 CPython/3.9.6 Darwin/21.5.0

File hashes

Hashes for preprocessy-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`d16097904ac6927b6bda6ccb9addddb03a0c964b651ac7ea9450949fdcd3f76b`
MD5	`3a42334ed7fa57031c7fba55c5d0fce4`
BLAKE2b-256	`23e1670fe8a196be87ba245d6e1892a04d48ba703c9974494d2b3609135e1c7a`

File details

Details for the file preprocessy-1.0.4-py3-none-any.whl.

File metadata

Download URL: preprocessy-1.0.4-py3-none-any.whl
Upload date: May 28, 2022
Size: 32.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.12 CPython/3.9.6 Darwin/21.5.0

File hashes

Hashes for preprocessy-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ecc87ff935f5e7d1d0e90a09b9439fe0411644e96eba1cd02f80ec43a718e4a4`
MD5	`54bb3f2b6585f2e424522245cfc5ef6a`
BLAKE2b-256	`1a424597a425a2840a45399fc0c8bc82c5c0c0d625ad23bcf587d08061a99fb4`
