
A data preprocessing library that provides customizable pipelines.



Preprocessy is a library that provides data preprocessing pipelines for machine learning. It bundles the common preprocessing steps used to prepare data for machine learning models, and it aims to do so independently of the source and type of the dataset. To that end, it provides a set of functions that have been generalised to different types of data.

The pipelines are composed of these functions and are flexible: users can customise them by adding their own processing functions or by removing pipeline functions according to their needs. The pipelines thus provide an abstract, high-level interface to the users.
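To make the idea of a customisable pipeline concrete, here is a minimal sketch in plain Python. It is not the Preprocessy API: the class SketchPipeline, its add and remove methods, the step functions, and the convention that each step takes (data, params) are all hypothetical names chosen for illustration.

```python
class SketchPipeline:
    """Hypothetical illustration of a customisable pipeline; not the preprocessy API."""

    def __init__(self, steps=None, params=None):
        self.steps = list(steps or [])
        self.params = dict(params or {})

    def add(self, func, after=None):
        """Append func, or insert it right after the step named `after`."""
        if after is None:
            self.steps.append(func)
            return
        names = [step.__name__ for step in self.steps]
        self.steps.insert(names.index(after) + 1, func)

    def remove(self, name):
        """Drop the step whose function name is `name`."""
        self.steps = [step for step in self.steps if step.__name__ != name]

    def run(self, data):
        """Pass data through every step in order; return the result and the params."""
        for step in self.steps:
            data = step(data, self.params)
        return data, self.params


def drop_missing(rows, params):
    # Keep only rows without any None value.
    return [row for row in rows if None not in row.values()]


def scale_age(rows, params):
    # Multiply the "age" field by a factor taken from the shared params.
    factor = params.get("age_factor", 1.0)
    return [{**row, "age": row["age"] * factor} for row in rows]


def count_rows(rows, params):
    # A user-supplied step: record the row count in the shared params.
    params["n_rows"] = len(rows)
    return rows


pipeline = SketchPipeline(steps=[drop_missing, scale_age], params={"age_factor": 0.5})
pipeline.add(count_rows, after="drop_missing")  # add a custom step
pipeline.remove("scale_age")                    # remove a default step
result, params = pipeline.run([{"age": 40}, {"age": None}])
```

In this sketch the steps share one params dictionary, so a custom step can both read configuration and report values back to the caller.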

Pipeline Structure

The pipelines are divided into three logical stages:

Stage 1 - Pipeline Input

Input datasets with the following extensions are supported: .csv, .tsv, .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt
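As an illustration of how an input stage might dispatch on these extensions, the following sketch uses only the Python standard library. The names input_kind and read_delimited are hypothetical, and the grouping into "delimited" and "spreadsheet" files is an assumption made here; how Preprocessy reads each format internally is not described in this README.

```python
import csv
import io
from pathlib import Path

# Extensions listed above, grouped by how they might be read (an assumption).
DELIMITERS = {".csv": ",", ".tsv": "\t"}
SPREADSHEETS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".odf", ".ods", ".odt"}


def input_kind(path):
    """Return 'delimited' or 'spreadsheet'; raise for unsupported extensions."""
    ext = Path(path).suffix.lower()
    if ext in DELIMITERS:
        return "delimited"
    if ext in SPREADSHEETS:
        return "spreadsheet"
    raise ValueError(f"unsupported input extension: {ext!r}")


def read_delimited(text, path):
    """Parse delimited text into row dicts, picking the delimiter from the extension."""
    delimiter = DELIMITERS[Path(path).suffix.lower()]
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


rows = read_delimited("age\tcity\n40\tPune\n", "people.tsv")
```

Spreadsheet formats need a dedicated reader and are only classified here, not parsed.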

Stage 2 - Processing

This is the main part of the pipeline and consists of processing functions. The following functions are provided out of the box, both as individual functions and as part of the pipelines:

  • Handling Null Values
  • Handling Outliers
  • Normalisation and Scaling
  • Label Encoding
  • Correlation and Feature Extraction
  • Training and Test set splitting
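To give a feel for what some of these steps do, here is a minimal sketch of four of them in plain Python. The function names, signatures, and strategies (mean imputation, min-max scaling, sorted-order label encoding, seeded shuffling) are assumptions chosen for illustration and do not describe Preprocessy's own implementations.

```python
import random


def fill_nulls_with_mean(values):
    """Replace None entries with the mean of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:  # nothing to average; return the values unchanged
        return list(values)
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_encode(labels):
    """Map each distinct label to an integer, in sorted label order."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(rows, test_size=0.2, seed=0):
    """Shuffle a copy of rows and split off a test fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


ages = fill_nulls_with_mean([1.0, None, 3.0])
scaled = min_max_scale([2.0, 4.0, 6.0])
codes, mapping = label_encode(["b", "a", "b"])
train, test = train_test_split(list(range(10)), test_size=0.2)
```

Each function takes and returns plain lists, so the steps can be applied on their own or chained in any order.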

Stage 3 - Pipeline Output

The output consists of the processed dataset and, depending on the verbosity required, the pipeline parameters.

Contributing

Please read our Contributing Guide before submitting a Pull Request to the project.

Support

Feel free to contact any of the maintainers. We're happy to help!

Roadmap

Check out our roadmap to stay informed of the latest features released and the upcoming ones. Feel free to give us your insights!

Documentation

The documentation is currently under development. All contributions are welcome! Please see our Contributing Guide.

License

See the LICENSE file for licensing information.
