Data Preprocessing model based on Keras preprocessing layers

🌟 Welcome to Keras Data Processor (KDP) - Preprocessing Power with TensorFlow Keras 🌟

Welcome to the Future of Data Preprocessing!

Diving into the world of machine learning and data science, we often find ourselves tangled in the preprocessing jungle. Worry no more! Introducing a state-of-the-art data preprocessing model based on TensorFlow Keras and the innovative use of Keras preprocessing layers.

Say goodbye to tedious data preparation tasks and hello to streamlined, efficient, and scalable data pipelines. Whether you're a seasoned data scientist or just starting out, this tool is designed to supercharge your ML workflows, making them more robust and faster than ever!

🔑 Key Features:

  • Automatic and scalable feature statistics extraction: Automatically infer feature statistics from your data, saving you time and effort.

  • Customizable Preprocessing Pipelines: Tailor your preprocessing steps with ease, choosing from a wide range of options for numeric, categorical, and even complex feature crosses.

  • Scalability and Efficiency: Designed for performance, handling large datasets with ease thanks to TensorFlow's powerful backend.

  • Easy Integration: Seamlessly fits into your TensorFlow Keras models (as the first layers of the model), making it a breeze to go from raw data to trained model faster than ever.

🚀 Getting started:

We use Poetry to manage dependencies, so you will need to install it first. Then install the project's dependencies:

poetry install

or, to enter a dedicated virtual environment directly:

poetry shell
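
If you only want to use KDP as a library in your own project rather than develop it, the package is also published on PyPI, so it can presumably be installed with pip as well:

pip install kdp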

Then you can simply configure your preprocessor:

🛠️ Building the Preprocessor:

from kdp import PreprocessingModel
from kdp import FeatureType

# DEFINE FEATURE PROCESSORS
features_specs = {
    # ======= NUMERICAL Features =========================
    "feat1": FeatureType.FLOAT_NORMALIZED,
    "feat2": FeatureType.FLOAT_RESCALED,
    # ======= CATEGORICAL Features ========================
    "feat3": FeatureType.STRING_CATEGORICAL,
    "feat4": FeatureType.INTEGER_CATEGORICAL,
    # ======= TEXT Features ========================
    "feat5": FeatureType.TEXT,
}

# INSTANTIATE THE PREPROCESSING MODEL with your data
ppr = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_specs,
)
# construct the preprocessing pipelines
ppr.build_preprocessor()

This will output:

{
    'model': <Functional name=preprocessor, built=True>,
    'inputs': {
        'feat1': <KerasTensor shape=(None, 1), dtype=float32, sparse=None, name=feat1>,
        'feat2': <KerasTensor shape=(None, 1), dtype=float32, sparse=None, name=feat2>,
        'feat3': <KerasTensor shape=(None, 1), dtype=string, sparse=None, name=feat3>,
        'feat4': <KerasTensor shape=(None, 1), dtype=int32, sparse=None, name=feat4>,
        'feat5': <KerasTensor shape=(None, 1), dtype=string, sparse=None, name=feat5>
    },
    'signature': {
        'feat1': TensorSpec(shape=(None, 1), dtype=tf.float32, name='feat1'),
        'feat2': TensorSpec(shape=(None, 1), dtype=tf.float32, name='feat2'),
        'feat3': TensorSpec(shape=(None, 1), dtype=tf.string, name='feat3'),
        'feat4': TensorSpec(shape=(None, 1), dtype=tf.int32, name='feat4'),
        'feat5': TensorSpec(shape=(None, 1), dtype=tf.string, name='feat5')
    },
    'output_dims': 45
}
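
The returned 'model' entry is a regular Keras model, so you can call it directly on a batch of raw features to sanity-check the pipeline. Below is a minimal sketch; the feature values are made up for illustration, and it assumes the preprocessor accepts a dict keyed by feature name, matching the named inputs above:

import tensorflow as tf

# Capture the dict returned by build_preprocessor()
result = ppr.build_preprocessor()
preprocessor = result["model"]

# One raw example per feature; values are illustrative only
batch = {
    "feat1": tf.constant([[0.5]], dtype=tf.float32),
    "feat2": tf.constant([[120.0]], dtype=tf.float32),
    "feat3": tf.constant([["red"]]),
    "feat4": tf.constant([[3]], dtype=tf.int32),
    "feat5": tf.constant([["a short free-text value"]]),
}

features = preprocessor(batch)
print(features.shape)  # expected: (1, 45), matching 'output_dims'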

This preprocessing model can be used independently or as the first layer of any Keras model. This means you can ship your model with the preprocessing pipeline built in as a single entity and deploy it with ease using TensorFlow Serving.
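
For example, here is a minimal sketch of wiring the built preprocessor into a small Keras classifier and saving everything as one artifact. The Dense head, layer sizes, and file name are illustrative assumptions, not part of KDP's API:

import tensorflow as tf

# Reuse the preprocessor built above; we assume it exposes a single
# concatenated output of width 'output_dims'
preprocessor = ppr.build_preprocessor()["model"]

# Stack a small classification head on top of the preprocessing graph
x = tf.keras.layers.Dense(64, activation="relu")(preprocessor.output)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs=preprocessor.inputs, outputs=outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# After training, the saved artifact contains the preprocessing pipeline
# and the model weights together, ready to serve as a single entity
model.save("kdp_end_to_end.keras")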

🔍 Dive Deeper:

Explore the detailed documentation to leverage the full potential of this preprocessing tool. Learn about customizing feature crosses, bucketization strategies, embedding sizes, and much more to truly tailor your preprocessing pipeline to your project's needs.

