Data Preprocessing model based on Keras preprocessing layers
🌟 Keras Data Processor (KDP) - Powerful Data Preprocessing for TensorFlow
Transform your raw data into ML-ready features with just a few lines of code!
KDP provides a state-of-the-art preprocessing system built on TensorFlow Keras. It handles everything from feature normalization to advanced embedding techniques, making your ML pipelines faster, more robust, and easier to maintain.
✨ Key Features
- 🚀 Efficient Single-Pass Processing: Process all features in one go, dramatically faster than alternatives
- 🧠 Distribution-Aware Encoding: Automatically detects and optimally handles different data distributions
- 👁️ Tabular Attention: Captures complex feature interactions for better model performance
- 🔍 Feature Selection: Automatically identifies and focuses on the most important features
- 🔄 Feature-wise Mixture of Experts: Specialized processing for different feature types
- 📦 Production-Ready: Deploy your preprocessing along with your model as a single unit
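The single-pass idea from the feature list can be illustrated without any framework. The sketch below is not KDP's implementation. It is a minimal, standard-library example that assumes normalization needs a mean and standard deviation per numeric column, and it collects both for every column while reading the rows exactly once (Welford's online algorithm). The names `RunningStats` and `single_pass_stats` and the sample values are made up for illustration.

```python
import csv
import io
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford accumulator for one numeric column (hypothetical helper)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 if no values were seen.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0


def single_pass_stats(rows, numeric_columns):
    """Read every row once and update all column accumulators together."""
    stats = {name: RunningStats() for name in numeric_columns}
    for row in rows:
        for name, acc in stats.items():
            acc.update(float(row[name]))
    return stats


# Made-up sample data, for illustration only.
SAMPLE_CSV = "age,income\n20,1000\n30,2000\n40,3000\n"
stats = single_pass_stats(
    csv.DictReader(io.StringIO(SAMPLE_CSV)), ["age", "income"]
)
print(stats["age"].mean, stats["income"].mean)
```

The alternative, one full scan of the data per feature, repeats the I/O cost for every column, which is why a single shared pass tends to pay off as the number of features grows.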
🚀 Quick Installation
# Using pip
pip install kdp
# Using Poetry
poetry add kdp
📋 Simple Example
from kdp import PreprocessingModel, FeatureType

# Define your features
features_specs = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT,
}

# Create and build the preprocessor
preprocessor = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_specs,
    # Enable advanced features
    use_distribution_aware=True,
    tabular_attention=True,
)
result = preprocessor.build_preprocessor()
model = result["model"]

# Use the preprocessor with your data
# (input_data is not defined in this snippet; supply your own batch of features)
processed_features = model(input_data)
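Assuming the keys of `features_specs` correspond to column names in the CSV passed as `path_data`, a quick header check before building the preprocessor can catch a misspelled or missing column early. The helper `missing_columns` below is hypothetical and not part of KDP; the file contents are made up for illustration.

```python
import csv
import os
import tempfile


def missing_columns(csv_path, feature_names):
    """Return the feature names that do not appear in the CSV header."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in feature_names if name not in header]


# Made-up file that lacks the "description" column, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_data.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(
            [["age", "income", "occupation"], [34, 52000, "engineer"]]
        )
    missing = missing_columns(
        path, ["age", "income", "occupation", "description"]
    )

print(missing)
```

In practice the second argument would simply be `list(features_specs)`, so the check stays in sync with the feature definitions.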
📚 Comprehensive Documentation
We've built an extensive documentation system to help you get the most from KDP:
Core Guides
- 🚀 Quick Start Guide - Get up and running in minutes
- 📊 Feature Processing - Learn about all supported feature types
- 🧙‍♂️ Auto-Configuration - Let KDP configure itself for your data
Advanced Topics
- 📈 Distribution-Aware Encoding - Smart handling of different distributions
- 👁️ Tabular Attention - Capture complex feature interactions
- 🔢 Advanced Numerical Embeddings - Rich representations for numbers
- 🤖 Transformer Blocks - Apply transformer architecture to tabular data
- 🎯 Feature Selection - Focus on what matters in your data
- 🧠 Feature-wise Mixture of Experts - Specialized processing per feature
Integration & Performance
- 🔗 Integration Guide - Use KDP with existing ML pipelines
- 🚀 Tabular Optimization - Supercharge your preprocessing
- 📈 Performance Tips - Handling large datasets efficiently
Background & Resources
- 💡 Motivation - Why we built KDP
- 🤝 Contributing - Help improve KDP
🖼️ Model Architecture
Your preprocessing pipeline is built as a Keras model that can be used on its own or as the first layer of any model.
📊 Performance
KDP outperforms alternative preprocessing approaches, especially as data size increases.
🤝 Contributing
We welcome contributions! Please check out our Contributing Guide for guidelines on how to proceed.
🛠️ Development Tools
KDP includes tools to help developers:
- Documentation Generation: Automatically generate API docs from docstrings
- Model Diagram Generation: Visualize model architectures with
  make generate_doc_content
  or run: python scripts/generate_model_diagrams.py
  This creates diagram images in docs/features/imgs/models/ for all feature types and configurations.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- The TensorFlow and Keras teams for their amazing work
- All contributors who help make KDP better