datason
A comprehensive Python package for intelligent serialization that handles complex data types with ease.
Perfect drop-in replacement for Python's json module, with enhanced features for complex data types and ML workflows. Zero migration effort: your existing JSON code works immediately, with smart datetime parsing, type preservation, and advanced serialization capabilities.
Works exactly like the json module: use import datason.json as json for perfect compatibility, or import datason for enhanced features like automatic datetime parsing and ML type support.
Features
Drop-in JSON Replacement
- Perfect Compatibility: Works exactly like Python's json module - zero code changes needed
- Enhanced by Default: Main API provides smart datetime parsing and type detection automatically
- Dual API Strategy: Choose stdlib compatibility (datason.json) or enhanced features (datason)
- Zero Migration: Existing json.loads/dumps code works immediately with optional enhancements
Intelligent Processing
- Smart Type Detection: Automatically handles pandas DataFrames, NumPy arrays, datetime objects, and more
- Bidirectional: Serialize to JSON and deserialize back to original objects with perfect fidelity
- Datetime Intelligence: Automatic ISO 8601 string parsing across Python 3.8-3.11+
- Type Safety: Preserves data types and structure integrity with guaranteed round-trip serialization
ML/AI Optimized
- ML Framework Support: Production-ready support for 10+ ML frameworks with unified architecture
- High Performance: Sub-millisecond serialization optimized for ML workloads
- Simple & Direct API: Intention-revealing functions (dump_api, dump_ml, dump_secure, dump_fast) with automatic optimization
- Progressive Loading: Choose your success rate - load_basic (60-70%), load_smart (80-90%), load_perfect (100%)
- Production Ready: Enterprise-grade ML serving with monitoring, A/B testing, and security
Developer Experience
- Extensible: Easy to add custom serializers for your own types
- Zero Dependencies: Core functionality works without additional packages
- Integrity Verification: Hash, sign, and verify objects for compliance workflows
- File Operations: Save and load JSON/JSONL files with compression support
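To make the integrity-verification idea concrete, here is a minimal standard-library sketch of hashing, signing, and verifying a JSON-compatible object. It does not use or reproduce datason's own integrity functions: the helper names (canonical_bytes, hash_object, sign_object, verify_object) and the demo key are assumptions made purely for illustration.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj):
    """Serialize to a stable byte string: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def hash_object(obj):
    """Return the SHA-256 hex digest of the canonical JSON form."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


def sign_object(obj, key):
    """Return an HMAC-SHA256 signature (hex) over the canonical JSON form."""
    return hmac.new(key, canonical_bytes(obj), hashlib.sha256).hexdigest()


def verify_object(obj, key, signature):
    """Compare signatures in constant time."""
    return hmac.compare_digest(sign_object(obj, key), signature)


record = {"id": 1, "label": "ok"}
signature = sign_object(record, b"demo-key")  # hypothetical key, for the demo only
```

Canonicalizing before hashing is the key design choice here: two dicts with the same content but different key order produce the same digest.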
ML Framework Support
datason provides production-ready integration for major ML frameworks with consistent serialization:
Core ML Libraries
- Pandas - DataFrames with schema preservation
- NumPy - Arrays with dtype and shape preservation
- PyTorch - Tensors with exact dtype/shape reconstruction
- TensorFlow/Keras - Models with architecture and weights
- Scikit-learn - Fitted models with parameters
Advanced ML Frameworks
- CatBoost - Models with fitted state and parameter extraction
- Optuna - Studies with trial history and hyperparameter tracking
- Plotly - Interactive figures with data, layout, and configuration
- Polars - High-performance DataFrames with schema preservation
- XGBoost - Gradient boosting models (via scikit-learn interface)
ML Serving Platforms
- BentoML - Production services with A/B testing and monitoring
- Ray Serve - Scalable deployment with autoscaling
- MLflow - Model registry integration with experiment tracking
- Streamlit - Interactive dashboards with real-time data
- Gradio - ML demos with consistent data handling
- FastAPI - Custom APIs with validation and rate limiting
- Seldon Core/KServe - Kubernetes-native model serving
Universal Pattern: All frameworks use the same get_api_config() for consistent UUID and datetime handling across your entire ML pipeline.
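As a rough illustration of what "consistent UUID and datetime handling" means at the JSON boundary, the sketch below uses only the standard library. It is not how get_api_config() works internally, which this README does not describe; to_json_safe is a hypothetical helper name, and the choice of ISO 8601 strings for datetimes is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone


def to_json_safe(value):
    """Fallback for json.dumps: called only for types it cannot encode itself."""
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError("Cannot serialize {}".format(type(value).__name__))


payload = {
    "request_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
encoded = json.dumps(payload, default=to_json_safe)
```

Routing every service through one such function is what keeps the wire format identical no matter which serving framework produced the response.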
Python Version Support
datason officially supports Python 3.8+ and is actively tested on:
- Python 3.8 - Minimum supported version (core functionality)
- Python 3.9 - Full compatibility
- Python 3.10 - Full compatibility
- Python 3.11 - Full compatibility (primary development version)
- Python 3.12 - Full compatibility
- Python 3.13 - Latest stable version (core features only; many ML libraries still releasing wheels)
Compatibility Testing
We maintain compatibility through:
- Automated CI testing on all supported Python versions with strategic coverage:
  - Python 3.8: Core functionality validation (minimal dependencies)
  - Python 3.9: Data science focus (pandas integration)
  - Python 3.10: ML focus (scikit-learn, scipy)
  - Python 3.11: Full test suite (primary development version)
  - Python 3.12: Full test suite
  - Python 3.13: Core serialization tests only (latest stable)
- Core functionality tests ensuring basic serialization works on Python 3.8+
- Dependency compatibility checks for optional ML/data science libraries
- Runtime version validation with helpful error messages
Note: While core functionality works on Python 3.8, some optional dependencies (like latest ML frameworks) may require newer Python versions. The package will still work - you'll just have fewer optional features available.
Python 3.13 Caution: Many machine learning libraries have not yet released official 3.13 builds. Datason runs on Python 3.13, but only with core serialization features until those libraries catch up.
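The runtime version validation mentioned above can be pictured with a small standard-library sketch. This is not datason's actual check: the function name check_python_version and the wording of the error message are assumptions; only the 3.8 minimum comes from this README.

```python
import sys

MIN_VERSION = (3, 8)  # minimum supported version stated in this README


def check_python_version(version_info=None, minimum=MIN_VERSION):
    """Return the running (major, minor) pair, or raise a descriptive error."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < minimum:
        raise RuntimeError(
            "Python {}.{} or newer is required, but {}.{} is running.".format(
                minimum[0], minimum[1], current[0], current[1]
            )
        )
    return current
```

Calling check_python_version() with no arguments checks the current interpreter; passing a tuple makes the function easy to test.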
Python 3.8 Limitations
Python 3.8 users should be aware:
- Core serialization - Full support
- Basic types - datetime, UUID, decimal, etc.
- Pandas/NumPy - Basic DataFrame and array serialization
- Advanced ML libraries - Some may require Python 3.9+
- Latest features - Some newer configuration options may have limited support
We recommend Python 3.9+ for the best experience with all features.
Drop-in JSON Replacement
Replace Python's json module with zero code changes and get enhanced features automatically!
Perfect Compatibility Mode
# Your existing code works unchanged
import datason.json as json
# Exact same API as Python's json module
data = json.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': '2024-01-01T00:00:00Z', 'value': 42}
json_string = json.dumps({"key": "value"}, indent=2)
# Works exactly like json.dumps() with all parameters
Enhanced Mode (Automatic Improvements)
# Just use the main datason module for enhanced features
from datetime import datetime
import datason
# Smart datetime parsing automatically enabled
data = datason.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
# Returns: {'timestamp': datetime.datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 'value': 42}
# Enhanced serialization with dict output for chaining
result = datason.dumps({"timestamp": datetime.now(), "data": [1, 2, 3]})
# Returns: dict (not string) with smart type handling
Migration Strategy
# Phase 1: Drop-in replacement (zero risk)
import datason.json as json # Perfect compatibility
# Phase 2: Enhanced features when ready
import datason # Smart datetime parsing, ML support, etc.
# Phase 3: Advanced features as needed
datason.dump_ml(ml_model) # ML-optimized serialization
datason.dump_secure(data) # Automatic PII redaction
datason.load_perfect(data, template) # 100% accurate reconstruction
Zero Risk Migration: Start with datason.json for perfect compatibility, then gradually adopt enhanced features when you need them.
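One cautious way to stage Phase 1 is to guard the import so the code keeps running where datason is not installed. This is only a suggested pattern, not something the project prescribes; it relies on the compatibility claim above that datason.json behaves like the standard json module.

```python
try:
    import datason.json as json  # compatibility API described above
except ImportError:
    import json  # standard library fallback when datason is unavailable

data = json.loads('{"timestamp": "2024-01-01T00:00:00Z", "value": 42}')
text = json.dumps({"key": "value"}, indent=2)
```

Because both branches expose the same loads/dumps calls, the rest of the module does not need to know which one was imported.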
Quick Start
Installation
pip install datason
Production ML Serving - Simple & Direct
import datason as ds
import uuid
from datetime import datetime
# ML prediction data with UUIDs and complex types
prediction_data = {
    "request_id": uuid.uuid4(),
    "timestamp": datetime.now(),
    "features": {"feature1": 1.0, "feature2": 2.0},
    "model_version": "1.0.0",
}
# Simple, direct API with automatic optimizations
api_response = ds.dump_api(prediction_data)  # Perfect for web APIs
# UUIDs become strings automatically - no more Pydantic errors!
# ML-optimized serialization
import torch
model_data = {"model": torch.nn.Linear(10, 1), "weights": torch.randn(10, 1)}
ml_serialized = ds.dump_ml(model_data)  # Automatic ML optimization
# Security-focused with automatic PII redaction
user_data = {"name": "Alice", "email": "alice@example.com", "ssn": "123-45-6789"}
secure_data = ds.dump_secure(user_data)  # Automatic PII redaction
# Works across ALL ML frameworks with same simple pattern
# (this snippet assumes svc, a BentoML service, and model are defined elsewhere)
import bentoml
from bentoml.io import JSON
@svc.api(input=JSON(), output=JSON())
def predict(input_data: dict) -> dict:
    features = ds.load_smart(input_data)  # 80-90% success rate
    prediction = model.predict(features)
    return ds.dump_api({"prediction": prediction})  # Clean API response
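To give a feel for what PII redaction does to a payload like user_data, here is a small standard-library sketch. The rules dump_secure actually applies are not documented in this README, so everything below is an assumption: the SENSITIVE_KEYS list, the SSN pattern, and the "<REDACTED>" placeholder are all hypothetical.

```python
import re

SENSITIVE_KEYS = {"email", "ssn", "password"}  # hypothetical key list
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN-like strings
PLACEHOLDER = "<REDACTED>"


def redact(value, key=None):
    """Return a copy of value with sensitive fields and SSN-like text masked."""
    if isinstance(key, str) and key.lower() in SENSITIVE_KEYS:
        return PLACEHOLDER
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return SSN_PATTERN.sub(PLACEHOLDER, value)
    return value


user_data = {"name": "Alice", "email": "alice@example.com", "ssn": "123-45-6789"}
redacted = redact(user_data)
```

The function returns new containers rather than mutating its input, so the original record stays available to code that is allowed to see it.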
Simple & Direct API
import datason as ds
from datetime import datetime
import pandas as pd
import numpy as np
# Complex nested data structure
data = {
    "timestamp": datetime.now(),
    "dataframe": pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}),
    "array": np.array([1, 2, 3, 4, 5]),
    "nested": {
        "values": [1, 2, {"inner": datetime.now()}]
    }
}
# Simple API with automatic optimization
api_data = ds.dump_api(data) # Web APIs (UUIDs as strings, clean JSON)
ml_data = ds.dump_ml(data) # ML optimized (framework detection)
secure_data = ds.dump_secure(data) # Security focused (PII redaction)
fast_data = ds.dump_fast(data) # Performance optimized
# Progressive loading - choose your success rate
basic_result = ds.load_basic(api_data) # 60-70% success, fastest
smart_result = ds.load_smart(api_data) # 80-90% success, balanced
perfect_result = ds.load_perfect(api_data, template=data) # 100% with template
# Traditional API still available
serialized = ds.serialize(data)
restored = ds.deserialize(serialized)
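The idea behind template-guided loading, as in load_perfect(api_data, template=data), can be sketched with the standard library: walk the plain JSON data and the template side by side, and convert each value to the type found at the same position in the template. This is an illustration only, not datason's algorithm; restore_like is a hypothetical name, and the sketch assumes lists are homogeneous.

```python
import json
import uuid
from datetime import datetime
from decimal import Decimal


def restore_like(value, template):
    """Rebuild plain JSON data using the types found in template."""
    if isinstance(template, dict) and isinstance(value, dict):
        return {
            key: restore_like(item, template[key]) if key in template else item
            for key, item in value.items()
        }
    if isinstance(template, list) and isinstance(value, list) and template:
        # Assumption: every element has the type of the first template element.
        return [restore_like(item, template[0]) for item in value]
    if isinstance(template, datetime) and isinstance(value, str):
        return datetime.fromisoformat(value)
    if isinstance(template, uuid.UUID) and isinstance(value, str):
        return uuid.UUID(value)
    if isinstance(template, Decimal) and isinstance(value, str):
        return Decimal(value)
    return value


template = {"id": uuid.uuid4(), "when": [datetime.now()], "price": Decimal("0")}
wire = json.loads(
    '{"id": "12345678-1234-5678-1234-567812345678",'
    ' "when": ["2024-01-01T12:30:00"], "price": "19.99", "extra": 1}'
)
restored = restore_like(wire, template)
```

A template removes the guesswork that heuristic loaders need, which is why it can promise exact types where string sniffing cannot.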
Advanced Options - Composable & Flexible
import datason as ds
# Use the main dump() function with options for complex scenarios
large_sensitive_ml_data = {
    "model": trained_model,
    "user_data": {"email": "user@example.com", "preferences": {...}},
    "large_dataset": huge_numpy_array
}
# Combine multiple optimizations
result = ds.dump(
    large_sensitive_ml_data,
    secure=True,   # Enable PII redaction
    ml_mode=True,  # Optimize for ML objects
    chunked=True   # Memory-efficient processing
)
# Or use specialized functions for simple cases
api_data = ds.dump_api(response_data) # Web API optimized
ml_data = ds.dump_ml(model_data) # ML optimized
secure_data = ds.dump_secure(sensitive_data) # Security focused
fast_data = ds.dump_fast(performance_data) # Speed optimized
# Progressive loading with clear success rates
basic_result = ds.load_basic(json_data) # 60-70% success, fastest
smart_result = ds.load_smart(json_data) # 80-90% success, balanced
perfect_result = ds.load_perfect(json_data, template) # 100% with template
# API discovery and help
help_info = ds.help_api() # Get guidance on function selection
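What chunked=True does internally is not described here, but the general idea of memory-efficient processing can be shown with a standard-library sketch: write records as compressed JSONL and read them back one at a time. The function names write_jsonl_gz and read_jsonl_gz are hypothetical and are not part of datason's API.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path, records):
    """Write one JSON document per line to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_gz(path):
    """Yield records lazily so the whole file never has to sit in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


records = [{"row": i, "value": i * i} for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "rows.jsonl.gz")
write_jsonl_gz(path, records)
loaded = list(read_jsonl_gz(path))
```

Because read_jsonl_gz is a generator, a consumer can stop early or process each record as it arrives instead of materializing the full list as the demo does.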
Production Architecture
datason provides a complete ML serving architecture with visual documentation:
- Universal Integration Pattern: Single configuration works across all frameworks
- Comprehensive Monitoring: Prometheus metrics, health checks, and observability
- Enterprise Security: Input validation, rate limiting, and PII redaction
- Performance Optimized: Sub-millisecond serialization with caching support
- A/B Testing: Framework for testing multiple model versions
- Production Examples: Ready-to-deploy BentoML, Ray Serve, and FastAPI services
Quick Architecture Overview
graph LR
A[Client Apps] --> B[API Gateway]
B --> C[ML Services<br/>BentoML/Ray/FastAPI]
C --> D[Models<br/>CatBoost/Keras/etc]
C --> E[Cache<br/>Redis]
C --> F[DB<br/>PostgreSQL]
style C fill:#e1f5fe,stroke:#01579b,stroke-width:3px
style D fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
See Full Documentation: Complete architecture diagrams and production patterns in docs/features/model-serving/
Documentation
Core Documentation
For full documentation, examples, and API reference, visit: https://datason.readthedocs.io
ML Serving Guides
- Architecture Overview - Complete system architecture with Mermaid diagrams
- Model Serving Integration - Production-ready examples for all major frameworks
- Production Patterns - Advanced deployment strategies and best practices
Production Examples
- Advanced BentoML Integration - Enterprise service with A/B testing and monitoring
- Production ML Serving Guide - Complete implementation with security and observability
Quick Start: Run python examples/production_ml_serving_guide.py to see all features in action!
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.