DSF AML SDK — Automated ML Robustness & Failure Correction Framework
DSF AML SDK
Automated ML Robustness & Training Data Generation
Generate critical training variants from production failures and edge cases. Accelerate model retraining with automatically curated datasets.
🎯 Primary Use Cases
1. Production Failure Recovery
Challenge: ML/LLM models fail on edge cases. Manual correction is slow.
Solution: Generate critical variants from failures for rapid retraining.
from dsf_aml_sdk import AMLSDK
sdk = AMLSDK(license_key='your_key', tier='professional')
# Production failure detected
failed_case = {'metric_a': 0.60, 'metric_b': 500, 'metric_c': 0.20}
# Generate variants
variants = sdk.generate_variants(
    seed=failed_case,
    config=your_config,
    count=20
)
# Use variants['samples'] for retraining
Output: Labeled data points similar to the failure case, ready to use in retraining to improve model robustness.
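One way to fold the generated samples into an existing training set is sketched below. It assumes the response carries the `status` and `samples` fields shown under Core Methods, and that each sample is a flat dict of hashable feature values, which this page does not confirm. The helper name `merge_samples` is hypothetical and not part of the SDK, and the response here is a hand-written stand-in rather than real SDK output.

```python
def merge_samples(training_rows, response):
    """Return training_rows plus new samples from response, skipping exact duplicates.

    Hypothetical helper, not part of dsf_aml_sdk. Assumes every row and sample
    is a flat dict whose values are hashable.
    """
    if response.get("status") != "completed":
        raise ValueError(f"unexpected status: {response.get('status')!r}")

    def row_key(row):
        # Order-independent fingerprint of a flat dict.
        return tuple(sorted(row.items()))

    seen = {row_key(row) for row in training_rows}
    merged = list(training_rows)
    for sample in response.get("samples", []):
        key = row_key(sample)
        if key not in seen:
            seen.add(key)
            merged.append(sample)
    return merged


# Stand-in response for illustration; real values would come from
# sdk.generate_variants().
example_response = {
    "status": "completed",
    "total": 2,
    "samples": [
        {"metric_a": 0.60, "metric_b": 500},  # duplicate of an existing row
        {"metric_a": 0.62, "metric_b": 480},
    ],
}
existing = [{"metric_a": 0.60, "metric_b": 500}]
merged = merge_samples(existing, example_response)
```

De-duplicating before retraining keeps a seed that already exists in the training set from being counted twice.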
2. Preventive Dataset Curation
Challenge: Models trained on clean data fail on boundary cases.
Solution: Pre-generate datasets focused on decision boundaries.
# Identify high-impact regions
seeds = sdk.identify_high_impact_regions(
    dataset=training_data,
    config=config,
    focus_percent=0.1
)
# Generate boundary variants
boundary_data = sdk.generate_boundary_variants(
    config=config,
    source_data=training_data,
    variants_per_seed=10
)
# Train with boundary_data
3. Training Data Generation
Challenge: Creating labeled datasets is expensive.
Solution: Generate synthetic labeled datasets at scale.
# Generate labeled samples
result = sdk.generate_training_data(config, samples=1000)
# Export (Enterprise)
dataset = sdk.export_dataset()
# Train your models with generated data
📦 Installation
pip install dsf-aml-sdk
🧩 Quick Start
from dsf_aml_sdk import AMLSDK
sdk = AMLSDK(license_key='your_key', tier='professional')
# Define evaluation config
config = {
    'metric_a': {
        'reference_value': 0.95,
        'params': {
            'importance': 2.5,
            'sensitivity': 2.0
        }
    },
    'metric_b': {
        'reference_value': 100,
        'params': {
            'importance': 1.8,
            'sensitivity': 1.5
        }
    }
}
# Report failure
failed_input = {'metric_a': 0.60, 'metric_b': 500}
# Generate corrections
fix = sdk.generate_variants(failed_input, config, count=20)
print(f"Generated {len(fix['samples'])} variants")
📊 Execution Metrics
Operations return performance metrics:
{
    "tier": "professional",
    "evaluations": 62,
    "threshold": 0.6698,
    "persistence": "active",
    "statistics": {
        "avg": 0.7296,
        "min": 0.5217,
        "max": 0.8467
    }
}
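A short sketch of how a client might read these metrics follows. The page does not define what `threshold` means, so the code assumes it is a cut-off on the same scale as the values under `statistics`; treat that as an assumption. `summarize_metrics` is a hypothetical helper name, not an SDK function.

```python
def summarize_metrics(metrics):
    """Compare the reported score statistics with the reported threshold.

    Hypothetical helper. Assumes `threshold` and `statistics` share one scale,
    which the SDK page does not state.
    """
    stats = metrics["statistics"]
    threshold = metrics["threshold"]
    return {
        "evaluations": metrics["evaluations"],
        # How far the average sits above (positive) or below (negative) the threshold.
        "avg_margin": round(stats["avg"] - threshold, 4),
        # True when at least one evaluation scored under the threshold.
        "min_below_threshold": stats["min"] < threshold,
        "spread": round(stats["max"] - stats["min"], 4),
    }


# The metrics payload shown above, copied verbatim.
metrics = {
    "tier": "professional",
    "evaluations": 62,
    "threshold": 0.6698,
    "persistence": "active",
    "statistics": {"avg": 0.7296, "min": 0.5217, "max": 0.8467},
}
summary = summarize_metrics(metrics)
```

With the sample payload, the average sits about 0.06 above the threshold while the minimum falls below it.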
🆚 Tier Comparison
| Feature | Community | Professional | Enterprise |
|---|---|---|---|
| Variant Generation | Limited | ✅ | ✅ |
| Preventive Datasets | Limited | ✅ | ✅ |
| Batch Operations | ❌ | ✅ (≤1000) | ✅ (≤1000) |
| Data Export | ❌ | ✅ | ✅ |
| Full Pipeline | ❌ | ❌ | ✅ |
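The table lists a batch limit of 1000 for the Professional and Enterprise tiers. Assuming that limit is a cap on data points per call (the page does not spell this out), a client could split larger inputs before calling `sdk.batch_evaluate`. The `chunked` helper below is hypothetical and runs without the SDK.

```python
def chunked(items, size=1000):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 2500 placeholder data points, split into batches of at most 1000.
data_points = [{"metric_a": i, "metric_b": i * 2} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunked(data_points)]
```

Each slice could then be passed as the `data_points` argument of `sdk.batch_evaluate(data_points, config)` and the per-batch results collected by the caller.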
📖 Core Methods
Variant Generation
sdk.generate_variants(seed: dict, config, count=20) → dict
Returns:
{
    "status": "completed",
    "total": 20,
    "samples": [...],
    "metrics": {...}
}
High-Impact Region Identification
sdk.identify_high_impact_regions(dataset, config, focus_percent=0.1) → dict
sdk.generate_boundary_variants(config, source_data, **kwargs) → dict
Training Data Generation
sdk.generate_training_data(config, samples=1000) → dict
sdk.export_dataset() → dict # Enterprise only
Evaluation
# Single evaluation
result = sdk.evaluate(data, config)
# Batch evaluation
results = sdk.batch_evaluate(data_points, config)
🔧 Configuration Structure
config = {
    "feature_name": {
        "reference_value": <target_value>,
        "params": {
            "importance": <float>,   # Feature weight
            "sensitivity": <float>   # Deviation tolerance
        }
    }
}
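Because generation runs server-side, it can help to check a config locally before sending it. The sketch below checks only the structure shown above. It assumes `reference_value`, `importance`, and `sensitivity` are all required and that the two params are numeric, which the page implies but does not state. `validate_config` is a hypothetical helper, not an SDK function.

```python
from numbers import Real

REQUIRED_PARAMS = ("importance", "sensitivity")


def validate_config(config):
    """Return a list of problems; an empty list means the structure matches."""
    problems = []
    for feature, spec in config.items():
        if not isinstance(spec, dict):
            problems.append(f"{feature}: expected a dict")
            continue
        if "reference_value" not in spec:
            problems.append(f"{feature}: missing 'reference_value'")
        params = spec.get("params")
        if not isinstance(params, dict):
            problems.append(f"{feature}: missing 'params' dict")
            continue
        for name in REQUIRED_PARAMS:
            value = params.get(name)
            # bool counts as a Real in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                problems.append(f"{feature}: params[{name!r}] must be a number")
    return problems


good = {
    "metric_a": {
        "reference_value": 0.95,
        "params": {"importance": 2.5, "sensitivity": 2.0},
    }
}
bad = {"metric_b": {"params": {"importance": "high"}}}

good_problems = validate_config(good)
bad_problems = validate_config(bad)
```

Returning a list of problems instead of raising on the first one lets the caller report every issue in a single pass.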
Example Configuration
config = {
    'metric_primary': {
        'reference_value': 650,
        'params': {
            'importance': 2.5,
            'sensitivity': 2.0
        }
    },
    'metric_secondary': {
        'reference_value': 60000,
        'params': {
            'importance': 2.0,
            'sensitivity': 1.8
        }
    }
}
🛠️ Complete Workflow
import pandas as pd
from dsf_aml_sdk import AMLSDK
# Initialize
sdk = AMLSDK(license_key='your_key', tier='professional')
# Load data
df = pd.read_csv('data.csv')
data = df[['metric_a', 'metric_b', 'metric_c']].head(100).to_dict('records')
# Config
config = {
    'metric_a': {
        'reference_value': 650,
        'params': {'importance': 2.5, 'sensitivity': 2.0}
    },
    'metric_b': {
        'reference_value': 60000,
        'params': {'importance': 2.0, 'sensitivity': 1.8}
    }
}
# 1. Fix failure
failed = {'metric_a': 580, 'metric_b': 35000}
fix = sdk.generate_variants(seed=failed, config=config, count=10)
# 2. Preventive dataset
seeds = sdk.identify_high_impact_regions(data[:50], config, focus_percent=0.15)
boundary = sdk.generate_boundary_variants(config, data[:50], variants_per_seed=20)
# 3. Generate training data
training = sdk.generate_training_data(config, samples=200)
# 4. Evaluate a single record (separate name so the training result is not overwritten)
test = data[0]
evaluation = sdk.evaluate(test, config)
⚠️ Important Notes
Client Responsibility:
Clients must validate model performance and compliance with applicable regulations. This SDK is a data generation tool and does not make autonomous decisions.
Data Processing:
All generation logic executes server-side. The SDK exposes only a configuration interface.
Generated Data:
Synthetic data is based on client-provided configurations and source datasets. Clients control all inputs and validation.
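Since validation is the client's responsibility, a basic sanity check on generated samples before training may be useful. The sketch below assumes each sample is a flat dict keyed by the same feature names as the config, which this page does not confirm. `split_valid_samples` is a hypothetical helper, not part of the SDK.

```python
import math


def split_valid_samples(samples, config):
    """Partition samples into (valid, rejected).

    A sample is valid when every feature named in `config` is present and
    holds a finite int or float. Hypothetical helper, not an SDK function.
    """
    valid, rejected = [], []
    for sample in samples:
        ok = all(
            isinstance(sample.get(name), (int, float))
            and not isinstance(sample.get(name), bool)
            and math.isfinite(sample[name])
            for name in config
        )
        (valid if ok else rejected).append(sample)
    return valid, rejected


config = {
    "metric_a": {"reference_value": 650, "params": {"importance": 2.5, "sensitivity": 2.0}},
    "metric_b": {"reference_value": 60000, "params": {"importance": 2.0, "sensitivity": 1.8}},
}
# Hand-written samples for illustration, not real SDK output.
samples = [
    {"metric_a": 580, "metric_b": 35000},          # complete and finite
    {"metric_a": float("nan"), "metric_b": 40000},  # non-finite value
    {"metric_a": 600},                              # missing metric_b
]
valid, rejected = split_valid_samples(samples, config)
```

Rejected samples are returned instead of silently dropped, so they can be logged or inspected.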
📞 Support
Licensing: contacto@dsfuptech.cloud
Technical Docs: Available under NDA
Enterprise: contacto@dsfuptech.cloud
📋 Credits
Technology Architect: Jaime Alexander Jimenez
© 2025 DSF UpTech. All rights reserved.
Download files
File details
Details for the file dsf_aml_sdk-2.2.0.tar.gz.
File metadata
- Download URL: dsf_aml_sdk-2.2.0.tar.gz
- Upload date:
- Size: 18.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4bafce6fa6d574f1d1ca9aa81a5bcbbb2a7ada5eb03b18873537d044dd99cbaa |
| MD5 | 21a594a6c97dc54da9a9e9c1ea92a7f4 |
| BLAKE2b-256 | 6b0456e535ca759ddd3988d92eef95a7835082a69d59b570fb7cfffde544fe8a |
File details
Details for the file dsf_aml_sdk-2.2.0-py3-none-any.whl.
File metadata
- Download URL: dsf_aml_sdk-2.2.0-py3-none-any.whl
- Upload date:
- Size: 16.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36988b2aa647306a75d8f20fe891e10dd94a8c1ad6fb78dbc9b27bcc5e6340f3 |
| MD5 | 3c2b54b6fbc2b526cd0f54a3876a333d |
| BLAKE2b-256 | 4e001b8ddca169f3e299ec6211d35d0afcfaca6ac7543c918584d1481f10ad45 |