Training stability tools for synthetic demo
🌊 CandorFlow
Early Warning System for Training Instabilities
⚠️ Important Notice
This repository contains a SIMPLIFIED, PUBLIC DEMONSTRATION of CandorFlow concepts.
This is NOT the full proprietary system. Many advanced features, algorithms, and optimizations are intentionally excluded. See What Is NOT Included for details.
📖 Overview
CandorFlow is a training stability monitoring and intervention system designed to detect and prevent neural network training instabilities before they cause divergence.
This public repository demonstrates:
- A simplified stability metric λ(t) based on gradient norm variance
- Basic threshold-based monitoring
- Automatic checkpoint rollback on instability detection
- Learning rate reduction for recovery
- Minimal working examples with toy models
What is λ(t)?
The lambda metric λ(t) is a stability indicator that tracks training health over time. In this simplified demo, it measures gradient norm variance as a proxy for instability.
High λ(t) → Training is becoming unstable
Low λ(t) → Training is stable
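To make the idea concrete, here is a minimal sketch of a gradient-norm-variance metric in plain Python. It is not the `candorflow` implementation: the class name `GradientNormVariance`, the use of population variance, and the choice to report 0.0 until two samples exist are all assumptions made for illustration.

```python
import math
import statistics
from collections import deque


class GradientNormVariance:
    """Toy stand-in for λ(t): variance of the most recent gradient norms.

    Hypothetical helper for illustration; not part of the candorflow API.
    """

    def __init__(self, history_window=10):
        # Only the last `history_window` norms are kept.
        self.norms = deque(maxlen=history_window)

    def update(self, gradients):
        """Record the L2 norm of one gradient vector, return the current variance."""
        norm = math.sqrt(sum(g * g for g in gradients))
        self.norms.append(norm)
        if len(self.norms) < 2:
            return 0.0  # assumption: report 0 until two samples are available
        return statistics.pvariance(self.norms)


tracker = GradientNormVariance(history_window=3)
steady = [tracker.update([3.0, 4.0]) for _ in range(3)]  # norm is always 5.0
spiky = tracker.update([30.0, 40.0])                     # norm jumps to 50.0
print(steady, spiky)
```

Constant gradient norms give a variance of zero, while a single large norm inside the window raises the value sharply, which matches the "high λ(t) means unstable" reading above.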
🎯 Features in This Demo
✅ What This Repo Contains (Safe/Public Demo)
- Simplified λ(t) metric: Gradient norm variance-based instability detection
- Basic stability controller: Threshold monitoring with rollback capabilities
- Checkpoint management: Automatic saving and restoration
- Learning rate adaptation: Halving on instability detection
- Minimal training loop: Toy example with intentional instability
- Visualization tools: Plot λ(t) curves and stability phases
- Jupyter notebook: Interactive demo with explanations
- Reproducible examples: Fully runnable on CPU or GPU
🚫 What Is NOT Included (Proprietary)
The full CandorFlow system includes many advanced features that are NOT in this public demo:
Core Algorithms
- ❌ Universal scaling law for λ(t)
- ❌ Reflexive ridge equation and closed-form solutions
- ❌ Cross-domain invariants (works across NLP, vision, RL, etc.)
- ❌ Jacobian spectral analysis for stability prediction
- ❌ Multi-signal fusion (loss, gradients, activations, etc.)
Advanced Control
- ❌ Real-time stability engine with predictive modeling
- ❌ Reflexive decay algorithms for adaptive intervention
- ❌ Temporal smoothing with active inference
- ❌ Dynamic threshold adaptation based on training phase
- ❌ HPC-optimized control loops for large-scale training
Domain Extensions
- ❌ ECG anomaly detection applications
- ❌ Earthquake early warning systems
- ❌ Financial market stability monitoring
- ❌ General-purpose time series instability detection
Performance
- ❌ Production-grade optimizations for minimal overhead
- ❌ Distributed training integration (DeepSpeed, FSDP, etc.)
- ❌ Hardware acceleration (CUDA kernels, etc.)
For access to the full proprietary system, please contact us.
🚀 Installation
Prerequisites
- Python 3.8 or higher
- pip package manager
Option 1: Install from GitHub (Recommended)
Install CandorFlow directly from the repository:
pip install git+https://github.com/CandorSystem/CandorFlow.git
After installation, you can import CandorFlow from anywhere:
from candorflow import compute_lambda, StabilityController
from candorflow.demo import run_demo, plot_results
Option 2: Development Installation
For contributing, modifying the code, or running examples from the repository:
git clone https://github.com/CandorSystem/CandorFlow.git
cd CandorFlow
pip install -e .
This installs the package in editable mode, so changes to the source code are immediately reflected.
Option 3: Manual Setup (Not Recommended)
If you prefer not to install the package:
git clone https://github.com/CandorSystem/CandorFlow.git
cd CandorFlow
pip install -r requirements.txt
python examples/run_demo.py # Must run from repo directory
Note: With this approach, you'll need to add the repository to your Python path or run scripts from the repository root directory.
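Whichever option you choose, a quick check like the one below can confirm that the interpreter meets the Python 3.8 prerequisite and that the package is importable. This is only a sketch using the standard library; the helper name `check_environment` is hypothetical and not something the package provides.

```python
import importlib.util
import sys


def check_environment(package="candorflow", minimum=(3, 8)):
    """Report whether Python is new enough and whether `package` can be found."""
    return {
        "python_ok": sys.version_info >= minimum,
        # find_spec returns None when a top-level package is not importable.
        "installed": importlib.util.find_spec(package) is not None,
    }


info = check_environment()
print(info)
```

If `installed` is false after Option 3, that is expected unless you are running from the repository root or have added it to your Python path.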
💻 Usage
Quick Start: Run the Training Demo
After installation, run the demo:
from candorflow.demo import run_demo, plot_results
# Run the demo
results = run_demo()
# Generate plots
plot_results(results)
Or use the command-line wrapper:
python examples/run_demo.py
This will:
- Create a small MLP neural network
- Train it on synthetic data
- Compute λ(t) at each step
- Inject a synthetic instability spike at step 30
- Demonstrate automatic detection and rollback
- Generate two plots:
  - plots/lambda_curve.png: λ(t) over time with intervention markers
  - plots/stability_phases.png: Color-coded stability zones
Expected output:
✓ Saved plots to plots/
- lambda_curve.png
- stability_phases.png
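The sequence the demo walks through can be mimicked without a model at all. The sketch below is an assumption-laden stand-in, not the code in `candorflow.demo`: it fabricates a gradient-norm series with a small deterministic ripple, injects one large value at the spike step, and reports the first step where a windowed variance exceeds the threshold. The function name `simulate` and the `window` parameter are hypothetical; `steps`, `spike_step`, and `threshold` simply reuse the argument names shown for `run_demo`.

```python
import statistics


def simulate(steps=50, spike_step=30, threshold=2.0, window=10):
    """Return (lambdas, first_alert) for a synthetic gradient-norm series."""
    norms, lambdas, first_alert = [], [], None
    for step in range(steps):
        # Calm baseline with a tiny ripple, plus one large injected spike.
        norm = 10.0 if step == spike_step else 1.0 + 0.01 * (step % 3)
        norms.append(norm)
        recent = norms[-window:]
        # Windowed population variance stands in for λ(t) in this sketch.
        lam = statistics.pvariance(recent) if len(recent) > 1 else 0.0
        lambdas.append(lam)
        if first_alert is None and lam > threshold:
            first_alert = step
    return lambdas, first_alert


lambdas, first_alert = simulate()
print(f"first step above threshold: {first_alert}")
```

Because this sketch performs no rollback, the spike stays inside the window for the next several steps and the value remains elevated; the real demo intervenes at that point instead.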
Colab Integration
The demo is designed for easy Colab integration:
!pip install candorflow
from candorflow.demo import run_demo, plot_results
results = run_demo(steps=50, spike_step=30, threshold=2.0)
plot_results(results)
All training logic is contained in candorflow.demo - no need to write training loops in Colab!
📁 Repository Structure
CandorFlow/
│
├── README.md # This file
├── pyproject.toml # Package configuration (pip install)
├── requirements.txt # Python dependencies
├── LICENSE # MIT License
│
├── candorflow/ # Main package
│ ├── __init__.py # Public API (compute_lambda, StabilityController)
│ ├── demo.py # Complete training demo (all logic here)
│ ├── lambda_metric.py # Simplified λ(t) computation
│ ├── stability_controller.py # Basic monitoring & intervention
│ ├── utils.py # Checkpoint and logging utilities
│ └── version.py # Version information
│
├── examples/ # Runnable demos
│ └── run_demo.py # Thin wrapper to run demo
│
├── notebooks/ # Jupyter notebooks
│ └── CandorFlow_Demo.ipynb # Interactive tutorial
│
└── plots/ # Output directory for plots
    └── (generated files)
🔬 How It Works (Simplified Version)
1. Monitor Training with λ(t)
from candorflow import compute_lambda, StabilityController
gradient_history = []  # created once; updated in place on every call

# During training loop
lambda_value = compute_lambda(
    model=model,
    loss=loss,
    gradient_history=gradient_history,
)
2. Automatic Intervention
controller = StabilityController(threshold=2.0)
action = controller.update(
    lambda_value=lambda_value,
    model=model,
    optimizer=optimizer,
    step=step,
)
if action["action"] == "rollback":
    print("Instability detected - rolling back to stable checkpoint")
3. Training Continues Safely
The controller automatically:
- Saves checkpoints when training is stable
- Detects when λ(t) exceeds threshold
- Rolls back to last stable state
- Reduces learning rate
- Resumes training
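The five steps above can be sketched with a toy controller that works on a plain dictionary instead of a PyTorch model and optimizer. This is an illustration only and should not be read as how `StabilityController` is implemented: the class `ToyController`, the `state` dictionary layout, and the decision to restore weights but keep the reduced learning rate are all assumptions.

```python
import copy


class ToyController:
    """Illustrative rollback controller; NOT the candorflow StabilityController."""

    def __init__(self, threshold=2.0, lr_reduction_factor=0.5):
        self.threshold = threshold
        self.lr_reduction_factor = lr_reduction_factor
        self.checkpoint = None  # weights from the last stable step

    def update(self, lambda_value, state, step):
        """`state` is a dict with "weights" and "lr"; it is modified in place."""
        if lambda_value <= self.threshold:
            # Stable: remember the weights so they can be restored later.
            self.checkpoint = copy.deepcopy(state["weights"])
            return {"action": "none", "step": step}
        if self.checkpoint is not None:
            # Unstable: roll the weights back to the last stable copy.
            state["weights"] = copy.deepcopy(self.checkpoint)
        # Reduce the learning rate before training resumes.
        state["lr"] *= self.lr_reduction_factor
        return {"action": "rollback", "step": step}


state = {"weights": [0.1, 0.2], "lr": 0.01}
controller = ToyController(threshold=2.0)
controller.update(0.3, state, step=1)   # stable, checkpoint saved
state["weights"] = [9.0, -7.0]          # pretend an update blew up the weights
result = controller.update(5.0, state, step=2)
print(result["action"], state)
```

Keeping the learning rate out of the checkpoint means repeated rollbacks keep shrinking it, which is one plausible reading of "reduces learning rate" followed by "resumes training".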
📊 Example Results
After running the demo, you'll see plots like this:
Lambda Curve with Interventions:
- Blue line: λ(t) stability metric over time
- Purple dashed line: Instability threshold
- Orange markers: Rollback + LR reduction events
- Red markers: Warnings
Stability Phases:
- Green zone: Stable training
- Orange zone: Warning (approaching threshold)
- Red zone: Unstable (intervention triggered)
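As a rough illustration of how a λ(t) value could map onto the three zones, the sketch below uses a fixed fraction of the threshold as the warning boundary. The function `classify_phase` and the `warning_fraction=0.8` default are assumptions; the README does not say where the demo draws the warning line.

```python
def classify_phase(lambda_value, threshold=2.0, warning_fraction=0.8):
    """Map a λ(t) value to 'stable', 'warning', or 'unstable' (hypothetical rule)."""
    if lambda_value > threshold:
        return "unstable"   # red zone: intervention would be triggered
    if lambda_value >= warning_fraction * threshold:
        return "warning"    # orange zone: approaching the threshold
    return "stable"         # green zone


for value in (0.4, 1.7, 2.5):
    print(value, classify_phase(value))
```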
🧪 Running Tests
The demo includes built-in validation:
# Run the training demo (includes self-checks and generates plots)
python examples/run_demo.py
📚 Documentation
API Reference
compute_lambda(model, loss, history_window=10, gradient_history=None)
Compute simplified λ(t) stability metric.
Parameters:
- model (torch.nn.Module): Neural network model
- loss (torch.Tensor): Current loss value (with grad_fn)
- history_window (int): Number of past gradient norms to track
- gradient_history (list): List to store gradient history (modified in-place)
Returns:
- lambda_value (float): Stability metric (higher = more unstable)
StabilityController(threshold, checkpoint_dir, lr_reduction_factor)
Training stability monitor and intervention system.
Parameters:
- threshold (float): λ(t) value above which to trigger intervention
- checkpoint_dir (str): Directory for saving checkpoints
- lr_reduction_factor (float): Factor to reduce LR by (default: 0.5)
Methods:
- update(lambda_value, model, optimizer, step): Update controller and take action if needed
- get_summary(): Get training statistics
🤝 Contributing
This is a demonstration repository. Contributions are welcome for:
- Bug fixes in demo code
- Documentation improvements
- Additional visualization examples
- Educational content
Note: This repo intentionally excludes proprietary algorithms. Please do not submit PRs attempting to implement advanced features from the full system.
📧 Contact
For questions about this demo:
- Open an issue on GitHub
For inquiries about the full proprietary CandorFlow system:
- Email: [your-email@example.com]
- Website: [https://candorflow.example.com]
- Patents: [Patent application numbers]
📄 License
This simplified demonstration code is released under the MIT License. See LICENSE for details.
Important: The full CandorFlow system, including its proprietary algorithms and commercial applications, is NOT covered by this license. Please contact us for commercial licensing.
📖 Citation
If you use this demo code in your research or project, please cite:
@software{candorflow2025,
title={CandorFlow: Training Stability Monitoring System},
author={[Your Name]},
year={2025},
url={https://github.com/yourusername/CandorFlow},
note={Simplified public demonstration version}
}
🙏 Acknowledgments
This simplified demo is provided for educational purposes to demonstrate basic concepts in training stability monitoring.
The full CandorFlow system represents significant research and development investment and is protected by pending patents.
⭐ Star History
If you find this demo helpful, please consider starring the repository!
Built for responsible and safe AI.
Project Support & Affiliations
CandorFlow and the Candor Systems project are supported by several leading industry startup programs.
These affiliations provide cloud credits, compute resources, and technical support for ongoing research and development.
Note: These affiliations indicate participation in early-stage startup support programs and do not imply endorsement of CandorFlow's algorithms or proprietary systems.
File details
Details for the file candorflow-0.1.2.tar.gz.
File metadata
- Download URL: candorflow-0.1.2.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c0c5257b26913a6111c47b9832e2a2d34a8024117077c598f64799bbf91b846 |
| MD5 | 940fe72ee7692d567e92f7f4d3ccf423 |
| BLAKE2b-256 | dced5c8032cd19538b00a72f03a6e751cb9c75d5df364a3719c164049d84402e |
File details
Details for the file candorflow-0.1.2-py3-none-any.whl.
File metadata
- Download URL: candorflow-0.1.2-py3-none-any.whl
- Upload date:
- Size: 14.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41a81d90d4308c49d008a2e3f26ce4c14b0d8f6ce42bff6c7eadb5557386f61c |
| MD5 | 7d08627c52ca88199c7c98f741e19e45 |
| BLAKE2b-256 | 1d48b43d81ea2676d22ff5c920b412ceedbea2f252f7cfea995069c8cb6e8e56 |
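If you download one of these files by hand, its SHA256 digest can be compared against the value listed above. The following is a small standard-library sketch; the helper names `sha256_of_file` and `matches_published` are made up for this example, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a local file's digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published("candorflow-0.1.2.tar.gz", "3c0c5257b26913a6111c47b9832e2a2d34a8024117077c598f64799bbf91b846")` should return True for an intact copy of the source distribution saved under that name in the current directory.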