Training stability tools for synthetic demo

🌊 CandorFlow

Early Warning System for Training Instabilities

⚠️ Important Notice

This repository contains a SIMPLIFIED, PUBLIC DEMONSTRATION of CandorFlow concepts.

This is NOT the full proprietary system. Many advanced features, algorithms, and optimizations are intentionally excluded. See What Is NOT Included for details.

📖 Overview

CandorFlow is a training stability monitoring and intervention system designed to detect and prevent neural network training instabilities before they cause divergence.

This public repository demonstrates:

A simplified stability metric λ(t) based on gradient-norm variance
Basic threshold-based monitoring
Automatic checkpoint rollback on instability detection
Learning rate reduction for recovery
Minimal working examples with toy models

What is λ(t)?

The lambda metric λ(t) is a stability indicator that tracks training health over time. In this simplified demo, it measures gradient norm variance as a proxy for instability.

High λ(t) → Training is becoming unstable
Low λ(t) → Training is stable
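
As a rough illustration of the idea (not CandorFlow's actual implementation), the sketch below computes the variance of the most recent gradient norms in plain Python. The helper name `gradient_norm_variance` and the default window of 10 are assumptions made for this example only.

```python
from statistics import pvariance


def gradient_norm_variance(grad_norms, window=10):
    """Population variance of the last `window` gradient norms (hypothetical helper)."""
    recent = grad_norms[-window:]
    if len(recent) < 2:
        return 0.0  # not enough history to measure spread
    return pvariance(recent)


stable = [1.0, 1.1, 0.9, 1.0, 1.05]
spiking = [1.0, 1.1, 0.9, 1.0, 9.0]

print(gradient_norm_variance(stable))   # small: the norms barely move
print(gradient_norm_variance(spiking))  # large: one norm jumped
```

A single outlier dominates the variance, which is why a sudden gradient spike shows up as a sharp rise in a metric of this kind.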

🎯 Features in This Demo

✅ What This Repo Contains (Safe/Public Demo)

Simplified λ(t) metric: Gradient norm variance-based instability detection
Basic stability controller: Threshold monitoring with rollback capabilities
Checkpoint management: Automatic saving and restoration
Learning rate adaptation: Halving on instability detection
Minimal training loop: Toy example with intentional instability
Visualization tools: Plot λ(t) curves and stability phases
Jupyter notebook: Interactive demo with explanations
Reproducible examples: Fully runnable on CPU or GPU

🚫 What Is NOT Included (Proprietary)

The full CandorFlow system includes many advanced features that are NOT in this public demo:

Core Algorithms

❌ Universal scaling law for λ(t)
❌ Reflexive ridge equation and closed-form solutions
❌ Cross-domain invariants (works across NLP, vision, RL, etc.)
❌ Jacobian spectral analysis for stability prediction
❌ Multi-signal fusion (loss, gradients, activations, etc.)

Advanced Control

❌ Real-time stability engine with predictive modeling
❌ Reflexive decay algorithms for adaptive intervention
❌ Temporal smoothing with active inference
❌ Dynamic threshold adaptation based on training phase
❌ HPC-optimized control loops for large-scale training

Domain Extensions

❌ ECG anomaly detection applications
❌ Earthquake early warning systems
❌ Financial market stability monitoring
❌ General-purpose time series instability detection

Performance

❌ Production-grade optimizations for minimal overhead
❌ Distributed training integration (DeepSpeed, FSDP, etc.)
❌ Hardware acceleration (CUDA kernels, etc.)

For access to the full proprietary system, please contact us.

🚀 Installation

Prerequisites

Python 3.8 or higher
pip package manager

Option 1: Install from GitHub (Recommended)

Install CandorFlow directly from the repository:

pip install git+https://github.com/CandorSystem/CandorFlow.git

After installation, you can import CandorFlow from anywhere:

from candorflow import compute_lambda, StabilityController
from candorflow.demo import run_demo, plot_results

Option 2: Development Installation

For contributing, modifying the code, or running examples from the repository:

git clone https://github.com/CandorSystem/CandorFlow.git
cd CandorFlow
pip install -e .

This installs the package in editable mode, so changes to the source code are immediately reflected.

Option 3: Manual Setup (Not Recommended)

If you prefer not to install the package:

git clone https://github.com/CandorSystem/CandorFlow.git
cd CandorFlow
pip install -r requirements.txt
python examples/run_demo.py  # Must run from repo directory

Note: With this approach, you'll need to add the repository to your Python path or run scripts from the repository root directory.
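
One way to add the repository to your Python path is to prepend the clone's location to `sys.path` before importing. This is a minimal sketch; the helper `add_repo_to_path` is hypothetical, and it assumes the clone sits in a `CandorFlow` directory under the current working directory.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Prepend repo_root to sys.path once and return the resolved path as a string."""
    resolved = str(Path(repo_root).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed clone location; change this to wherever you cloned the repository.
repo = add_repo_to_path("CandorFlow")
print(repo in sys.path)
```

Once the path is in place, `import candorflow` should resolve to the cloned sources, provided the dependencies from requirements.txt are installed.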

💻 Usage

Quick Start: Run the Training Demo

After installation, run the demo:

from candorflow.demo import run_demo, plot_results

# Run the demo
results = run_demo()

# Generate plots
plot_results(results)

Or, from a clone of the repository, use the command-line wrapper:

python examples/run_demo.py

This will:

Create a small MLP neural network
Train it on synthetic data
Compute λ(t) at each step
Inject a synthetic instability spike at step 30
Demonstrate automatic detection and rollback
Generate two plots:
- plots/lambda_curve.png - λ(t) over time with intervention markers
- plots/stability_phases.png - Color-coded stability zones

Expected output:

✓ Saved plots to plots/
  - lambda_curve.png
  - stability_phases.png
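
The spike-and-detect part of the steps above can be imitated without any model at all. The sketch below is a stand-in, not the code in candorflow.demo: it feeds a deterministic, made-up gradient-norm series with one spike into a rolling variance and reports the first step that crosses the threshold. The window length and the shape of the synthetic series are assumptions.

```python
from statistics import pvariance

SPIKE_STEP = 30   # matches the demo's spike step
THRESHOLD = 2.0   # same value as the Colab example; the scale here is illustrative
WINDOW = 10       # assumed history length


def synthetic_grad_norm(step):
    """Deterministic stand-in for a gradient norm, with one injected spike."""
    base = 1.0 + 0.05 * ((step % 3) - 1)  # gentle wobble around 1.0
    return base * 20.0 if step == SPIKE_STEP else base


history = []
first_alarm = None
for step in range(50):
    history.append(synthetic_grad_norm(step))
    recent = history[-WINDOW:]
    lam = pvariance(recent) if len(recent) > 1 else 0.0
    if lam > THRESHOLD and first_alarm is None:
        first_alarm = step

print(first_alarm)  # 30
```

Before the spike the rolling variance stays far below the threshold, so the alarm fires on exactly the step where the spike enters the window.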

Colab Integration

The demo is designed for easy Colab integration:

!pip install candorflow

from candorflow.demo import run_demo, plot_results
results = run_demo(steps=50, spike_step=30, threshold=2.0)
plot_results(results)

All training logic is contained in candorflow.demo - no need to write training loops in Colab!

📁 Repository Structure

CandorFlow/
│
├── README.md                   # This file
├── pyproject.toml              # Package configuration (pip install)
├── requirements.txt            # Python dependencies
├── LICENSE                     # MIT License
│
├── candorflow/                 # Main package
│   ├── __init__.py            # Public API (compute_lambda, StabilityController)
│   ├── demo.py                # Complete training demo (all logic here)
│   ├── lambda_metric.py       # Simplified λ(t) computation
│   ├── stability_controller.py # Basic monitoring & intervention
│   ├── utils.py               # Checkpoint and logging utilities
│   └── version.py             # Version information
│
├── examples/                   # Runnable demos
│   └── run_demo.py            # Thin wrapper to run demo
│
├── notebooks/                  # Jupyter notebooks
│   └── CandorFlow_Demo.ipynb  # Interactive tutorial
│
└── plots/                      # Output directory for plots
    └── (generated files)

🔬 How It Works (Simplified Version)

1. Monitor Training with λ(t)

from candorflow import compute_lambda, StabilityController

# During training loop
lambda_value = compute_lambda(
    model=model,
    loss=loss,
    gradient_history=gradient_history
)

2. Automatic Intervention

controller = StabilityController(threshold=2.0)

action = controller.update(
    lambda_value=lambda_value,
    model=model,
    optimizer=optimizer,
    step=step
)

if action["action"] == "rollback":
    print("Instability detected - rolling back to stable checkpoint")

3. Training Continues Safely

The controller automatically:

Saves checkpoints when training is stable
Detects when λ(t) exceeds threshold
Rolls back to last stable state
Reduces learning rate
Resumes training
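
The cycle above can be sketched with a toy controller that works on a plain dictionary instead of a model and optimizer. `ToyController` and the `state` layout are hypothetical and exist only for this illustration; the real `StabilityController.update` takes a model and an optimizer and may behave differently.

```python
import copy


class ToyController:
    """Hypothetical threshold controller: checkpoint when stable, roll back when not."""

    def __init__(self, threshold=2.0, lr_reduction_factor=0.5):
        self.threshold = threshold
        self.lr_reduction_factor = lr_reduction_factor
        self.checkpoint = None

    def update(self, lambda_value, state):
        """Return the action taken; `state` is a dict with 'weights' and 'lr'."""
        if lambda_value <= self.threshold:
            # Stable: remember this state so we can return to it later.
            self.checkpoint = copy.deepcopy(state)
            return "continue"
        if self.checkpoint is not None:
            # Unstable: restore the last stable weights.
            state["weights"] = copy.deepcopy(self.checkpoint["weights"])
        # Reduce the learning rate before training resumes.
        state["lr"] *= self.lr_reduction_factor
        return "rollback"


state = {"weights": [0.1, 0.2], "lr": 0.01}
controller = ToyController(threshold=2.0)

print(controller.update(0.3, state))   # stable step, checkpoint saved
state["weights"] = [50.0, -80.0]       # pretend an update blew up the weights
print(controller.update(7.5, state))   # unstable step, weights restored
print(state)
```

Checkpointing only on stable steps is what makes the rollback target trustworthy: the controller never stores a state that was itself flagged as unstable.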

📊 Example Results

After running the demo, you'll find two plots in plots/:

Lambda Curve with Interventions:

Blue line: λ(t) stability metric over time
Purple dashed line: Instability threshold
Orange markers: Rollback + LR reduction events
Red markers: Warnings

Stability Phases:

Green zone: Stable training
Orange zone: Warning (approaching threshold)
Red zone: Unstable (intervention triggered)
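
The three zones amount to a simple mapping from λ(t) to a label. In the sketch below, the point where the warning zone begins (80% of the threshold) is an assumption; the demo's actual boundary is not stated here, and `stability_zone` is a hypothetical helper.

```python
def stability_zone(lambda_value, threshold=2.0, warning_fraction=0.8):
    """Map a lambda value to a zone name. `warning_fraction` is an assumed parameter."""
    if lambda_value > threshold:
        return "red"      # unstable: intervention triggered
    if lambda_value >= warning_fraction * threshold:
        return "orange"   # warning: approaching the threshold
    return "green"        # stable


for value in (0.4, 1.7, 2.5):
    print(value, stability_zone(value))
```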

🧪 Running Tests

The demo includes built-in validation. From the repository root:

# Run the training demo (includes self-checks and regenerates the plots)
python examples/run_demo.py

📚 Documentation

API Reference

`compute_lambda(model, loss, history_window=10, gradient_history=None)`

Compute simplified λ(t) stability metric.

Parameters:

model (torch.nn.Module): Neural network model
loss (torch.Tensor): Current loss value (with grad_fn)
history_window (int): Number of past gradient norms to track
gradient_history (list): List to store gradient history (modified in-place)

Returns:

lambda_value (float): Stability metric (higher = more unstable)
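
The interplay between `history_window` and an in-place `gradient_history` can be pictured as follows. This is a sketch of one plausible way to keep a caller-owned list bounded, not the library's code; `record_grad_norm` is a hypothetical helper.

```python
def record_grad_norm(gradient_history, grad_norm, history_window=10):
    """Append to the caller's list and trim it in place to the last `history_window` items."""
    gradient_history.append(float(grad_norm))
    excess = len(gradient_history) - history_window
    if excess > 0:
        del gradient_history[:excess]  # in-place trim; the list object itself is kept


gradient_history = []  # owned by the training loop and reused on every step
for step in range(25):
    record_grad_norm(gradient_history, grad_norm=1.0 + 0.01 * step)

print(len(gradient_history))  # 10
print(gradient_history[0])    # oldest retained value, recorded at step 15
```

Because the list is modified in place, the training loop creates it once and passes the same object on every call; rebinding it to a new list each step would discard the history.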

`StabilityController(threshold, checkpoint_dir, lr_reduction_factor)`

Training stability monitor and intervention system.

Parameters:

threshold (float): λ(t) value above which to trigger intervention
checkpoint_dir (str): Directory for saving checkpoints
lr_reduction_factor (float): Factor to reduce LR by (default: 0.5)

Methods:

update(lambda_value, model, optimizer, step): Update controller and take action if needed
get_summary(): Get training statistics
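
To make the effect of `lr_reduction_factor` concrete, the sketch below applies a factor of 0.5 to a list of plain dictionaries that stand in for the `param_groups` list of a PyTorch optimizer. `reduce_lr` is a hypothetical helper and is not part of the CandorFlow API.

```python
def reduce_lr(param_groups, factor=0.5):
    """Scale the 'lr' entry of every parameter group in place and return the new values."""
    for group in param_groups:
        group["lr"] *= factor
    return [group["lr"] for group in param_groups]


# Plain dicts standing in for `optimizer.param_groups`.
param_groups = [{"lr": 0.1}, {"lr": 0.02}]

print(reduce_lr(param_groups))  # after one intervention
print(reduce_lr(param_groups))  # after a second intervention
```

With a factor of 0.5, each intervention halves the learning rate, so k interventions scale the initial rate by 0.5 to the power k.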

🤝 Contributing

This is a demonstration repository. Contributions are welcome for:

Bug fixes in demo code
Documentation improvements
Additional visualization examples
Educational content

Note: This repo intentionally excludes proprietary algorithms. Please do not submit PRs attempting to implement advanced features from the full system.

📧 Contact

For questions about this demo:

Open an issue on GitHub

For inquiries about the full proprietary CandorFlow system:

Email: [your-email@example.com]
Website: [https://candorflow.example.com]
Patents: [Patent application numbers]

📄 License

This simplified demonstration code is released under the MIT License. See LICENSE for details.

Important: The full CandorFlow system, including its proprietary algorithms and commercial applications, is NOT covered by this license. Please contact us for commercial licensing.

📖 Citation

If you use this demo code in your research or project, please cite:

@software{candorflow2025,
  title={CandorFlow: Training Stability Monitoring System},
  author={[Your Name]},
  year={2025},
  url={https://github.com/CandorSystem/CandorFlow},
  note={Simplified public demonstration version}
}

🙏 Acknowledgments

This simplified demo is provided for educational purposes to demonstrate basic concepts in training stability monitoring.

The full CandorFlow system represents significant research and development investment and is protected by pending patents.

⭐ Star History

If you find this demo helpful, please consider starring the repository!

Built for responsible and safe AI.

Project Support & Affiliations

CandorFlow and the Candor Systems project are supported by several leading industry startup programs.

These affiliations provide cloud credits, compute resources, and technical support for ongoing research and development.

Note: These affiliations indicate participation in early-stage startup support programs and do not imply endorsement of CandorFlow's algorithms or proprietary systems.

Release history

0.1.2 (Nov 16, 2025)
0.1.1 (Nov 16, 2025, this version)

Download files

Source Distribution

candorflow-0.1.1.tar.gz (16.6 kB), uploaded Nov 16, 2025

Built Distribution

candorflow-0.1.1-py3-none-any.whl (15.0 kB), uploaded Nov 16, 2025, Python 3

File details

Details for the file candorflow-0.1.1.tar.gz.

File metadata

Download URL: candorflow-0.1.1.tar.gz
Upload date: Nov 16, 2025
Size: 16.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for candorflow-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`d73bfd982a900b4f4a9a31da37bbf8eb379ad5a80bd78ad76bbe6f7da016a408`
MD5	`5f4df2c67f408b83b8573a8a73e62f3b`
BLAKE2b-256	`0c57f065b5bf525e7af45fc080c1a99c0de8b494dfa00e084ffdeaaab98a54ab`

File details

Details for the file candorflow-0.1.1-py3-none-any.whl.

File metadata

Download URL: candorflow-0.1.1-py3-none-any.whl
Upload date: Nov 16, 2025
Size: 15.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for candorflow-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0fe54cc4d4bae459b6fc2cfb7f31b998acb84f892960542741f04f7478515e8f`
MD5	`1cf270c27e1926eac8c4a60ac8dee58b`
BLAKE2b-256	`70e2f6cb908b3249309890022378317edeb352ae4641901ef1ccb9389736762c`
