RHOAI toolkit for managing and upgrading RHOAI

RHOShift - OpenShift Operator Installation Toolkit

A comprehensive, enterprise-grade toolkit for managing OpenShift operators with enhanced stability features, automatic dependency resolution, and Red Hat OpenShift AI (RHOAI) integration.

✨ Features

🚀 Core Functionality

  • 6 Enterprise Operators: Complete operator stack for modern OpenShift deployments
  • Enhanced Stability System: 3-tier stability levels with comprehensive error handling
  • Automatic Dependency Resolution: Smart installation order with dependency detection
  • Pre-flight Validation: Cluster readiness and permission verification
  • Health Monitoring: Real-time operator status tracking and reporting
  • Auto-recovery: Intelligent error classification and automatic retry logic

🛡️ Enterprise-Grade Reliability

  • Comprehensive Error Handling: 59+ exception handlers throughout the codebase
  • Webhook Certificate Resilience: Automatic handling of webhook certificate timing issues during RHOAI installation
  • Resource Conflict Detection: Prevention of operator namespace conflicts
  • Smart Retry Logic: Exponential backoff with contextual error recovery
  • Parallel Installation: Optimized performance for multiple operators

🔧 Advanced Integration

  • RHOAI DSC/DSCI Management: Complete DataScienceCluster lifecycle control
  • Kueue Management States: Dynamic DSC integration with Managed/Unmanaged modes
  • KedaController Automation: Automatic KEDA controller creation and validation
  • Kuadrant Automation: Automatic Kuadrant CR creation for RHCL
  • LeaderWorkerSet Automation: Automatic LeaderWorkerSetOperator CR creation for LWS
  • Configurable Timeouts: Flexible timing control for enterprise environments

🛡️ Enhanced Stability Features

RHOShift includes a comprehensive stability system designed for enterprise deployments:

Stability Levels

  • 🟢 Enhanced (Default): Pre-flight checks + health monitoring + auto-recovery
  • 🔵 Comprehensive: Maximum resilience with advanced error classification
  • ⚪ Basic: Standard installation with basic error handling
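One way to read the three levels is as cumulative feature sets, each level adding to the one below it. The sketch below models that reading with a hypothetical `StabilityLevel` enum and `features_for` helper; the names, and the assumption that Comprehensive includes everything in Enhanced, are illustrative and not RHOShift's actual API.

```python
from enum import Enum


class StabilityLevel(Enum):
    # Hypothetical names for the three documented levels.
    BASIC = "basic"
    ENHANCED = "enhanced"            # documented default
    COMPREHENSIVE = "comprehensive"


DEFAULT_LEVEL = StabilityLevel.ENHANCED


def features_for(level: StabilityLevel) -> set:
    """Return the features a level enables, assuming levels are cumulative."""
    features = {"basic_error_handling"}
    if level in (StabilityLevel.ENHANCED, StabilityLevel.COMPREHENSIVE):
        features |= {"preflight_checks", "health_monitoring", "auto_recovery"}
    if level is StabilityLevel.COMPREHENSIVE:
        features.add("advanced_error_classification")
    return features
```

Under this model, `features_for(DEFAULT_LEVEL)` includes pre-flight checks, health monitoring and auto-recovery but not advanced error classification.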

Pre-flight Validation

  • ✅ Cluster connectivity and authentication
  • ✅ Required permissions verification
  • ✅ Resource quota validation
  • ✅ Operator catalog accessibility
  • ✅ Namespace conflict detection
  • ✅ DSCI compatibility validation for RHOAI installations
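Several of these checks map onto plain `oc` commands, including ones the Troubleshooting section already uses. The following is a minimal sketch, assuming a hypothetical `run_preflight` helper that treats a non-zero `oc` exit code as a failed check (`oc auth can-i` exits non-zero when the answer is "no"). The command runner is injectable so the logic can be exercised without a cluster. RHOShift's quota, namespace-conflict and DSCI checks are not reproduced here.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Hypothetical check list; each entry is a standard `oc` invocation.
PREFLIGHT_CHECKS: Dict[str, List[str]] = {
    "authenticated": ["whoami"],
    "can_create_subscriptions": [
        "auth", "can-i", "create", "subscriptions", "-n", "openshift-operators",
    ],
    "catalogs_reachable": ["get", "catalogsource", "-n", "openshift-marketplace"],
}


def run_oc(args: Sequence[str], oc_binary: str = "oc") -> int:
    """Run an oc command and return its exit code (output is discarded)."""
    completed = subprocess.run(
        [oc_binary, *args], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode


def run_preflight(runner: Callable[[Sequence[str]], int] = run_oc) -> Dict[str, bool]:
    """Map each check name to True if its command exited with status 0."""
    return {name: runner(args) == 0 for name, args in PREFLIGHT_CHECKS.items()}
```

Typical use would be `ready = all(run_preflight().values())`. In this sketch, a missing `oc` binary surfaces as a `FileNotFoundError` from `subprocess.run`.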

Health Monitoring

  • 📊 Real-time operator status tracking
  • 🔍 Multi-resource health validation
  • 📈 Installation progress reporting
  • ⚡ Performance metrics and timing
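In OLM, an operator's install status is usually read from its ClusterServiceVersion (CSV) `status.phase`, which reaches `Succeeded` once the operator is running. The following is a minimal polling sketch, assuming a hypothetical `wait_for_csv` helper with injectable phase getter, clock and sleep. It is not RHOShift's health monitor, and treating `Failed` as terminal is an assumption. The defaults only echo the documented `--timeout` (300s) and `--retry-delay` (10s) values.

```python
import time
from typing import Callable, List, Optional


def wait_for_csv(
    get_phase: Callable[[], Optional[str]],
    timeout: float = 300.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll a CSV phase until it is Succeeded and return the phases observed.

    Raises RuntimeError on Failed and TimeoutError once the deadline passes.
    """
    deadline = clock() + timeout
    history: List[str] = []
    while True:
        phase = get_phase() or "Pending"  # no CSV yet is treated as pending
        history.append(phase)
        if phase == "Succeeded":
            return history
        if phase == "Failed":
            raise RuntimeError(f"CSV failed after phases {history}")
        if clock() >= deadline:
            raise TimeoutError(f"CSV not ready after {timeout}s: {history}")
        sleep(interval)
```

Against a real cluster, `get_phase` could wrap something like `oc get csv -n openshift-keda -o jsonpath='{.items[0].status.phase}'`.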

Auto-recovery Features

  • 🔄 Intelligent retry mechanisms
  • 🧠 Error classification (transient vs. permanent)
  • ⏰ Exponential backoff strategies
  • 🛠️ Automatic resource cleanup and recreation
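The pattern of classifying the error, retrying only transient failures, and backing off exponentially can be sketched as follows. The `TransientError`/`PermanentError` split, the `retry_with_backoff` name and the doubling factor are assumptions for illustration. The `retries` and `base_delay` parameters only loosely mirror the `--retries` and `--retry-delay` options.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (e.g. webhook not ready yet)."""


class PermanentError(Exception):
    """Hypothetical: a failure retrying cannot fix (e.g. missing permissions)."""


def backoff_delays(retries: int, base_delay: float, factor: float = 2.0) -> List[float]:
    """Sleep durations between attempts: base, base*factor, base*factor**2, ..."""
    return [base_delay * factor ** i for i in range(retries - 1)]


def retry_with_backoff(
    action: Callable[[], T],
    retries: int = 3,
    base_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` up to `retries` times; only TransientError triggers a retry.

    PermanentError (or any other exception) is not caught and propagates at once.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    delays = backoff_delays(retries, base_delay)
    for attempt in range(retries):
        try:
            return action()
        except TransientError:
            if attempt == retries - 1:
                raise  # attempts exhausted: surface the last transient error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With `retries=5` and `base_delay=15`, this schedule sleeps 15, 30, 60 and 120 seconds. Whether RHOShift doubles its delay or keeps it fixed is not stated in this README.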

📦 Supported Operators

Operator      Package                                        Namespace                  Channel       Dependencies
cert-manager  openshift-cert-manager-operator                cert-manager-operator      stable-v1     None
Kueue         kueue-operator                                 openshift-kueue-operator   stable-v1.0   cert-manager
KEDA          openshift-custom-metrics-autoscaler-operator   openshift-keda             stable        None
RHCL          rhcl-operator                                  openshift-operators        stable        None
LWS           leader-worker-set                              openshift-lws-operator     stable-v1.0   None
RHOAI/ODH     opendatahub-operator                           openshift-operators        stable        None
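Each table row corresponds naturally to an OLM Subscription. Package becomes `spec.name`, Channel becomes `spec.channel`, and Namespace is where the Subscription is created. The following is a minimal sketch, assuming a hypothetical `subscription_manifest` helper. The catalog source `redhat-operators` in `openshift-marketplace` is an assumed default; RHOAI builds passed via `--rhoai-image`, for example, would come from a different catalog. The OperatorGroup a namespace-scoped install also needs is not shown.

```python
from typing import Any, Dict

# Rows copied from the Supported Operators table: package, namespace, channel.
OPERATORS: Dict[str, Dict[str, str]] = {
    "cert-manager": {"package": "openshift-cert-manager-operator",
                     "namespace": "cert-manager-operator", "channel": "stable-v1"},
    "kueue": {"package": "kueue-operator",
              "namespace": "openshift-kueue-operator", "channel": "stable-v1.0"},
    "keda": {"package": "openshift-custom-metrics-autoscaler-operator",
             "namespace": "openshift-keda", "channel": "stable"},
    "rhcl": {"package": "rhcl-operator",
             "namespace": "openshift-operators", "channel": "stable"},
    "lws": {"package": "leader-worker-set",
            "namespace": "openshift-lws-operator", "channel": "stable-v1.0"},
    "rhoai": {"package": "opendatahub-operator",
              "namespace": "openshift-operators", "channel": "stable"},
}


def subscription_manifest(key: str, source: str = "redhat-operators",
                          source_namespace: str = "openshift-marketplace") -> Dict[str, Any]:
    """Build an OLM Subscription for one table row (catalog source is assumed)."""
    row = OPERATORS[key]
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": row["package"], "namespace": row["namespace"]},
        "spec": {
            "channel": row["channel"],
            "name": row["package"],
            "source": source,
            "sourceNamespace": source_namespace,
            "installPlanApproval": "Automatic",
        },
    }
```

The result could be serialized with `json.dumps` and piped to `oc apply -f -`, since `oc` accepts JSON manifests.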

🚀 Installation

Quick Install

git clone https://github.com/mwaykole/O.git
cd O
pip install -e .

Verify Installation

rhoshift --help
rhoshift --summary

💻 Usage

Basic Commands

# Install single operator with enhanced stability
rhoshift --cert-manager

# Install multiple operators with batch optimization
rhoshift --cert-manager --keda --kueue --rhcl --lws

# Install with dependency resolution (Kueue + cert-manager)
rhoshift --kueue

# Install all operators (includes DSCI validation for RHOAI)
rhoshift --all

# Install all with RHOAI channel preference
rhoshift --all --rhoai-channel=odh-nightlies

# Show detailed operator summary
rhoshift --summary

# Clean up all operators
rhoshift --cleanup

RHOAI with DSC/DSCI

# Install RHOAI with complete setup
rhoshift --rhoai \
  --rhoai-channel=odh-nightlies \
  --rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
  --deploy-rhoai-resources

# Install RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed \
  --rhoai-channel=stable \
  --rhoai-image=quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25-nightly \
  --deploy-rhoai-resources

Kueue Management States

# Install Kueue as Managed (RHOAI controls it)
rhoshift --kueue Managed

# Install Kueue as Unmanaged (independent; this is the default)
rhoshift --kueue Unmanaged
rhoshift --kueue  # Same as above

# Switch management states (updates existing DSC)
rhoshift --kueue Managed    # Switch to Managed
rhoshift --kueue Unmanaged  # Switch to Unmanaged
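In a DataScienceCluster, the Kueue component's state is stored at `spec.components.kueue.managementState`, so switching between Managed and Unmanaged amounts to a merge patch on that one field. The following is a minimal sketch, assuming hypothetical `kueue_patch`/`kueue_patch_command` helpers and a DSC named `default-dsc`. That name is an assumption; check yours with `oc get dsc`. The command follows the standard `oc patch --type merge -p` form.

```python
import json
from typing import List

VALID_STATES = ("Managed", "Unmanaged")  # the two states this README documents


def kueue_patch(state: str) -> dict:
    """Merge-patch body that sets the Kueue managementState in a DSC."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported Kueue state: {state!r}")
    return {"spec": {"components": {"kueue": {"managementState": state}}}}


def kueue_patch_command(state: str, dsc_name: str = "default-dsc",
                        oc_binary: str = "oc") -> List[str]:
    """Build argv for `oc patch dsc <name> --type merge -p <json>`."""
    return [oc_binary, "patch", "dsc", dsc_name, "--type", "merge",
            "-p", json.dumps(kueue_patch(state))]
```

Running `subprocess.run(kueue_patch_command("Managed"), check=True)` would apply the switch. Building the argv as a list avoids shell-quoting the JSON.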

🔧 Advanced Usage

Enterprise Deployment

# Complete ML/AI stack with queue management
rhoshift --all --kueue Managed \
  --rhoai-channel=stable \
  --rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
  --deploy-rhoai-resources \
  --timeout=900

# Development environment setup
rhoshift --cert-manager --kueue Unmanaged --keda --rhcl --lws

Custom Configuration

# Custom timeouts and retries for enterprise clusters
rhoshift --all \
  --timeout=1200 \
  --retries=5 \
  --retry-delay=15

# Custom oc binary path
rhoshift --cert-manager --oc-binary=/usr/local/bin/oc

# Verbose output for debugging
rhoshift --kueue Managed --verbose

🔗 Dependency Management

RHOShift automatically handles operator dependencies:

Automatic Resolution

  • Kueue → cert-manager: Installing Kueue automatically includes cert-manager
  • Installation Order: Dependencies installed first, primary operators second
  • Conflict Detection: Prevents namespace and resource conflicts
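Installing dependencies first is a topological sort over a "requires" map. The following is a minimal sketch, assuming a hypothetical `install_order` helper and a `DEPENDENCIES` map that holds only the edge the Supported Operators table documents (Kueue requires cert-manager). It uses a depth-first search rather than `graphlib.TopologicalSorter` (Python 3.9+) so it stays compatible with the Python 3.8 minimum listed under Prerequisites.

```python
from typing import Dict, Iterable, List

# From the Supported Operators table: only Kueue declares a dependency.
DEPENDENCIES: Dict[str, List[str]] = {"kueue": ["cert-manager"]}


def install_order(requested: Iterable[str],
                  deps: Dict[str, List[str]] = DEPENDENCIES) -> List[str]:
    """Return requested operators plus their dependencies, dependencies first."""
    order: List[str] = []
    visiting = set()  # operators on the current DFS path, for cycle detection

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for name in requested:
        visit(name)
    return order
```

For `rhoshift --kueue`, this yields `["cert-manager", "kueue"]`, which matches the installation order shown in the sample output.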

Smart Validation

# This command installs BOTH cert-manager AND Kueue in correct order:
rhoshift --kueue
# Output:
# 🔍 Pre-flight checks passed. Cluster is ready for installation.
# ⚠️  Missing dependency: kueue-operator requires openshift-cert-manager-operator
# 🚀 Installing 2 operators with enhanced stability...
# ✅ cert-manager installed successfully
# ✅ kueue installed successfully

🤖 RHOAI Integration

DataScienceCluster Management

RHOShift provides complete DSC/DSCI lifecycle management:

# Create RHOAI with DSC/DSCI
rhoshift --rhoai --deploy-rhoai-resources

# RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed --deploy-rhoai-resources

DSC Behavior

  • Existing DSC: Automatically updates Kueue managementState
  • No DSC: State applied when DSC is created via --deploy-rhoai-resources
  • Webhook Resilience: Automatic handling of certificate timing issues

Output Examples

# When DSC exists and gets updated:
🔄 Updating DSC with Kueue managementState: Managed
✅ Successfully updated DSC with Kueue managementState: Managed

# When no DSC exists:
ℹ️  No existing DSC found. Kueue managementState will be applied when DSC is created.

⚙️ Configuration

CLI Options

Operator Selection:
  --cert-manager        Install cert-manager Operator
  --rhoai               Install RHOAI Operator
  --kueue [{Managed,Unmanaged}]  Install Kueue with DSC integration
  --keda                Install KEDA (Custom Metrics Autoscaler)
  --rhcl                Install RHCL (Red Hat Connectivity Link) and create Kuadrant CR
  --lws                 Install LWS (Leader Worker Set) and create LeaderWorkerSetOperator CR
  --all                 Install all operators
  --cleanup             Clean up all operators
  --summary             Show operator summary

Configuration:
  --oc-binary OC_BINARY     Path to oc CLI (default: oc)
  --retries RETRIES         Max retry attempts (default: 3)
  --retry-delay RETRY_DELAY Delay between retries (default: 10s)
  --timeout TIMEOUT         Command timeout (default: 300s)

RHOAI Options:
  --rhoai-channel CHANNEL   RHOAI channel (stable/odh-nightlies)
  --rhoai-image IMAGE       RHOAI container image
  --raw RAW                 Enable raw serving (True/False)
  --deploy-rhoai-resources  Create DSC and DSCI
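The optional value on `--kueue` (a bare `--kueue` means Unmanaged) is the kind of behaviour argparse expresses with `nargs="?"` plus `const`. The following is a minimal sketch that reproduces a subset of the options and defaults listed above. It is a hypothetical parser for illustration, not RHOShift's `cli` module, and the integer types for the numeric options are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the documented CLI."""
    parser = argparse.ArgumentParser(prog="rhoshift")
    ops = parser.add_argument_group("Operator Selection")
    ops.add_argument("--cert-manager", action="store_true")
    ops.add_argument("--rhoai", action="store_true")
    # Omitted -> None; bare --kueue -> "Unmanaged"; --kueue Managed -> "Managed".
    ops.add_argument("--kueue", nargs="?", const="Unmanaged",
                     choices=["Managed", "Unmanaged"])
    ops.add_argument("--all", action="store_true")

    cfg = parser.add_argument_group("Configuration")
    cfg.add_argument("--oc-binary", default="oc")
    cfg.add_argument("--retries", type=int, default=3)
    cfg.add_argument("--retry-delay", type=int, default=10)
    cfg.add_argument("--timeout", type=int, default=300)

    rhoai = parser.add_argument_group("RHOAI Options")
    rhoai.add_argument("--rhoai-channel")
    rhoai.add_argument("--rhoai-image")
    rhoai.add_argument("--deploy-rhoai-resources", action="store_true")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)
```

Because `choices` is set, a typo such as `--kueue managed` is rejected with a usage error instead of being silently accepted.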

Environment Variables

export LOG_FILE_LEVEL=DEBUG      # File logging level
export LOG_CONSOLE_LEVEL=INFO    # Console logging level

Logging

  • Location: /tmp/rhoshift.log
  • Rotation: 10MB max size, 5 backup files
  • Levels: DEBUG (file) / INFO (console)
  • Colors: Supported in compatible terminals
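The rotation policy and the two environment variables map directly onto the standard library's `logging.handlers.RotatingFileHandler`. The following is a minimal sketch, assuming a hypothetical `setup_logging` helper and logger name `rhoshift`. The defaults mirror the values listed here, but RHOShift's actual logger wiring (including colour output) may differ.

```python
import logging
import os
import sys
from logging.handlers import RotatingFileHandler


def setup_logging(log_file: str = "/tmp/rhoshift.log") -> logging.Logger:
    """Attach a rotating file handler and a console handler to one logger."""
    logger = logging.getLogger("rhoshift")
    logger.setLevel(logging.DEBUG)  # let each handler apply its own level
    logger.handlers.clear()         # avoid duplicate handlers on repeated calls

    file_handler = RotatingFileHandler(
        log_file, maxBytes=10 * 1024 * 1024, backupCount=5  # 10MB, 5 backups
    )
    file_handler.setLevel(os.environ.get("LOG_FILE_LEVEL", "DEBUG").upper())

    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(os.environ.get("LOG_CONSOLE_LEVEL", "INFO").upper())

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

When the file exceeds 10MB, it is renamed to `rhoshift.log.1` (shifting older backups up to `.5`) and a fresh file is started.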

🔍 Troubleshooting

Common Issues

Permission Errors

# Verify cluster access
oc whoami
oc auth can-i create subscriptions -n openshift-operators

Installation Failures

# Check logs
tail -f /tmp/rhoshift.log

# Verify operator catalogs
oc get catalogsource -n openshift-marketplace

# Check with enhanced timeouts
rhoshift --kueue --timeout=900 --retries=5

Dependency Issues

# Verify dependencies are resolved
rhoshift --summary

# Manual dependency installation
rhoshift --cert-manager
rhoshift --kueue

RHOAI/DSC Issues

# Check DSC status
oc get dsc,dsci -A

# Verify webhook certificates
oc get pods -n opendatahub-operators

# Manual DSC creation
rhoshift --rhoai --deploy-rhoai-resources --timeout=900

DSCI Immutable Field Conflicts

# Error: MonitoringNamespace is immutable
# This happens when existing DSCI has different monitoring namespace

# Check existing DSCI configuration
oc get dsci default-dsci -o yaml

# Solution 1: Force recreate DSCI (recommended)
rhoshift --rhoai --deploy-rhoai-resources

# Solution 2: Use existing DSCI configuration
# RHOShift will automatically detect and adapt to existing DSCI
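Because `spec.monitoring.namespace` cannot be changed on an existing DSCInitialization, adapting to an existing DSCI effectively means reusing whatever namespace is already set. The following is a minimal sketch, assuming a hypothetical `desired_dsci_spec` helper that takes the parsed JSON from `oc get dsci default-dsci -o json` (or `None` when no DSCI exists). The fallback namespace `redhat-ods-monitoring` and the `Managed` monitoring state are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def desired_dsci_spec(existing: Optional[Dict[str, Any]],
                      default_monitoring_ns: str = "redhat-ods-monitoring") -> Dict[str, Any]:
    """Build a DSCI monitoring spec that keeps an existing (immutable) namespace."""
    current_ns = None
    if existing:
        current_ns = existing.get("spec", {}).get("monitoring", {}).get("namespace")
    return {
        "monitoring": {
            "managementState": "Managed",
            # Reuse the cluster's value when present to avoid the immutable-field error.
            "namespace": current_ns or default_monitoring_ns,
        }
    }
```

The `existing` argument could come from `json.loads` applied to the stdout of `subprocess.run(["oc", "get", "dsci", "default-dsci", "-o", "json"], capture_output=True, text=True, check=True)`.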

Debug Mode

# Enable verbose output
rhoshift --all --verbose

# Check stability report
rhoshift --summary

🛠️ Development

Prerequisites

  • Python 3.8+
  • OpenShift CLI (oc)
  • OpenShift cluster access
  • cluster-admin privileges

Project Structure

rhoshift/
├── rhoshift/
│   ├── cli/              # Command-line interface
│   ├── logger/           # Logging system
│   ├── utils/
│   │   ├── operator/     # Operator management
│   │   ├── resilience.py # Error handling & recovery
│   │   ├── health_monitor.py # Health monitoring
│   │   ├── stability_coordinator.py # Stability management
│   │   └── constants.py  # Operator configurations
│   └── main.py          # Entry point
├── scripts/
│   ├── cleanup/         # Cleanup utilities
│   └── run_upgrade_matrix.sh # Upgrade testing
└── tests/               # Test suite

Running Tests

pytest tests/

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Open a pull request

Development Guidelines

  • Follow the PEP 8 style guide
  • Add tests for new features
  • Update documentation
  • Ensure backward compatibility

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🆘 Support

  • Issues: GitHub Issues
  • Documentation: This README and --help output
  • Logs: /tmp/rhoshift.log for detailed debugging

RHOShift - Enterprise-grade OpenShift operator management with enhanced stability and reliability features.

Download files

Source Distribution

rhoshift-0.1.7.5.tar.gz (118.2 kB)

Uploaded Source

Built Distribution

rhoshift-0.1.7.5-py3-none-any.whl (72.4 kB)

Uploaded Python 3

File details

Details for the file rhoshift-0.1.7.5.tar.gz.

File metadata

  • Download URL: rhoshift-0.1.7.5.tar.gz
  • Upload date:
  • Size: 118.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for rhoshift-0.1.7.5.tar.gz
Algorithm Hash digest
SHA256 aae2341b00ce768d72ef094d9d2e40f6edbc2f694dac0f0c1c87f08163dccd6a
MD5 943cc71fdb6202d1ef1ed69210a5e8e5
BLAKE2b-256 11f79a7045c3ec02a2c844d5fd1381803bdde9297d25db21bf038ba15766a511

File details

Details for the file rhoshift-0.1.7.5-py3-none-any.whl.

File metadata

  • Download URL: rhoshift-0.1.7.5-py3-none-any.whl
  • Upload date:
  • Size: 72.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for rhoshift-0.1.7.5-py3-none-any.whl
Algorithm Hash digest
SHA256 73c4851f01db46bcf571c111f694f7a28c05a63d3cd61b4db97a6d7210a7c3e0
MD5 c1d46c4b1f73a95c7a8b94b5e23001b7
BLAKE2b-256 775f2e948cb0635c906cf48dd174982e4b0167f9097b6088c68b2fbc1d410b3e
