
RHOAI toolkit for managing and upgrading RHOAI

Project description

RHOShift - OpenShift Operator Installation Toolkit


A comprehensive, enterprise-grade toolkit for managing OpenShift operators with enhanced stability features, automatic dependency resolution, and Red Hat OpenShift AI (RHOAI) integration.


✨ Features

🚀 Core Functionality

  • 7 Enterprise Operators: Complete operator stack for modern OpenShift deployments
  • Enhanced Stability System: 3-tier stability levels with comprehensive error handling
  • Automatic Dependency Resolution: Smart installation order with dependency detection
  • Pre-flight Validation: Cluster readiness and permission verification
  • Health Monitoring: Real-time operator status tracking and reporting
  • Auto-recovery: Intelligent error classification and automatic retry logic

๐Ÿ›ก๏ธ Enterprise-Grade Reliability

  • Comprehensive Error Handling: 59+ exception handlers throughout the codebase
  • Webhook Certificate Resilience: Automatic handling of webhook certificate timing issues during RHOAI installation
  • Resource Conflict Detection: Prevention of operator namespace conflicts
  • Smart Retry Logic: Exponential backoff with contextual error recovery
  • Parallel Installation: Optimized performance for multiple operators

🔧 Advanced Integration

  • RHOAI DSC/DSCI Management: Complete DataScienceCluster lifecycle control
  • Kueue Management States: Dynamic DSC integration with Managed/Unmanaged modes
  • KedaController Automation: Automatic KEDA controller creation and validation
  • Configurable Timeouts: Flexible timing control for enterprise environments

๐Ÿ›ก๏ธ Enhanced Stability Features

RHOShift includes a comprehensive stability system designed for enterprise deployments:

Stability Levels

  • 🟢 Enhanced (Default): Pre-flight checks + health monitoring + auto-recovery
  • 🔵 Comprehensive: Maximum resilience with advanced error classification
  • ⚪ Basic: Standard installation with basic error handling
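As a rough illustration of how the three tiers stack, the sketch below models them as an enum plus a feature matrix inferred from the descriptions above. The names `StabilityLevel`, `FEATURES`, and `features_for` are hypothetical and are not taken from RHOShift's source.

```python
from enum import Enum
from typing import Dict


class StabilityLevel(Enum):
    """Hypothetical model of the three tiers listed above."""
    BASIC = "basic"
    ENHANCED = "enhanced"  # documented default
    COMPREHENSIVE = "comprehensive"


# Each tier enables everything the tier below it does, plus more.
FEATURES: Dict[StabilityLevel, Dict[str, bool]] = {
    StabilityLevel.BASIC: {
        "preflight_checks": False,
        "health_monitoring": False,
        "auto_recovery": False,
        "advanced_error_classification": False,
    },
    StabilityLevel.ENHANCED: {
        "preflight_checks": True,
        "health_monitoring": True,
        "auto_recovery": True,
        "advanced_error_classification": False,
    },
    StabilityLevel.COMPREHENSIVE: {
        "preflight_checks": True,
        "health_monitoring": True,
        "auto_recovery": True,
        "advanced_error_classification": True,
    },
}


def features_for(level: StabilityLevel = StabilityLevel.ENHANCED) -> Dict[str, bool]:
    """Return a copy of the feature switches for a tier (Enhanced if unspecified)."""
    return dict(FEATURES[level])
```

Keeping the tiers as data rather than branching logic makes it easy to check that each level is a strict superset of the one below it.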

Pre-flight Validation

  • ✅ Cluster connectivity and authentication
  • ✅ Required permissions verification
  • ✅ Resource quota validation
  • ✅ Operator catalog accessibility
  • ✅ Namespace conflict detection
  • ✅ DSCI compatibility validation for RHOAI installations
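A minimal sketch of three of the pre-flight checks above (authentication, permissions, catalog accessibility), assuming they are implemented by shelling out to `oc`. The helper names `run_oc` and `preflight` are hypothetical; the `oc` subcommands are the same ones used in the Troubleshooting section. The runner is injectable so the checks can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A runner takes oc arguments and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_oc(args: Sequence[str], oc_binary: str = "oc", timeout: int = 300) -> Tuple[int, str]:
    """Run an oc command and return (exit code, stripped stdout)."""
    try:
        proc = subprocess.run([oc_binary, *args], capture_output=True,
                              text=True, timeout=timeout)
    except FileNotFoundError:
        return 127, ""  # oc binary not found at the given path
    return proc.returncode, proc.stdout.strip()


def preflight(run: Runner = run_oc) -> List[str]:
    """Return the failed checks; an empty list means the cluster looks ready."""
    failures: List[str] = []

    # Connectivity and authentication: `oc whoami` fails when not logged in.
    code, _ = run(["whoami"])
    if code != 0:
        failures.append("not logged in to a cluster")
        return failures  # the remaining checks would fail for the same reason

    # Permissions: `oc auth can-i` prints "yes" or "no".
    _, answer = run(["auth", "can-i", "create", "subscriptions", "-n", "openshift-operators"])
    if answer != "yes":
        failures.append("cannot create subscriptions in openshift-operators")

    # Catalog accessibility: at least one CatalogSource should be visible.
    code, out = run(["get", "catalogsource", "-n", "openshift-marketplace", "-o", "name"])
    if code != 0 or not out:
        failures.append("no CatalogSources visible in openshift-marketplace")

    return failures
```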

Health Monitoring

  • 📊 Real-time operator status tracking
  • 🔍 Multi-resource health validation
  • 📈 Installation progress reporting
  • ⚡ Performance metrics and timing
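One plausible way to track operator health is to poll OLM: read the Subscription's `status.installedCSV`, then the ClusterServiceVersion's `status.phase` until it reports `Succeeded`. This is a sketch under that assumption; `wait_for_operator` is a hypothetical name, and the runner, sleep, and clock are injected so the polling loop can be tested without a cluster.

```python
import time
from typing import Callable, Sequence, Tuple

# A runner takes oc arguments and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def wait_for_operator(
    run: Runner,
    subscription: str,
    namespace: str,
    timeout: float = 300.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll OLM until the Subscription's installed CSV reaches phase Succeeded."""
    deadline = clock() + timeout
    phase = "Unknown"
    while True:
        # installedCSV is empty until OLM has installed a CSV for this Subscription.
        _, csv = run(["get", "subscription", subscription, "-n", namespace,
                      "-o", "jsonpath={.status.installedCSV}"])
        if csv:
            _, phase = run(["get", "csv", csv, "-n", namespace,
                            "-o", "jsonpath={.status.phase}"])
            if phase == "Succeeded":
                return phase
            if phase == "Failed":
                raise RuntimeError(f"{csv} reported phase Failed")
        if clock() >= deadline:
            raise TimeoutError(f"{subscription} not ready after {timeout}s (last phase: {phase})")
        sleep(interval)
```

Failing fast on `Failed` avoids burning the whole timeout on an install that cannot recover by waiting.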

Auto-recovery Features

  • 🔄 Intelligent retry mechanisms
  • 🧠 Error classification (transient vs. permanent)
  • ⏰ Exponential backoff strategies
  • 🛠️ Automatic resource cleanup and recreation

📦 Supported Operators

| Operator | Package | Namespace | Channel | Dependencies |
|---|---|---|---|---|
| OpenShift Serverless | serverless-operator | openshift-serverless | stable | None |
| Service Mesh | servicemeshoperator | openshift-operators | stable | None |
| Authorino | authorino-operator | openshift-operators | stable | None |
| cert-manager | openshift-cert-manager-operator | cert-manager-operator | stable-v1 | None |
| Kueue | kueue-operator | openshift-kueue-operator | stable-v1.0 | cert-manager |
| KEDA | openshift-custom-metrics-autoscaler-operator | openshift-keda | stable | None |
| RHOAI/ODH | opendatahub-operator | openshift-operators | stable | None |
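Each table row maps naturally onto an OLM `Subscription`. The sketch below encodes the rows and renders a Subscription manifest; the dictionary keys, the `OperatorSpec` type, `subscription_manifest`, and the default `redhat-operators` catalog source are assumptions (RHOAI/ODH builds in particular may come from a custom catalog, as the `--rhoai-image` examples suggest).

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class OperatorSpec:
    package: str
    namespace: str
    channel: str
    depends_on: Tuple[str, ...] = ()


# Values copied from the Supported Operators table; the keys are hypothetical.
OPERATORS: Dict[str, OperatorSpec] = {
    "serverless": OperatorSpec("serverless-operator", "openshift-serverless", "stable"),
    "servicemesh": OperatorSpec("servicemeshoperator", "openshift-operators", "stable"),
    "authorino": OperatorSpec("authorino-operator", "openshift-operators", "stable"),
    "cert-manager": OperatorSpec("openshift-cert-manager-operator", "cert-manager-operator", "stable-v1"),
    "kueue": OperatorSpec("kueue-operator", "openshift-kueue-operator", "stable-v1.0", ("cert-manager",)),
    "keda": OperatorSpec("openshift-custom-metrics-autoscaler-operator", "openshift-keda", "stable"),
    "rhoai": OperatorSpec("opendatahub-operator", "openshift-operators", "stable"),
}


def subscription_manifest(key: str, source: str = "redhat-operators") -> Dict[str, Any]:
    """Build an OLM Subscription for one operator (the catalog source is an assumption)."""
    op = OPERATORS[key]
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": op.package, "namespace": op.namespace},
        "spec": {
            "channel": op.channel,
            "name": op.package,
            "source": source,
            "sourceNamespace": "openshift-marketplace",
            "installPlanApproval": "Automatic",
        },
    }
```

Operators installed outside `openshift-operators` (for example into `openshift-keda` or `openshift-kueue-operator`) also need an `OperatorGroup` in their namespace before OLM will act on the Subscription.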

🚀 Installation

Quick Install

git clone https://github.com/mwaykole/O.git
cd O
pip install -e .

# Alternatively, install the published release from PyPI
pip install rhoshift

Verify Installation

rhoshift --help
rhoshift --summary

💻 Usage

Basic Commands

# Install a single operator with enhanced stability
rhoshift --serverless

# Install multiple operators with batch optimization
rhoshift --serverless --servicemesh --authorino

# Install with dependency resolution (Kueue + cert-manager)
rhoshift --kueue

# Install all operators (includes DSCI validation for RHOAI)
rhoshift --all

# Install all with RHOAI channel preference
rhoshift --all --rhoai-channel=odh-nightlies

# Show detailed operator summary
rhoshift --summary

# Clean up all operators
rhoshift --cleanup

RHOAI with DSC/DSCI

# Install RHOAI from a custom channel and image, and create the DSC/DSCI
rhoshift --rhoai \
  --rhoai-channel=odh-nightlies \
  --rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
  --deploy-rhoai-resources

# Install RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed \
  --rhoai-channel=stable \
  --rhoai-image=quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25-nightly \
  --deploy-rhoai-resources

Kueue Management States

# Install Kueue as Managed (RHOAI controls it)
rhoshift --kueue Managed

# Install Kueue as Unmanaged (independent) - Default
rhoshift --kueue Unmanaged
rhoshift --kueue  # Same as above

# Switch management states (updates existing DSC)
rhoshift --kueue Managed    # Switch to Managed
rhoshift --kueue Unmanaged  # Switch to Unmanaged

🔧 Advanced Usage

Enterprise Deployment

# Complete ML/AI stack with queue management
rhoshift --all --kueue Managed \
  --rhoai-channel=stable \
  --rhoai-image=brew.registry.redhat.io/rh-osbs/iib:1049242 \
  --deploy-rhoai-resources \
  --timeout=900

# Serving and autoscaling stack: Serverless, Service Mesh, KEDA, Authorino
rhoshift --serverless --servicemesh --keda --authorino

# Development environment setup
rhoshift --cert-manager --kueue Unmanaged --keda

Custom Configuration

# Custom timeouts and retries for enterprise clusters
rhoshift --all \
  --timeout=1200 \
  --retries=5 \
  --retry-delay=15

# Custom oc binary path
rhoshift --serverless --oc-binary=/usr/local/bin/oc

# Verbose output for debugging
rhoshift --kueue Managed --verbose

🔗 Dependency Management

RHOShift automatically handles operator dependencies:

Automatic Resolution

  • Kueue → cert-manager: Installing Kueue automatically includes cert-manager
  • Installation Order: Dependencies installed first, primary operators second
  • Conflict Detection: Prevents namespace and resource conflicts
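Installing dependencies first is a topological sort over the dependency graph. A minimal depth-first version is sketched below, seeded only with the edge listed in the Supported Operators table (Kueue requires cert-manager); `install_order` and `DEPENDENCIES` are hypothetical names. It avoids `graphlib` because that module needs Python 3.9, while the prerequisites list Python 3.8+.

```python
from typing import Dict, List, Sequence, Set

# Dependency edges from the Supported Operators table: Kueue requires cert-manager.
DEPENDENCIES: Dict[str, Sequence[str]] = {"kueue": ["cert-manager"]}


def install_order(requested: Sequence[str],
                  deps: Dict[str, Sequence[str]] = DEPENDENCIES) -> List[str]:
    """Expand the requested operators with their dependencies, dependencies first."""
    order: List[str] = []
    done: Set[str] = set()
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return  # already scheduled; skip duplicates
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in deps.get(name, ()):
            visit(dep)  # schedule every dependency before the operator itself
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in requested:
        visit(name)
    return order
```

For example, `install_order(["serverless", "kueue"])` yields `["serverless", "cert-manager", "kueue"]`: requested order is preserved except where a dependency must move ahead.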

Smart Validation

# This command installs BOTH cert-manager AND Kueue in the correct order:
rhoshift --kueue
# Output:
# 🔍 Pre-flight checks passed. Cluster is ready for installation.
# ⚠️  Missing dependency: kueue-operator requires openshift-cert-manager-operator
# 🚀 Installing 2 operators with enhanced stability...
# ✅ cert-manager installed successfully
# ✅ kueue installed successfully

🤖 RHOAI Integration

DataScienceCluster Management

RHOShift provides complete DSC/DSCI lifecycle management:

# Create RHOAI with DSC/DSCI
rhoshift --rhoai --deploy-rhoai-resources

# RHOAI with Kueue integration
rhoshift --rhoai --kueue Managed --deploy-rhoai-resources

DSC Behavior

  • Existing DSC: the Kueue managementState is updated in place
  • No DSC: the state is applied when the DSC is created via --deploy-rhoai-resources
  • Webhook Resilience: automatic handling of certificate timing issues
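The two DSC cases above can be sketched as: list DataScienceClusters, then either merge-patch `spec.components.kueue.managementState` on the existing one or report that there is nothing to patch yet. This assumes a single cluster-scoped DSC and uses `oc get dsc -o name` and `oc patch --type merge`; `kueue_patch` and `apply_kueue_state` are hypothetical names, and the validation deliberately accepts only the two states the `--kueue` flag exposes.

```python
import json
from typing import Callable, Dict, Optional, Sequence, Tuple

# A runner takes oc arguments and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def kueue_patch(state: str) -> Dict[str, object]:
    """Merge patch setting spec.components.kueue.managementState on a DSC."""
    if state not in ("Managed", "Unmanaged"):
        raise ValueError(f"unsupported Kueue managementState: {state!r}")
    return {"spec": {"components": {"kueue": {"managementState": state}}}}


def apply_kueue_state(run: Runner, state: str) -> Optional[str]:
    """Patch the existing DSC and return its name, or return None if no DSC exists yet."""
    patch = json.dumps(kueue_patch(state))
    _, names = run(["get", "dsc", "-o", "name"])
    if not names:
        # No DSC: leave the state to be applied when --deploy-rhoai-resources creates one.
        return None
    dsc = names.splitlines()[0]  # assumes a single DataScienceCluster
    code, out = run(["patch", dsc, "--type", "merge", "-p", patch])
    if code != 0:
        raise RuntimeError(out or f"oc patch {dsc} failed")
    return dsc
```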

Output Examples

# When DSC exists and gets updated:
🔄 Updating DSC with Kueue managementState: Managed
✅ Successfully updated DSC with Kueue managementState: Managed

# When no DSC exists:
ℹ️  No existing DSC found. Kueue managementState will be applied when DSC is created.

โš™๏ธ Configuration

CLI Options

Operator Selection:
  --serverless          Install OpenShift Serverless Operator
  --servicemesh         Install Service Mesh Operator
  --authorino           Install Authorino Operator
  --cert-manager        Install cert-manager Operator
  --rhoai               Install RHOAI Operator
  --kueue [{Managed,Unmanaged}]  Install Kueue with DSC integration
  --keda                Install KEDA (Custom Metrics Autoscaler)
  --all                 Install all operators
  --cleanup             Clean up all operators
  --summary             Show operator summary

Configuration:
  --oc-binary OC_BINARY     Path to oc CLI (default: oc)
  --retries RETRIES         Max retry attempts (default: 3)
  --retry-delay RETRY_DELAY Delay between retries (default: 10s)
  --timeout TIMEOUT         Command timeout (default: 300s)

RHOAI Options:
  --rhoai-channel CHANNEL   RHOAI channel (stable/odh-nightlies)
  --rhoai-image IMAGE       RHOAI container image
  --raw RAW                 Enable raw serving (True/False)
  --deploy-rhoai-resources  Create DSC and DSCI
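The `--kueue [{Managed,Unmanaged}]` signature, where the value is optional and falls back to `Unmanaged`, corresponds to an argparse option with `nargs="?"` and a `const`. The parser below is a hypothetical reconstruction of a subset of the options listed above, not RHOShift's actual CLI code; the defaults are taken from the list.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the documented rhoshift options."""
    p = argparse.ArgumentParser(prog="rhoshift")
    p.add_argument("--serverless", action="store_true", help="Install OpenShift Serverless Operator")
    p.add_argument("--rhoai", action="store_true", help="Install RHOAI Operator")
    # nargs="?" lets --kueue appear alone; const supplies the documented default state.
    p.add_argument("--kueue", nargs="?", const="Unmanaged", default=None,
                   choices=["Managed", "Unmanaged"], help="Install Kueue with DSC integration")
    p.add_argument("--oc-binary", default="oc", help="Path to oc CLI")
    p.add_argument("--retries", type=int, default=3, help="Max retry attempts")
    p.add_argument("--retry-delay", type=int, default=10, help="Delay between retries (s)")
    p.add_argument("--timeout", type=int, default=300, help="Command timeout (s)")
    p.add_argument("--deploy-rhoai-resources", action="store_true", help="Create DSC and DSCI")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Because `nargs="?"` does not consume a following token that starts with `-`, `rhoshift --kueue --timeout=900` still resolves `--kueue` to `Unmanaged`.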

Environment Variables

export LOG_FILE_LEVEL=DEBUG      # File logging level
export LOG_CONSOLE_LEVEL=INFO    # Console logging level

Logging

  • Location: /tmp/rhoshift.log
  • Rotation: 10MB max size, 5 backup files
  • Levels: DEBUG (file) / INFO (console)
  • Colors: Supported in compatible terminals
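The logging behaviour above maps onto the standard library's `RotatingFileHandler`. A minimal sketch, assuming `LOG_FILE_LEVEL` and `LOG_CONSOLE_LEVEL` hold standard level names; `setup_logging` and the `rhoshift` logger name are illustrative rather than taken from the package.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logging(path: str = "/tmp/rhoshift.log") -> logging.Logger:
    """File + console logging with rotation; handler levels come from the environment."""
    logger = logging.getLogger("rhoshift")
    logger.setLevel(logging.DEBUG)  # let each handler filter on its own level
    for old in list(logger.handlers):  # avoid duplicate handlers on repeated calls
        logger.removeHandler(old)
        old.close()

    # 10 MB per file, keeping 5 rotated backups (rhoshift.log.1 ... rhoshift.log.5).
    file_handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=5)
    file_handler.setLevel(os.environ.get("LOG_FILE_LEVEL", "DEBUG").upper())
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))

    console = logging.StreamHandler()
    console.setLevel(os.environ.get("LOG_CONSOLE_LEVEL", "INFO").upper())
    console.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```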

๐Ÿ” Troubleshooting

Common Issues

Permission Errors

# Verify cluster access
oc whoami
oc auth can-i create subscriptions -n openshift-operators

Installation Failures

# Check logs
tail -f /tmp/rhoshift.log

# Verify operator catalogs
oc get catalogsource -n openshift-marketplace

# Check with enhanced timeouts
rhoshift --kueue --timeout=900 --retries=5

Dependency Issues

# Verify dependencies are resolved
rhoshift --summary

# Manual dependency installation
rhoshift --cert-manager
rhoshift --kueue

RHOAI/DSC Issues

# Check DSC status
oc get dsc,dsci -A

# Check that the operator pods serving the webhooks are running
oc get pods -n opendatahub-operators

# Manual DSC creation
rhoshift --rhoai --deploy-rhoai-resources --timeout=900

DSCI Immutable Field Conflicts

# Error: MonitoringNamespace is immutable
# This happens when an existing DSCI uses a different monitoring namespace

# Check existing DSCI configuration
oc get dsci default-dsci -o yaml

# Solution 1: Force recreate DSCI (recommended)
rhoshift --rhoai --deploy-rhoai-resources

# Solution 2: Use existing DSCI configuration
# RHOShift will automatically detect and adapt to existing DSCI
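Solution 2, adapting to the existing DSCI, amounts to reading the current `spec.monitoring.namespace` and reusing it instead of trying to change an immutable field. A sketch under that assumption; `monitoring_namespace_to_use` is a hypothetical helper, and `default-dsci` is the DSCI name used in the command above.

```python
from typing import Callable, Sequence, Tuple

# A runner takes oc arguments and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def monitoring_namespace_to_use(run: Runner, desired: str,
                                dsci: str = "default-dsci") -> Tuple[str, bool]:
    """Return (namespace_to_use, conflict); conflict is True when the DSCI already differs."""
    code, existing = run(["get", "dsci", dsci, "-o",
                          "jsonpath={.spec.monitoring.namespace}"])
    if code != 0 or not existing:
        # No DSCI (or the field is unset): the desired namespace can be used as-is.
        return desired, False
    # The monitoring namespace is immutable, so reuse the existing value
    # rather than submitting a change the API server would reject.
    return existing, existing != desired
```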

Debug Mode

# Enable verbose output
rhoshift --all --verbose

# Check stability report
rhoshift --summary

๐Ÿ› ๏ธ Development

Prerequisites

  • Python 3.8+
  • OpenShift CLI (oc)
  • OpenShift cluster access
  • cluster-admin privileges

Project Structure

rhoshift/
├── rhoshift/
│   ├── cli/              # Command-line interface
│   ├── logger/           # Logging system
│   ├── utils/
│   │   ├── operator/     # Operator management
│   │   ├── resilience.py # Error handling & recovery
│   │   ├── health_monitor.py # Health monitoring
│   │   ├── stability_coordinator.py # Stability management
│   │   └── constants.py  # Operator configurations
│   └── main.py          # Entry point
├── scripts/
│   ├── cleanup/         # Cleanup utilities
│   └── run_upgrade_matrix.sh # Upgrade testing
└── tests/               # Test suite

Running Tests

pytest tests/

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Create Pull Request

Development Guidelines

  • Follow Python PEP 8 standards
  • Add tests for new features
  • Update documentation
  • Ensure backward compatibility

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🆘 Support

  • Issues: GitHub Issues
  • Documentation: This README and --help output
  • Logs: /tmp/rhoshift.log for detailed debugging

RHOShift - Enterprise-grade OpenShift operator management with enhanced stability and reliability features.



Download files


Source Distribution

rhoshift-0.1.7.4.tar.gz (115.7 kB)

Built Distribution

rhoshift-0.1.7.4-py3-none-any.whl (70.2 kB)

File details

Details for the file rhoshift-0.1.7.4.tar.gz.

File metadata

  • Download URL: rhoshift-0.1.7.4.tar.gz
  • Size: 115.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rhoshift-0.1.7.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 478371b09a008f0fa3aeea65791e994d0ca2a90d9b674939c9b9b280aba36c62 |
| MD5 | 54600f40c04c3596a4ee9e9cbd004801 |
| BLAKE2b-256 | 51c736492a7635c833cad1ea0c32a7351409785700dc101a5116819ad1d84870 |


File details

Details for the file rhoshift-0.1.7.4-py3-none-any.whl.

File metadata

  • Download URL: rhoshift-0.1.7.4-py3-none-any.whl
  • Size: 70.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rhoshift-0.1.7.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d3aacecde26c730375886e7a4dfac133200e77188661d1b8976f7fcec917f979 |
| MD5 | 384e7574347289f1234d10432d54dc2a |
| BLAKE2b-256 | 4dfcc18e276f8e774aeb24eaf8e828363509c4f3601828b2ab27fd64d555b673 |

