
RHOAI tool kit for managing and upgrading RHOAI

Project description

RHOAI Tool Kit


A comprehensive toolkit for managing and upgrading Red Hat OpenShift AI (RHOAI) installations with parallel installation support.


✨ Features

  • Install single or multiple OpenShift operators
  • Parallel installation for faster deployments
  • Configurable timeouts and retries
  • Comprehensive logging system
  • Supports:
    • Serverless Operator
    • Service Mesh Operator
    • Authorino Operator
    • cert-manager Operator (Kueue dependency)
    • RHOAI Operator
    • Kueue Operator with DSC Integration
    • KEDA (Custom Metrics Autoscaler) Operator
  • Automatic Dependency Resolution: Installs required operators in the correct order
  • Smart Validation: Pre-installation compatibility and conflict detection
  • 🆕 Kueue DSC Integration: Automatically updates the RHOAI DataScienceCluster with the Kueue management state
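Parallel installation and dependency ordering can coexist if operators are installed in waves: every operator whose dependencies are already installed runs concurrently, and dependents wait for the next wave. The sketch below is a hypothetical illustration of that idea, not rhoshift's code. `install_in_waves`, the `deps` mapping, and the `install` callback are invented names, and a real callback would perform the actual operator installation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def install_in_waves(
    operators: List[str],
    deps: Dict[str, List[str]],
    install: Callable[[str], None],
    max_workers: int = 4,
) -> List[List[str]]:
    """Install operators concurrently, one wave per dependency level.

    Every operator in a wave has all of its dependencies in earlier waves.
    Dependencies must already be part of `operators`.
    """
    remaining: Set[str] = set(operators)
    installed: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Operators whose dependencies are all installed can run in parallel.
        wave = sorted(op for op in remaining if set(deps.get(op, [])) <= installed)
        if not wave:
            raise ValueError(f"unsatisfiable dependencies among {sorted(remaining)}")
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() drains the iterator so any install exception is re-raised here.
            list(pool.map(install, wave))
        installed.update(wave)
        remaining.difference_update(wave)
        waves.append(wave)
    return waves


if __name__ == "__main__":
    schedule = install_in_waves(
        ["serverless", "kueue", "cert-manager", "keda"],
        deps={"kueue": ["cert-manager"]},
        install=lambda op: print(f"installing {op}"),
    )
    print(schedule)  # [['cert-manager', 'keda', 'serverless'], ['kueue']]
```

The "installing ..." lines within a wave may appear in any order because they run on worker threads; the returned schedule is deterministic.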

📁 Project Structure

rhoshift/
├── rhoshift/              # Main package directory
│   ├── __init__.py
│   ├── main.py           # CLI entry point
│   ├── cli/              # Command-line interface
│   │   ├── __init__.py
│   │   ├── args.py      # Argument parsing
│   │   └── commands.py  # Command implementations
│   ├── logger/          # Logging utilities
│   │   ├── __init__.py
│   │   └── logger.py    # Logging configuration
│   └── utils/           # Core utilities
│       ├── __init__.py
│       ├── constants.py # Constants and configurations
│       ├── operator.py  # Operator management
│       └── utils.py     # Utility functions
├── run_upgrade_matrix.sh  # Upgrade matrix execution script
├── upgrade_matrix_usage.md # Upgrade matrix documentation
├── pyproject.toml        # Project dependencies and configuration
└── README.md            # This document

📋 Components

Core Components

  • CLI: Command-line interface for operator management
  • Logger: Logging configuration and utilities (logs to /tmp/rhoshift.log)
  • Utils: Core utilities and operator management logic

RHOAI Components

  • RHOAI Upgrade Matrix: Utilities for testing RHOAI upgrades
  • Upgrade Matrix Scripts: Execution and documentation for upgrade testing

Maintenance Scripts

  • Cleanup Scripts: Utilities for cleaning up operator installations
  • Worker Node Scripts: Utilities for managing worker node configurations

🚀 Installation

  1. Clone the repository:
git clone https://github.com/mwaykole/O.git
cd O
  2. Install the package and its dependencies (editable mode):
pip install -e .
  3. Verify the installation:
rhoshift --help

🔧 New CLI Options

rhoshift --help
usage: rhoshift [-h] [--serverless] [--servicemesh] [--authorino] [--cert-manager]
                [--rhoai] [--kueue [{Managed,Unmanaged}]] [--keda] [--all] [--cleanup]
                [--deploy-rhoai-resources] [--summary] [--oc-binary OC_BINARY]
                [--retries RETRIES] [--retry-delay RETRY_DELAY] [--timeout TIMEOUT]
                [--rhoai-channel RHOAI_CHANNEL] [--raw RAW] [--rhoai-image RHOAI_IMAGE]

Operator Selection:
  --serverless          Install OpenShift Serverless Operator
  --servicemesh         Install Service Mesh Operator
  --authorino           Install Authorino Operator
  --cert-manager        Install cert-manager Operator (latest v1.16.1)
  --rhoai               Install RHOAI Operator
  --kueue [{Managed,Unmanaged}]  Install Kueue Operator with DSC managementState (default: Unmanaged)
  --keda                Install KEDA (Custom Metrics Autoscaler) Operator
  --all                 Install all operators
  --cleanup             Clean up all RHOAI, serverless, servicemesh, Authorino operators
  --deploy-rhoai-resources      Create DSC and DSCI with RHOAI installation
  --summary             Show detailed summary of all supported operators and versions
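The optional value on --kueue behaves like a standard argparse option declared with nargs="?": the flag alone selects the default state, and an explicit value must be one of the listed choices. This is a minimal sketch that reproduces only the documented behavior; it is not rhoshift's actual parser in cli/args.py.

```python
import argparse

# Sketch only: reproduces the documented "--kueue [{Managed,Unmanaged}]" behavior.
parser = argparse.ArgumentParser(prog="rhoshift")
parser.add_argument(
    "--kueue",
    nargs="?",                         # the value after --kueue is optional
    const="Unmanaged",                 # used when --kueue is given without a value
    default=None,                      # used when --kueue is not given at all
    choices=["Managed", "Unmanaged"],  # any other value is rejected by argparse
    help="Install Kueue Operator with DSC managementState (default: Unmanaged)",
)
parser.add_argument("--keda", action="store_true", help="Install KEDA Operator")

if __name__ == "__main__":
    print(parser.parse_args(["--kueue"]).kueue)                      # Unmanaged
    print(parser.parse_args(["--kueue", "Managed", "--keda"]).kueue)  # Managed
    print(parser.parse_args(["--keda"]).kueue)                       # None
```

Because the next token in `--kueue --keda` starts with a dash, argparse does not consume it as the Kueue value, so that combination also yields Unmanaged.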

๐Ÿ’ป Usage

Basic Commands

# Install single operator
rhoshift --serverless

# Install multiple operators
rhoshift --serverless --servicemesh

# Install cert-manager operator
rhoshift --cert-manager

# Install Kueue operator with default managementState (Unmanaged)
# Automatically installs cert-manager dependency
rhoshift --kueue

# Install Kueue operator with specific managementState in DSC
rhoshift --kueue Managed     # Sets Kueue as Managed in DSC
rhoshift --kueue Unmanaged   # Sets Kueue as Unmanaged in DSC

# Install KEDA (Custom Metrics Autoscaler) operator
rhoshift --keda

# Install RHOAI with raw configuration
rhoshift --rhoai --rhoai-channel=<channel> --rhoai-image=<image> --raw=True

# Install RHOAI with Serverless configuration
rhoshift --rhoai --rhoai-channel=<channel> --rhoai-image=<image> --raw=False --all

# Install all operators (Kueue will be set to Unmanaged in DSC)
rhoshift --all

# Create DSC and DSCI with RHOAI operator installation
rhoshift --rhoai --deploy-rhoai-resources

# Clean up all operators
rhoshift --cleanup

🔗 Operator Dependencies & Validation

The tool automatically handles operator dependencies and provides smart validation:

Automatic Dependency Resolution

  • Kueue requires cert-manager: Installing Kueue automatically includes cert-manager
  • Dependencies are installed in the correct order to prevent failures
  • Missing dependencies are automatically detected and added
# This command will install BOTH cert-manager AND Kueue (in correct order)
# Kueue will be set to Unmanaged in DSC (if DSC exists)
rhoshift --kueue

# You'll see output like:
# 📦 Auto-adding dependency: cert-manager
# Installing 2 operators in order: cert-manager → kueue
# 🔄 Updating DSC with Kueue managementState: Unmanaged
# ✅ Successfully updated DSC with Kueue managementState: Unmanaged
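The auto-add-then-order behavior shown above amounts to a depth-first topological sort over a dependency map. A minimal sketch, assuming a hypothetical DEPENDENCIES table that contains only the Kueue → cert-manager edge documented in this README; rhoshift's own resolver may be structured differently.

```python
from typing import Dict, List, Set

# Hypothetical dependency table: only the Kueue -> cert-manager edge is
# documented in this README; other operators are assumed to have none.
DEPENDENCIES: Dict[str, List[str]] = {
    "kueue": ["cert-manager"],
}


def resolve_install_order(requested: List[str]) -> List[str]:
    """Return requested operators plus missing dependencies, dependencies first.

    Depth-first topological sort; raises ValueError on a dependency cycle.
    """
    order: List[str] = []
    visiting: Set[str] = set()  # operators on the current DFS path
    done: Set[str] = set()      # operators already placed in `order`

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in DEPENDENCIES.get(name, []):
            if dep not in requested and dep not in done:
                print(f"Auto-adding dependency: {dep}")
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in requested:
        visit(name)
    return order


if __name__ == "__main__":
    print(" -> ".join(resolve_install_order(["kueue"])))  # cert-manager -> kueue
```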

Smart Validation

  • Compatibility Checking: Warns about potential operator conflicts
  • Namespace Validation: Detects if operators conflict in shared namespaces
  • Pre-Installation Validation: Catches issues before installation starts
# Example validation warnings:
# ⚠️  Note: Kueue and KEDA may have resource conflicts. Monitor for admission webhook issues.
# ⚠️  Installation order will be adjusted for dependencies: cert-manager → kueue
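Pairwise compatibility warnings like the Kueue/KEDA note can be driven by a small lookup table keyed on unordered operator pairs. The sketch below is illustrative: KNOWN_CONFLICTS and compatibility_warnings are hypothetical names, and only the Kueue/KEDA message is taken from the example output above.

```python
from itertools import combinations
from typing import List

# Hypothetical conflict table; frozenset keys make the pair order irrelevant.
KNOWN_CONFLICTS = {
    frozenset({"kueue", "keda"}):
        "Kueue and KEDA may have resource conflicts. Monitor for admission webhook issues.",
}


def compatibility_warnings(selected: List[str]) -> List[str]:
    """Return one warning per known-conflicting pair in the selection."""
    warnings: List[str] = []
    for a, b in combinations(sorted(set(selected)), 2):
        message = KNOWN_CONFLICTS.get(frozenset({a, b}))
        if message:
            warnings.append(f"Note: {message}")
    return warnings


if __name__ == "__main__":
    for warning in compatibility_warnings(["kueue", "keda", "cert-manager"]):
        print(warning)
```

Warnings are advisory in this sketch: they are returned, not raised, so installation can proceed.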

Supported Dependencies

Primary Operator    Required Dependencies
Kueue               cert-manager

Note: If installing Kueue on its own (--kueue) produces dependency warnings instead of adding cert-manager automatically, install both in one batch (--cert-manager --kueue) or install cert-manager first.

🎯 Kueue DSC Integration

New Feature: Installing the Kueue operator also updates the RHOAI DataScienceCluster (DSC) with the requested Kueue managementState (Unmanaged when no state is given).

Kueue Management States

  • Managed: RHOAI controls Kueue configuration and lifecycle
  • Unmanaged: Kueue runs independently, not managed by RHOAI

Usage Examples

# Install Kueue as Managed (RHOAI controls it)
rhoshift --kueue Managed

# Install Kueue as Unmanaged (independent operation) - DEFAULT
rhoshift --kueue Unmanaged
rhoshift --kueue  # Same as above

# Switch between states (updates existing DSC)
rhoshift --kueue Managed    # Change to Managed
rhoshift --kueue Unmanaged  # Change back to Unmanaged

Behavior

  • DSC Exists: Automatically updates Kueue managementState in existing DSC
  • No DSC: Shows info message that state will be applied when DSC is created
  • Error Handling: Graceful warnings if DSC update fails

Output Examples

# When DSC exists and gets updated:
🔄 Updating DSC with Kueue managementState: Unmanaged
✅ Successfully updated DSC with Kueue managementState: Unmanaged

# When no DSC exists:
ℹ️  No existing DSC found. Kueue managementState will be applied when DSC is created.
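Updating the DSC can be expressed as a JSON merge patch on spec.components.kueue.managementState, applied with `oc patch --type merge`. A minimal sketch that only builds the command (it does not run it); the DSC name default-dsc and the helper names are assumptions for illustration, and rhoshift may apply the update differently.

```python
import json
from typing import List, Optional

VALID_STATES = ("Managed", "Unmanaged")


def kueue_merge_patch(state: str) -> dict:
    """Merge patch that sets the Kueue managementState on a DataScienceCluster."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported managementState: {state!r}")
    return {"spec": {"components": {"kueue": {"managementState": state}}}}


def dsc_patch_command(dsc_name: Optional[str], state: str, oc: str = "oc") -> Optional[List[str]]:
    """Return the oc argv for the patch, or None when no DSC exists yet."""
    if dsc_name is None:
        # Mirrors the "No existing DSC found" branch: nothing to patch yet.
        print("No existing DSC found. Kueue managementState will be applied when DSC is created.")
        return None
    patch = json.dumps(kueue_merge_patch(state))
    return [oc, "patch", "datasciencecluster", dsc_name, "--type", "merge", "-p", patch]


if __name__ == "__main__":
    # "default-dsc" is an assumed name; list yours with: oc get datasciencecluster
    print(dsc_patch_command("default-dsc", "Managed"))
```

The returned list can be passed unchanged to subprocess.run, which avoids shell quoting problems with the JSON payload.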

Advanced Options

# Custom oc binary path
rhoshift --serverless --oc-binary /path/to/oc

# Custom timeout (seconds)
rhoshift --all --timeout 900

# Install queue management and auto-scaling operators together
# (cert-manager will be automatically installed as Kueue dependency)
rhoshift --kueue Managed --keda

# Install complete ML/AI stack with queue management
rhoshift --rhoai --kueue Managed --keda --rhoai-channel=stable --rhoai-image=quay.io/rhoai/rhoai-fbc-fragment:rhoai-2.25-nightly

# Show summary of all supported operators and their versions
rhoshift --summary

# Install only cert-manager for other uses
rhoshift --cert-manager

# Verbose output
rhoshift --all --verbose

Upgrade Matrix Testing

To run the upgrade matrix tests, you can use either method:

  1. Using the shell script:
./run_upgrade_matrix.sh [options] <current_version> <current_channel> <new_version> <new_channel>
  2. Using the Python command:
run-upgrade-matrix [options] <current_version> <current_channel> <new_version> <new_channel>

Options:

  • -s, --scenario: Run specific scenario(s): serverless, rawdeployment, or both (repeat -s, or pass serverless,rawdeployment)
  • --skip-cleanup: Skip cleanup before each scenario
  • --from-image: Custom source image path
  • --to-image: Custom target image path

Example:

# Using shell script
./run_upgrade_matrix.sh -s serverless -s rawdeployment 2.10 stable 2.12 stable

# Using Python command
run-upgrade-matrix -s serverless -s rawdeployment 2.10 stable 2.12 stable
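The -s option accepts both forms listed above: repeated flags (-s serverless -s rawdeployment) and a comma-separated value (-s serverless,rawdeployment). A sketch of normalizing both into one list with argparse; parse_matrix_args is a hypothetical helper, not the code behind run-upgrade-matrix or run_upgrade_matrix.sh.

```python
import argparse
from typing import List, Tuple

SCENARIOS = ("serverless", "rawdeployment")


def parse_matrix_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Return (scenarios, [current_version, current_channel, new_version, new_channel])."""
    parser = argparse.ArgumentParser(prog="run-upgrade-matrix")
    # action="append" collects every -s occurrence into one list.
    parser.add_argument("-s", "--scenario", action="append", default=[])
    parser.add_argument("--skip-cleanup", action="store_true")
    parser.add_argument("--from-image")
    parser.add_argument("--to-image")
    parser.add_argument("versions", nargs=4, metavar="ARG")
    args = parser.parse_args(argv)

    scenarios: List[str] = []
    for value in args.scenario:
        for name in value.split(","):  # also accept "serverless,rawdeployment"
            name = name.strip()
            if name not in SCENARIOS:
                parser.error(f"unknown scenario: {name!r}")
            if name not in scenarios:  # keep first-seen order, drop duplicates
                scenarios.append(name)
    return scenarios, args.versions


if __name__ == "__main__":
    print(parse_matrix_args(["-s", "serverless,rawdeployment", "2.10", "stable", "2.12", "stable"]))
```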

📝 Logging

The toolkit uses a comprehensive logging system:

  • Logs are stored in /tmp/rhoshift.log
  • Console output shows INFO level and above
  • File logging captures DEBUG level and above
  • Automatic log rotation (10MB max size, 5 backup files)
  • Colored output in supported terminals

To view logs:

tail -f /tmp/rhoshift.log
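The logging behavior described above maps directly onto the standard library: a RotatingFileHandler with a 10 MB limit and 5 backups for the file, a StreamHandler for the console, and per-handler levels read from the LOG_FILE_LEVEL and LOG_CONSOLE_LEVEL variables listed under Configuration. A minimal sketch, not the contents of logger/logger.py; the logger name rhoshift-sketch is hypothetical and colored output is omitted.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_logger(path: str = "/tmp/rhoshift.log") -> logging.Logger:
    """File handler at LOG_FILE_LEVEL (default DEBUG), console at LOG_CONSOLE_LEVEL (default INFO)."""
    logger = logging.getLogger("rhoshift-sketch")
    logger.setLevel(logging.DEBUG)  # pass everything through; handlers filter
    logger.propagate = False        # avoid duplicate lines via the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # 10 MB interpreted here as 10 MiB; keeps up to 5 rotated backups (.1 to .5).
    file_handler = RotatingFileHandler(path, maxBytes=10 * 1024 * 1024, backupCount=5)
    file_handler.setLevel(os.environ.get("LOG_FILE_LEVEL", "DEBUG").upper())
    file_handler.setFormatter(fmt)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(os.environ.get("LOG_CONSOLE_LEVEL", "INFO").upper())
    console_handler.setFormatter(fmt)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.debug("file only")
    log.info("file and console")
```

Setting levels on the handlers rather than the logger is what lets the file capture DEBUG while the console stays at INFO.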

🔧 Configuration

Environment Variables

  • LOG_FILE_LEVEL: Set file logging level (default: DEBUG)
  • LOG_CONSOLE_LEVEL: Set console logging level (default: INFO)

Command Options

  • --oc-binary: Path to oc CLI (default: oc)
  • --retries: Max retry attempts (default: 3)
  • --retry-delay: Delay between retries in seconds (default: 10)
  • --timeout: Command timeout in seconds (default: 300)
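The --retries, --retry-delay, and --timeout options describe a common retry loop around oc calls. A sketch under two assumptions: --retries counts total attempts, and only non-zero exits and timeouts are retried. run_with_retries is a hypothetical helper, not rhoshift's API.

```python
import subprocess
import time
from typing import List


def run_with_retries(
    cmd: List[str],
    retries: int = 3,
    retry_delay: float = 10,
    timeout: float = 300,
) -> subprocess.CompletedProcess:
    """Run cmd, retrying failed or timed-out attempts.

    Assumption: `retries` is the total number of attempts, and `retry_delay`
    seconds are slept only between attempts.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(1, retries + 1):
        try:
            return subprocess.run(
                cmd, check=True, capture_output=True, text=True, timeout=timeout
            )
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(retry_delay)
```

For example, run_with_retries(["oc", "whoami"]) would mirror the defaults listed above. A missing binary raises FileNotFoundError immediately, since retrying would not help.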

🛠️ Development

Prerequisites

  • Python 3.8 or higher
  • OpenShift CLI (oc)
  • Access to an OpenShift cluster

Running Tests

pytest tests/

🔍 Troubleshooting

Common Issues

  1. Operator Installation Fails

    • Check cluster access: oc whoami
    • Verify operator catalog: oc get catalogsource
    • Check logs: tail -f /tmp/rhoshift.log
  2. Permission Issues

    • Ensure you have cluster-admin privileges
    • Check namespace permissions
  3. Timeout Errors

    • Increase timeout: --timeout 900
    • Check cluster resources

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files


Source Distribution

rhoshift-0.1.6.tar.gz (52.7 kB)

Uploaded Source

Built Distribution


rhoshift-0.1.6-py3-none-any.whl (49.3 kB)

Uploaded Python 3

File details

Details for the file rhoshift-0.1.6.tar.gz.

File metadata

  • Download URL: rhoshift-0.1.6.tar.gz
  • Upload date:
  • Size: 52.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rhoshift-0.1.6.tar.gz
Algorithm     Hash digest
SHA256        60a1a40b408596a33a8f1e7b698edbf34d6e6cf7ef6322af7734c7b4f0ec07a1
MD5           4a1a13079b90c67378a025125083f98b
BLAKE2b-256   37482793abc7908f0e182cfb4a6b4328f1b757dd8c49d41ea35798940c2b3738


File details

Details for the file rhoshift-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: rhoshift-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 49.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rhoshift-0.1.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        6ccc101885d7e79c62fd84fc713e14514264a4654bf992421cd8f0df5dd26696
MD5           115a3a798c8dfb66184a5ea75c5f68e3
BLAKE2b-256   bb826ff6962dd6db29c33d23911e5356e6b4e614b6ccbb3b715ffb0636f41f28
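To check a downloaded file against the SHA256 digests listed above, hash it in chunks with hashlib and compare. A minimal sketch with hypothetical helper names; pip can also enforce digests itself when a requirements file pins `rhoshift==0.1.6 --hash=sha256:...` and you install with `--require-hashes`.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_sha256: str) -> bool:
    """True when the file's digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Digest copied from the wheel's hash table above.
    wheel = "rhoshift-0.1.6-py3-none-any.whl"
    expected = "6ccc101885d7e79c62fd84fc713e14514264a4654bf992421cd8f0df5dd26696"
    if os.path.exists(wheel):
        print("OK" if matches(wheel, expected) else "MISMATCH")
```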

