AI-powered CLI tool for automatic troubleshooting of complex systems based on Kubernetes and public clouds

RofehCloud

Welcome to the RofehCloud project! This README will guide you through the setup and usage of the project.

Introduction

RofehCloud helps you troubleshoot simple and complex issues with Kubernetes and public clouds such as AWS, Google Cloud (GCP), and Azure. It can also use information from local clones of Git-compatible source code repositories.

Ideal users of RofehCloud are:

  • Cloud/support engineers working for an MSP supporting many customers with widely different cloud/K8s environments
  • Production DevOps/SRE engineers interested in shortening their SaaS troubleshooting times
  • Software development engineers who work with development/staging environments and want to operate and troubleshoot them more effectively

The name RofehCloud is a play on the Hebrew word "rofeh", which means "doctor".

Features

  • RofehCloud runs locally on your computer and troubleshoots using the cloud access credentials you have already configured for CLI tools such as "aws", "gcloud", "az" and "kubectl"
  • The text-based tool provides an easy-to-use chat interface
  • Supports OpenAI, Azure OpenAI and AWS Bedrock Anthropic Claude LLMs
  • Protects against accidental changes to data, both locally and in connected clouds

RofehCloud does not require any additional components to be installed in the public cloud or K8s environments being troubleshot.

Some examples of queries/issues RofehCloud can handle:

  • “Connection timeout” error when trying to communicate from EC2 instance i-0cdac72c5578a17db (located in ap-northeast-1 region) to RDS instance test2-warehouse located in us-east-2 region.
  • On EC2 instance i-0991c22729522ca46 (located in ap-northeast-1 region) I’m getting an error “Access denied” when trying to read objects from S3 bucket “exchange-updates”.
  • Investigate unhealthy targets in AWS load balancer capture-test2-backend-lb located in us-east-2 region.
  • In what AWS regions do we have running EC2 instances?
  • Do we have any unused EBS volumes?
  • Why can't I create an S3 bucket named my_new_unique_s3_bucket_xcq?
  • How to modify AWS policy xyz to allow write access to S3 bucket mybucket?
  • Do we have any public S3 buckets?
  • Why do we have two pending k8s pods? How to fix them?

Examples of troubleshooting sessions:

troubleshooting example 1

troubleshooting example 2

Installation

Prerequisites

To successfully deploy and use RofehCloud you will need the following tools and resources:

Software:

  • Python 3.10 or newer
  • git
  • aws (needed if you use AWS)
  • gcloud (needed if you use Google Cloud)
  • az (needed if you use Azure)
  • kubectl and helm (needed if you use Kubernetes)
  • ncli (needed if you use Nutanix)
  • esxcli (needed if you use VMware ESXi)
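One way to confirm the prerequisites above is to check which of the listed CLI tools are on your PATH. The following is a minimal sketch and not part of RofehCloud: the TOOLS_BY_PLATFORM mapping and the find_missing_tools helper are hypothetical names introduced here, and only the tool names come from the list above.

```python
import shutil

# CLI tools from the prerequisites list, grouped by the platform that needs them.
# The grouping itself is an illustration, not a RofehCloud data structure.
TOOLS_BY_PLATFORM = {
    "AWS": ["aws"],
    "Google Cloud": ["gcloud"],
    "Azure": ["az"],
    "Kubernetes": ["kubectl", "helm"],
    "Nutanix": ["ncli"],
    "VMware ESXi": ["esxcli"],
}


def find_missing_tools(platforms, which=shutil.which):
    """Return the CLI tools required for `platforms` that are not found on PATH."""
    missing = []
    for platform in platforms:
        for tool in TOOLS_BY_PLATFORM[platform]:
            # shutil.which returns None when the executable cannot be located.
            if which(tool) is None:
                missing.append(tool)
    return missing
```

Calling find_missing_tools(["AWS", "Kubernetes"]) returns an empty list when aws, kubectl and helm are all installed; the `which` parameter exists only to make the helper easy to test.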

RofehCloud can work with one of the following LLM services:

  • OpenAI Enterprise API (default LLM):
    • Configure the API key in environment variable OPENAI_API_KEY
    • Recommended (default) OpenAI models are gpt-4o and gpt-4o-mini
  • Azure OpenAI service:
    • Set environment variable LLM_TO_USE to "azure-openai"
    • Configure the OpenAI API key in variable AZURE_OPENAI_API_KEY, Deployment ID in variable AZURE_OPENAI_DEPLOYMENT_ID, and deployment endpoint URL in variable AZURE_OPENAI_ENDPOINT. For example:
    LLM_TO_USE=azure-openai
    AZURE_OPENAI_API_KEY=xxxxxxxxx
    AZURE_OPENAI_ENDPOINT=https://myazureapiendpoint.openai.azure.com
    AZURE_OPENAI_DEPLOYMENT_ID=gpt-4o
    
  • Anthropic Claude models running on AWS Bedrock service:
    • Set environment variable LLM_TO_USE to "bedrock"
    • If the AWS Bedrock service is accessible using a non-default AWS profile, then set the profile name in environment variable BEDROCK_PROFILE_NAME and AWS Bedrock region code name (like "us-west-2") in variable BEDROCK_AWS_REGION
    • By default RofehCloud uses Anthropic models Claude 3.5 Sonnet and Claude 3 Haiku
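The variable requirements above can be summarized as a small pre-flight check. This is a sketch under stated assumptions, not RofehCloud's own config.py: the REQUIRED_VARS table and the missing_llm_settings function are hypothetical, and the "openai" value for LLM_TO_USE is taken from the .env example later in this README.

```python
import os

# Variables this README says must be set for each LLM_TO_USE value.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure-openai": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT_ID",
    ],
    # BEDROCK_PROFILE_NAME and BEDROCK_AWS_REGION are only needed
    # when Bedrock is reached through a non-default AWS profile.
    "bedrock": [],
}


def missing_llm_settings(env=None):
    """Return the required variables that are unset or empty for the chosen LLM."""
    env = os.environ if env is None else env
    provider = env.get("LLM_TO_USE", "openai")  # OpenAI is the documented default
    if provider not in REQUIRED_VARS:
        raise ValueError(f"Unsupported LLM_TO_USE value: {provider!r}")
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]
```

Running missing_llm_settings() with no arguments inspects the current process environment; passing a dictionary makes it possible to check a set of values before exporting them.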

To get started with RofehCloud, follow these steps (macOS environment):

  1. Clone the repository:
    git clone https://github.com/rofehcloud/rofehcloud.git
    
  2. Navigate to the project directory:
    cd rofehcloud
    
  3. Create Python venv, activate it and install necessary Python dependencies:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    
  4. Review config.py for supported environment variables and their default values. Use the local .env file to set any custom values.

An example of .env file format:

OPENAI_API_KEY=sk-xxx-xxxxxxxxxxxxxxxxxxxxxxxxxxx
BEDROCK_PROFILE_NAME=bedrock_profile
BEDROCK_AWS_REGION=us-east-2
LLM_TO_USE=openai
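To illustrate the format shown above, here is a minimal sketch of reading simple KEY=VALUE lines. How RofehCloud actually loads its .env file is not described in this README, so parse_env_text is a hypothetical helper; it deliberately ignores quoting and inline comments.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comment lines are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

Feeding the example file through parse_env_text yields a dictionary with the four keys OPENAI_API_KEY, BEDROCK_PROFILE_NAME, BEDROCK_AWS_REGION and LLM_TO_USE.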
  5. Start RofehCloud for the first time so it will automatically create its data directory (~/.rofehcloud) and default profile configuration file (~/.rofehcloud/profiles/default.yaml):
    make run
    
  6. Optional step: exit the RofehCloud CLI and configure RofehCloud's default profile (file ~/.rofehcloud/profiles/default.yaml) with information about local clones of source code repositories you want RofehCloud to be aware of. For example:
name: default
description: Default profile

source_code_repositories:
- name: rofehcloud
  type: github
  local_directory: /Users/username/rofehcloud/rofehcloud
  description: AI-powered CLI tool for automatic troubleshooting issues with Kubernetes and public clouds like AWS, Google Cloud (GCP), and Azure.
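Before restarting the tool, it can help to confirm that each local_directory in the profile really points at a Git clone. The sketch below assumes the profile has already been loaded into a Python dictionary (for instance with a YAML parser); find_invalid_repositories is a hypothetical helper and not a RofehCloud function.

```python
from pathlib import Path


def find_invalid_repositories(profile):
    """Return (name, reason) pairs for entries that do not point at a local Git clone."""
    problems = []
    for repo in profile.get("source_code_repositories") or []:
        name = repo.get("name", "<unnamed>")
        directory = repo.get("local_directory")
        if not directory:
            problems.append((name, "local_directory is not set"))
        elif not Path(directory).is_dir():
            problems.append((name, "directory does not exist"))
        elif not (Path(directory) / ".git").exists():
            # A regular clone contains a .git directory (or a .git file for worktrees).
            problems.append((name, "no .git entry found"))
    return problems
```

An empty result means every listed repository directory exists and contains a .git entry.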

Usage

To start using RofehCloud, run the following command:

make run

This will launch the application in the terminal console.

Safety

  • By default, RofehCloud is designed not to send your infrastructure/code data anywhere other than the configured LLM service
  • By default, the tool asks the user for confirmation before executing a command that could change something on the local workstation or a remote cloud system (please see the FAQ section below)
  • It is possible to configure RofehCloud to ask for user confirmation before executing every command generated by the tool (please see the FAQ section below for option ASK_FOR_USER_CONFIRMATION_BEFORE_EXECUTING_EACH_COMMAND)

FAQ

How does RofehCloud prevent accidental modification of connected cloud resources?

By default, RofehCloud checks every LLM-suggested CLI command to determine whether it can make changes in the target system. If RofehCloud detects that the planned command can make a change, the tool pauses and asks the user to confirm whether to execute the command. For example:

? Enter your question:  Please create a new S3 bucket my-new-secret-bucket-for-test-data

New conversation label: Create S3 bucket for test data

> Entering new AgentExecutor chain...
To create an S3 bucket, I need to use the AWS CLI. I will proceed to run the command to create the bucket.

Action: Run a shell command or access CLI tools
Action Input: aws s3api create-bucket --bucket my-new-secret-bucket-for-test-data --region us-east-1

Attention! The system would like to execute a command that may change some data.
The command that is planned to be executed:
aws s3api create-bucket --bucket my-new-secret-bucket-for-test-data --region us-east-1

? Would you like the command to be executed? (Y/n)
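This README does not describe how RofehCloud decides that a command may change data. Purely as an illustration of the idea, the sketch below uses a keyword heuristic: MUTATING_PREFIXES and may_change_data are hypothetical names, the prefix list is deliberately incomplete, and the check errs toward asking for confirmation.

```python
import shlex

# Hypothetical, incomplete list of word prefixes that usually signal a change.
MUTATING_PREFIXES = (
    "create", "delete", "put", "update", "modify", "terminate",
    "apply", "remove", "rm", "run", "start", "stop", "attach", "detach",
)


def may_change_data(command):
    """Return True if any non-option word after the tool name looks like a mutating verb."""
    tokens = shlex.split(command)
    for token in tokens[1:]:  # tokens[0] is the CLI tool itself, e.g. "aws"
        if token.startswith("-"):
            continue  # skip option names such as --bucket or -n
        # str.startswith accepts a tuple and matches any of its entries.
        if token.lower().startswith(MUTATING_PREFIXES):
            return True
    return False
```

With this heuristic, the create-bucket command from the session above is flagged, while a read-only command such as "aws ec2 describe-instances --region ap-northeast-1" is not. Option values are also inspected, so a resource whose name happens to begin with one of the prefixes triggers a prompt as well.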

Is there a way for the user to review and approve every command executed by RofehCloud?

Yes, this is possible. To enable the feature, set the environment variable ASK_FOR_USER_CONFIRMATION_BEFORE_EXECUTING_EACH_COMMAND to true, either with the export command or in the local .env file. For example:

export ASK_FOR_USER_CONFIRMATION_BEFORE_EXECUTING_EACH_COMMAND=true

or in the local .env file:

ASK_FOR_USER_CONFIRMATION_BEFORE_EXECUTING_EACH_COMMAND=true
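A boolean setting like this is typically read from the environment as a string and converted. The following sketch shows one common approach; env_flag is a hypothetical helper, and accepting spellings other than "true" (such as "1" or "yes") is an assumption made here, not documented RofehCloud behavior.

```python
import os

# Spellings treated as "enabled"; only "true" is documented in this README.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, env=None):
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    # Comparison is case-insensitive and ignores surrounding whitespace.
    return raw.strip().lower() in TRUE_VALUES
```

For example, env_flag("ASK_FOR_USER_CONFIRMATION_BEFORE_EXECUTING_EACH_COMMAND") reports whether the option is switched on in the current shell.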

Does RofehCloud support MinIO and OpenShift?

Yes, just add the names of relevant CLI tools to environment variable ADDITIONAL_TOOLS; for example:

export ADDITIONAL_TOOLS=mc,oc

Does RofehCloud support on-prem virtualization platforms like Nutanix or VMware ESXi?

Yes, just add the names of relevant CLI tools to environment variable ADDITIONAL_TOOLS; for example (in .env file):

ADDITIONAL_TOOLS=esxcli,ncli
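Since ADDITIONAL_TOOLS is a comma-separated list of CLI names, a quick check that each named tool is actually installed can save a failed session. This is a sketch only: additional_tools and unavailable_tools are hypothetical helpers, and tolerating spaces around the commas is an assumption.

```python
import os
import shutil


def additional_tools(env=None):
    """Split ADDITIONAL_TOOLS into a list of tool names, dropping empty entries."""
    env = os.environ if env is None else env
    raw = env.get("ADDITIONAL_TOOLS", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def unavailable_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling unavailable_tools(additional_tools()) lists any configured tool that still needs to be installed.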

Can RofehCloud send LLM call traces to LangSmith service?

Yes, this is possible. Please use the following procedure:

  1. Create a LangSmith account and create an API key (see bottom left corner). Familiarize yourself with the platform by looking through the docs
  2. In the local .env file add the following environment variables:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<YOUR-API-KEY>  # Update to your API key
LANGCHAIN_PROJECT="<YOUR-PROJECT-NAME>"  # Update to LangSmith project name

Feedback

If you have any questions or suggestions about RofehCloud you are welcome to use the following methods to provide your feedback:

Contributing

We welcome contributions to the RofehCloud project! If you would like to contribute, please follow these steps:

  1. Fork the repository (don't forget to configure a pre-commit hook).
  2. Create a new branch for your feature or bugfix.
  3. Commit your changes and push them to your fork.
  4. Submit a pull request with a detailed description of your changes.

Authors

The project was initially created by:

License

This project is licensed under the Mozilla Public License, version 2.0. See the LICENSE.txt file for more details.

Thank you for using RofehCloud!
