
An SRE AI agent that analyzes a node and reports back on its health

Project description

AI SRE for system health triaging

As an SRE or cloud engineer, you may constantly have to dig through logs, metrics, and traces to troubleshoot and triage issues and work out why particular systems are misbehaving. SystemHealthAI (SHAI) is an AI agent that acts as an AI SRE: it looks at data sources such as Prometheus, Elasticsearch, CloudWatch, and Splunk to help triage issues and provide insight into why a system or systems might be acting up.

SHAI Architecture


Show Your Support ⭐

If you find SHAI useful, please consider giving it a star! ⭐

Quick start

Pre-Reqs

  • uv, installed, to run the MCP servers
  • An OpenAI API key
  • A Prometheus data source URL, ready to use
  • pip or poetry
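
The prerequisites above can be checked before a first run. The following is a minimal sketch, not part of shai itself. It assumes the OpenAI key is supplied through the `OPENAI_API_KEY` environment variable (the OpenAI SDK default) and uses a hypothetical `PROMETHEUS_URL` variable to stand in for however the Prometheus URL is actually configured.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    # uv is needed to run the MCP servers.
    if which("uv") is None:
        problems.append("uv not found on PATH")
    # Assumption: the key is read from OPENAI_API_KEY, the OpenAI SDK default.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Hypothetical variable name; shai may take the Prometheus URL another way.
    url = env.get("PROMETHEUS_URL", "")
    if not url.startswith(("http://", "https://")):
        problems.append("PROMETHEUS_URL is missing or not an http(s) URL")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current environment; the `env` and `which` parameters exist only so the checks can be exercised without touching the real system.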

Using pip

pip install systemhealthai

From Source using poetry

git clone git@github.com:ajinkyakadam/systemhealthai.git
cd systemhealthai
poetry install

Setup

Using SHAI

shai nodeA --model "openai:o4-mini"

The above command instructs shai to use the o4-mini model to triage the nodeA server. Replace nodeA with the actual hostname you want to investigate.
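
To triage several hosts in a row, the same command can be wrapped in a short script. This is a sketch under the assumption that `shai` is on the PATH and accepts exactly the arguments shown above; the helper names `build_shai_command` and `triage_hosts` are hypothetical and not part of the package.

```python
import subprocess


def build_shai_command(hostname, model="openai:o4-mini"):
    """Build the argv list for one shai run, mirroring the command shown above."""
    if not hostname or hostname.startswith("-"):
        raise ValueError(f"not a usable hostname: {hostname!r}")
    return ["shai", hostname, "--model", model]


def triage_hosts(hostnames, model="openai:o4-mini", run=subprocess.run):
    """Run shai once per host and return {hostname: exit code}."""
    results = {}
    for host in hostnames:
        # check=False: keep going even if one host's run exits non-zero.
        completed = run(build_shai_command(host, model), check=False)
        results[host] = completed.returncode
    return results
```

Passing the arguments as a list avoids shell quoting problems with unusual hostnames, and the injectable `run` parameter makes it possible to try the loop without invoking the real CLI.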

Roadmap

Datasource support

Data Source | Status | Description
Prometheus | | Find node metrics to correlate and triage health issues
Grafana Loki | 🟡 | Search Loki logs
Elasticsearch | 🟡 | Search Elasticsearch logs for system issues
Splunk | 🟡 | Search Splunk logs for system issues
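
For a sense of what "finding node metrics" in Prometheus can look like, here is a small sketch that calls Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The queries shai actually issues are not documented here; the choice of the node_exporter metric `node_load1`, the prefix match on the `instance` label, and all function names are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build a URL for Prometheus's instant-query endpoint, /api/v1/query."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_vector(payload):
    """Extract {instance: float value} from an instant-query vector result."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    samples = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the sample value arrives as a string
        samples[series["metric"].get("instance", "")] = float(value)
    return samples


def node_load(base_url, hostname, timeout=10):
    """Fetch the 1-minute load average for instances whose label starts with hostname."""
    # Naive prefix match: dots in the hostname act as regex wildcards here.
    promql = f'node_load1{{instance=~"{hostname}.*"}}'
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `node_load("http://localhost:9090", "nodeA")` would return a mapping from instance label to load average, assuming a Prometheus server at that address is scraping node_exporter on the host.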

LLM Provider Support

Provider | Status | Description
OpenAI | | Integrate with OpenAI models for advanced insights and triaging
Claude | 🟡 | Support for Claude models to assist in system health analysis
Hugging Face | 🟡 | Utilize Hugging Face models
Local LLMs | 🟡 | Deploy and use local LLMs for on-premise triaging solutions

How to Contribute

Contributions are welcome, be it bug reports, feature requests, or PRs!

  • Open a GitHub issue to report a problem or suggest a feature
  • Open a pull request with improvements
  • Share your experience and how SHAI has been useful to you or your organization

Citation

If you use shai in your work, blog posts, or projects, please cite it:

@software{systemhealthai,
  author = {Kadam, Ajinkya},
  title = {SHAI: An AI SRE for triaging system health issues},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/ajinkyakadam/systemhealthai}
}

License

MIT



Download files


Source Distribution

systemhealthai-0.1.0.tar.gz (7.1 kB)

Uploaded Source

Built Distribution


systemhealthai-0.1.0-py3-none-any.whl (9.4 kB)

Uploaded Python 3

File details

Details for the file systemhealthai-0.1.0.tar.gz.

File metadata

  • Download URL: systemhealthai-0.1.0.tar.gz
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/22.5.0

File hashes

Hashes for systemhealthai-0.1.0.tar.gz
Algorithm | Hash digest
SHA256 | 19b5fe072e926b5a5575b3a50a3762268218a5175b3ab925efffc2871dd1796a
MD5 | 0d37b5bdd5a21829eb25b4456ddb5393
BLAKE2b-256 | f7a43b1cb9fdb98a8524c1c7c0b9fbc626d815b53410673ae59b850ead425094


File details

Details for the file systemhealthai-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: systemhealthai-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.13.3 Darwin/22.5.0

File hashes

Hashes for systemhealthai-0.1.0-py3-none-any.whl
Algorithm | Hash digest
SHA256 | 18ba6a08d5eb4c1f69c710c8e30686e650957cbb820bcb15a617978f28414830
MD5 | 87184d7d8731ba9190dbe9eed2f61e75
BLAKE2b-256 | a9c67cdb1f732f10889dbc4e46660767750ca6b2baab241ec75e472e826e7a8d

