An AI SRE agent that analyzes the health of a node and reports back
AI SRE for system health triaging
As an SRE or cloud engineer, you constantly dig through logs, metrics, and traces to troubleshoot and triage issues and work out why a particular system is misbehaving. SystemHealthAI (SHAI) is an AI agent that acts as an AI SRE: it looks at data sources such as Prometheus, Elasticsearch, CloudWatch, and Splunk to help triage issues and provide insight into why one or more systems might be acting up.
SHAI Architecture
Show Your Support ⭐
If you find SHAI useful, please consider giving it a star! ⭐
Quick start
Pre-Reqs
- Install uv to run MCP servers
- OpenAI API key
- Data source URL for Prometheus: have a Prometheus URL ready to use
- pip or poetry
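The checklist above can be verified with a small script before installing. This is a minimal sketch, not part of SHAI: the function name `missing_prereqs` is hypothetical, and `OPENAI_API_KEY` is assumed to be the environment variable that holds the key (it is the name the OpenAI SDK reads by default; this README does not say how SHAI picks it up). The Prometheus URL is not checked because the README does not state how it is supplied.

```python
import os
import shutil


def missing_prereqs(env=None, which=shutil.which):
    """Return a list of unmet prerequisites; an empty list means ready.

    `env` and `which` are injectable so the check can be exercised
    without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if which("uv") is None:
        problems.append("uv not found on PATH")
    # Assumption: the key is provided through OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if which("pip") is None and which("poetry") is None:
        problems.append("neither pip nor poetry found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prereqs():
        print("missing:", problem)
```

Running the script prints one line per missing prerequisite and nothing when everything is in place.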
Using pip
pip install systemhealthai
From Source using poetry
git clone git@github.com:ajinkyakadam/systemhealthai.git
cd systemhealthai
poetry install
Setup
Using SHAI
shai nodeA --model "openai:o4-mini"
The command above tells shai to use the o4-mini model to triage the nodeA server.
Replace nodeA with the actual hostname you want to investigate.
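To triage several hosts in one go, the same command can be driven from a script. The sketch below only wraps the `shai <hostname> --model <model>` invocation shown above; the helper names `build_shai_command` and `triage_hosts` are hypothetical, and nothing is assumed about what shai prints or which exit codes it uses, so the return code is recorded as-is.

```python
import subprocess


def build_shai_command(hostname, model="openai:o4-mini"):
    """Build the argument list for one shai run."""
    # Reject empty names and anything that would be parsed as a flag.
    if not hostname or hostname.startswith("-"):
        raise ValueError(f"invalid hostname: {hostname!r}")
    return ["shai", hostname, "--model", model]


def triage_hosts(hostnames, model="openai:o4-mini"):
    """Run shai once per host and map each hostname to its exit code."""
    exit_codes = {}
    for host in hostnames:
        completed = subprocess.run(build_shai_command(host, model))
        exit_codes[host] = completed.returncode
    return exit_codes


if __name__ == "__main__":
    # nodeA and nodeB are placeholders; use real hostnames.
    print(triage_hosts(["nodeA", "nodeB"]))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual hostnames.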
Roadmap
Datasource support
| Data Source | Status | Description |
|---|---|---|
| Prometheus | ✅ | Find node metrics to correlate and triage health issues |
| Grafana Loki | 🟡 | Search Loki logs |
| Elasticsearch | 🟡 | Search Elasticsearch logs for system issues |
| Splunk | 🟡 | Search Splunk logs for system issues |
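For a sense of what finding node metrics in Prometheus involves, here is a minimal sketch of an instant query against the Prometheus HTTP API (`/api/v1/query`). It is illustrative only and is not SHAI's implementation: the function names, the `http://localhost:9090` base URL, and the `up{instance="nodeA:9100"}` query (with `nodeA:9100` as a placeholder target) are all assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels}, "value": [timestamp, "string value"]}.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


def query_prometheus(base_url, promql, timeout=10):
    """Run the query over HTTP and return the parsed samples."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as response:
        return parse_vector(json.load(response))


if __name__ == "__main__":
    # Assumed values; point these at your own Prometheus and target.
    for labels, value in query_prometheus("http://localhost:9090", 'up{instance="nodeA:9100"}'):
        print(labels.get("instance"), "up" if value == 1.0 else "down")
```

Keeping URL construction and response parsing separate from the network call makes both easy to test without a running Prometheus server.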
LLM Provider Support
| Provider | Status | Description |
|---|---|---|
| OpenAI | ✅ | Integrate with OpenAI models for advanced insights and triaging |
| Claude | 🟡 | Support for Claude models to assist in system health analysis |
| Hugging Face | 🟡 | Utilize Hugging Face models |
| Local LLMs | 🟡 | Deploy and use local LLMs for on-premise triaging solutions |
How to Contribute
Contributions are welcome, whether bug reports, feature requests, or PRs!
- Open a GitHub issue to report a problem or suggest a feature
- Open a pull request with improvements
- Share your experience and how SHAI has been useful to you or your organization
Citation
If you use SHAI in your work, blog posts, or projects, please cite:
@software{systemhealthai,
author = {Kadam, Ajinkya},
title = {SHAI: An AI SRE for triaging system health issues},
year = {2025},
publisher = {GitHub},
url = {https://github.com/ajinkyakadam/systemhealthai}
}
License
MIT