Skip to main content

Edge AI deployment and management tools

Project description

Edge AI Platform

A comprehensive Edge AI platform with LLM (Ollama) and ML (ONNX Runtime) serving capabilities, monitoring, model conversion, and benchmarking tools.

๐Ÿ› ๏ธ Model Conversion & Validation Tools

This project includes a powerful command-line interface (CLI) for converting and validating machine learning models, with a focus on ONNX format.

Installation

# Install the package in development mode
pip install -e .

# Install with TensorFlow support (for Keras/SavedModel conversion)
pip install -e .[tensorflow]

# Install with PyTorch support
pip install -e .[torch]

# Install with all dependencies
pip install -e .[all]

CLI Usage

Model Benchmarking

Benchmark ONNX models for performance metrics:

# Benchmark a single model
wronai_edge benchmark path/to/model.onnx --input-shape 1,3,224,224

# Compare multiple models
wronai_edge benchmark model1.onnx model2.onnx --compare --input-shape 1,3,224,224

# Customize benchmark parameters
wronai_edge benchmark model.onnx --warmup 20 --runs 200 --cpu

Options:

  • --input-shape, -i: Input shape (can be specified multiple times for multiple inputs)
  • --warmup: Number of warmup runs (default: 10)
  • --runs: Number of benchmark runs (default: 100)
  • --cpu/--gpu: Force CPU or GPU usage (default: GPU if available)
  • --compare: Compare multiple models side by side

Model Validation

Validate an ONNX model:

wronai_edge test-model path/to/model.onnx

Options:

  • --output-json: Save validation results to a JSON file
  • --verbose, -v: Enable verbose output

Example:

wronai_edge test-model models/simple-model.onnx --output-json validation_results.json --verbose

Model Conversion

Convert models between different formats using the convert command group.

PyTorch to ONNX:

wronai_edge convert pytorch model.pt output.onnx --input-shape 1,3,224,224

Keras to ONNX:

wronai_edge convert keras model.h5 output.onnx --input-shape 1,224,224,3

TensorFlow SavedModel to ONNX:

wronai_edge convert saved-model saved_model_dir output.onnx

Common options for conversion:

  • --opset: ONNX opset version (default: 13)
  • --verbose, -v: Enable verbose output

Python API

You can also use the conversion and validation tools programmatically:

from wronai_edge import validate_model, convert_to_onnx

# Validate a model
results = validate_model("model.onnx")
print(f"Model validation passed: {results['validation_summary']['passed']}")

# Convert a PyTorch model to ONNX
convert_to_onnx(
    model_path="model.pt",
    output_path="output.onnx",
    input_shapes=[(1, 3, 224, 224)],
    opset_version=13
)

For more examples, see the examples directory.

๐Ÿ“š Documentation

For detailed documentation about the Edge AI platform, including LLM serving and monitoring, see the sections below.

๐Ÿ“š Documentation

๐Ÿš€ Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.8+ (for running tests and examples)
  • At least 8GB RAM (16GB recommended for running LLMs)
  • curl and jq (for testing and examples)

Starting the Platform

  1. Clone the repository:

    git clone https://github.com/wronai/edge.git
    cd edge
    
  2. Start all services:

    docker-compose up -d
    
  3. Verify services are running:

    docker-compose ps
    

    All services should show as "healthy" or "running".

  4. Run the test suite to verify everything is working:

    ./test_services.sh
    

Accessing Services

ONNX Runtime Management

# Check ONNX Runtime status
make onnx-status

# List available ONNX models
make onnx-models

# Load a new model
make onnx-load MODEL=simple-model MODEL_SOURCE=./models/simple-model.onnx

# Test inference with a sample request
make onnx-test

For detailed ONNX Runtime documentation, see docs/onnx-runtime.md

Example: Using ONNX Runtime

Here's how to use the ONNX Runtime service for model inference:

  1. Check service health:

    curl http://localhost:8001/health
    # Expected response: {"status": "OK"}
    
  2. List available models:

    curl http://localhost:8001/v1/models
    # Example response: {"models": ["model1.onnx", "model2.onnx"]}
    
  3. Run inference (using Python):

    import requests
    import numpy as np
    
    # Sample input data (adjust based on your model's expected input)
    input_data = {
        "model_name": "wronai.onnx",
        "input": {
            "input_1": np.random.rand(1, 224, 224, 3).tolist()  # Example for image input
        }
    }
    
    # Send inference request
    response = requests.post(
        "http://localhost:8001/v1/models/your_model:predict",
        json=input_data
    )
    
    # Process the response
    if response.status_code == 200:
        predictions = response.json()
        print("Inference successful!")
        print(f"Predictions: {predictions}")
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
    
  4. Using cURL for simple inference:

    curl -X POST http://localhost:8001/v1/models/your_model:predict \
         -H "Content-Type: application/json" \
         -d '{"input": [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]}'
    

For more advanced usage, refer to the API Reference.

Stopping the Platform

To stop all services:

docker-compose down

To remove all data (including models and metrics):

docker-compose down -v

๐Ÿ—๏ธ Architecture

graph TD
    A[Client] -->|HTTP/HTTPS| B[Nginx Gateway]
    B -->|/api/ollama/*| C[Ollama Service]
    B -->|/api/onnx/*| D[ONNX Runtime]
    B -->|/grafana| E[Grafana]
    B -->|/prometheus| F[Prometheus]
    G[Prometheus] -->|Scrape Metrics| H[Services]
    E -->|Query| G
    C -->|Store Models| I[(Ollama Models)]
    D -->|Load Models| J[(ONNX Models)]

๐Ÿ”ง Services

Core Services

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Service         โ”‚ Port     โ”‚ Description                              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Nginx Gateway   โ”‚ 30080    โ”‚ API Gateway and reverse proxy            โ”‚
โ”‚ Ollama          โ”‚ 11435    โ”‚ LLM serving (compatible with OpenAI API) โ”‚
โ”‚ ONNX Runtime    โ”‚ 8001     โ”‚ ML model inference                       โ”‚
โ”‚ Prometheus      โ”‚ 9090     โ”‚ Metrics collection and alerting          โ”‚
โ”‚ Grafana         โ”‚ 3007     โ”‚ Monitoring dashboards                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ“ˆ Monitoring

Access the monitoring dashboards:

๐Ÿงช Testing

Running Tests

We provide test scripts to verify all services are functioning correctly:

  1. Basic Service Tests - Verifies all core services are running and accessible:

    # Run all tests
    make test
    
    # Or run individual tests
    ./test_services.sh
    
  2. ONNX Runtime Tests - Test ONNX Runtime functionality:

    # Check ONNX Runtime status
    make onnx-status
    
    # Test with a sample request
    make onnx-test
    
  3. ONNX Model Test - Validates ONNX model loading and inference (requires Python dependencies):

    python3 -m pip install -r requirements-test.txt
    python3 test_onnx_model.py
    
  4. API Endpoint Tests - Comprehensive API tests (requires Python dependencies):

    python3 test_endpoints.py
    

Expected Test Results

When all services are running correctly, you should see output similar to:

=== Testing Direct Endpoints ===
Testing Ollama API (http://localhost:11435/api/tags)... PASS (Status: 200)
Testing ONNX Runtime (http://localhost:8001/v1/)... PASS (Status: 405)

=== Testing Through Nginx Gateway ===
Testing Nginx -> Ollama (http://localhost:30080/api/tags)... PASS (Status: 200)
Testing Nginx -> ONNX Runtime (http://localhost:30080/v1/)... PASS (Status: 405)
Testing Nginx Health Check (http://localhost:30080/health)... PASS (Status: 200)

=== Testing Monitoring ===
Testing Prometheus (http://localhost:9090)... PASS (Status: 302)
Testing Prometheus Graph (http://localhost:9090/graph)... PASS (Status: 200)
Testing Grafana (http://localhost:3007)... PASS (Status: 302)
Testing Grafana Login (http://localhost:3007/login)... PASS (Status: 200)

Note: A 405 status for ONNX Runtime is expected for GET requests to /v1/ as it requires POST requests for inference. The 302 status codes for Prometheus and Grafana are expected redirects to their respective UIs.

๐Ÿงน Cleanup

Stop Services

# Stop all services
make stop

# Remove all containers and volumes
make clean

# Remove all unused Docker resources
make prune

ONNX Model Management

# List loaded models
make onnx-models

# To remove models, simply delete them from the models/ directory
rm models/*.onnx

๐Ÿ“„ License

This project is licensed under the Apache Software License - see the LICENSE file for details. ONNX

๐Ÿš€ Features

  • Multi-Model Serving: Run multiple AI/ML models simultaneously
  • Optimized Inference: ONNX Runtime for high-performance model execution
  • LLM Support: Ollama integration for local LLM deployment
  • Monitoring: Built-in Prometheus and Grafana for observability
  • Scalable: Kubernetes-native design for easy scaling
  • Developer-Friendly: Simple CLI and comprehensive API

๐Ÿ“š Documentation

Getting Started

Examples

Guides

๐Ÿš€ Quick Start

Prerequisites

  • Docker and Docker Compose
  • 8GB+ RAM (16GB recommended)
  • 20GB free disk space

Start Services

# Clone the repository
git clone https://github.com/wronai/edge.git
cd edge

# Start all services
make up

# Check service status
make status

Access Services

๐Ÿ› ๏ธ Development

Project Structure

edge/
โ”œโ”€โ”€ docs/               # Documentation
โ”œโ”€โ”€ configs/            # Configuration files
โ”œโ”€โ”€ k8s/                # Kubernetes manifests
โ”œโ”€โ”€ scripts/            # Utility scripts
โ”œโ”€โ”€ terraform/          # Infrastructure as Code
โ”œโ”€โ”€ docker-compose.yml   # Local development
โ””โ”€โ”€ Makefile            # Common tasks

Common Tasks

# Start services
make up

# Stop services
make down

# View logs
make logs

# Access monitoring
make monitor

# Run tests
make test

๐Ÿค Contributing

Contributions are welcome! Please see our Contributing Guide for details.

๐Ÿ“„ License

This project is licensed under the Apache Software License - see the LICENSE file for details.

๐Ÿ“ง Contact

For support or questions, please open an issue in the repository.

๐Ÿš€ Quick Start (2 minutes to live demo)

Prerequisites

  • Docker Desktop (running)
  • Terraform >= 1.6
  • kubectl >= 1.28
  • 8GB RAM minimum

One-Command Deployment

# Clone and deploy
git clone https://github.com/wronai/edge.git
cd edge

# Make script executable and deploy everything
chmod +x scripts/deploy.sh
./scripts/deploy.sh

๐ŸŽฏ Result: Complete edge AI platform with monitoring in ~3-5 minutes

docker compose ps

output:

docker compose ps
NAME                IMAGE                    COMMAND                  SERVICE             CREATED             STATUS              PORTS
edge-grafana-1      grafana/grafana:latest   "/run.sh"                grafana             3 days ago          Up 8 minutes        0.0.0.0:3007->3000/tcp, :::3007->3000/tcp
edge-ollama-1       ollama/ollama:latest     "/bin/sh -c 'sleep 1โ€ฆ"   ollama              3 days ago          Up 8 minutes        0.0.0.0:11435->11434/tcp, :::11435->11434/tcp
edge-prometheus-1   prom/prometheus:latest   "/bin/prometheus --cโ€ฆ"   prometheus          3 days ago          Up 8 minutes        0.0.0.0:9090->9090/tcp, :::9090->9090/tcp

Instant Access

wronai_edge-portfolio/
โ”œโ”€โ”€ terraform/main.tf          # Infrastruktura (K3s + Docker)
โ”œโ”€โ”€ k8s/ai-platform.yaml       # AI workloady (ONNX + Ollama)
โ”œโ”€โ”€ k8s/monitoring.yaml         # Monitoring (Prometheus + Grafana)
โ”œโ”€โ”€ configs/Modelfile           # Custom LLM konfiguracja
โ”œโ”€โ”€ scripts/deploy.sh           # Automatyzacja (jeden skrypt)
โ””โ”€โ”€ README.md                   # Kompletna dokumentacja

๐Ÿ—๏ธ Architecture Overview

graph TB
    U[User] --> G[AI Gateway :30080]
    G --> O[ONNX Runtime]
    G --> L[Ollama LLM]
    
    P[Prometheus :30090] --> O
    P --> L
    P --> G
    
    GR[Grafana :30030] --> P
    
    subgraph "K3s Cluster"
        O
        L
        G
        P
        GR
    end
    
    subgraph "Infrastructure"
        T[Terraform] --> K[K3s]
        K --> O
        K --> L
    end

Technology Stack

Layer Technology Purpose
Infrastructure Terraform + Docker IaC provisioning
Orchestration K3s (Lightweight Kubernetes) Container management
AI Inference ONNX Runtime + Ollama Model serving
Load Balancing Nginx Gateway Traffic routing
Monitoring Prometheus + Grafana Observability
Automation Bash + YAML Deployment scripts

๐Ÿค– AI Capabilities Demo

Test ONNX Runtime

Health Check

# Check if the ONNX Runtime service is healthy
curl -X GET http://localhost:8001/
# Expected Response: "Healthy"

Model Management

# List available models in the models directory
make onnx-models

# Check model status
make onnx-model-status

# Get model metadata
make onnx-model-metadata

Model Inference

# Make a prediction using the default model (complex-cnn-model)
make onnx-predict

# Or use curl directly
curl -X POST http://localhost:8001/v1/models/complex-cnn-model/versions/1:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [{"data": [1.0, 2.0, 3.0, 4.0]}]}'

# Example with Python
python3 -c "
import requests
import json

response = requests.post(
    'http://localhost:8001/v1/models/complex-cnn-model/versions/1:predict',
    json={"instances": [{"data": [1.0, 2.0, 3.0, 4.0]}]}
)
print(json.dumps(response.json(), indent=2))
"

Benchmarking

# Run a benchmark with 100 requests
make onnx-benchmark

# Customize model and version
make onnx-benchmark MODEL_NAME=my-model MODEL_VERSION=2

Notes:

  • The server automatically loads models from the /models directory in the container
  • To use a different model:
    1. Place your .onnx model file in the ./models directory
    2. Update the model name/version in your requests or set environment variables:
      export MODEL_NAME=your-model
      export MODEL_VERSION=1
      
    3. Or specify them when running commands:
      make onnx-predict MODEL_NAME=your-model MODEL_VERSION=1
      

Test Ollama LLM

# Simple chat
curl -X POST http://localhost:30080/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "prompt": "Explain edge computing",
    "stream": false
  }'

# Custom edge AI assistant
curl -X POST http://localhost:30080/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wronai_edge-assistant",
    "prompt": "How do I monitor Kubernetes pods?",
    "stream": false
  }'

Interactive Demo

# Run comprehensive AI functionality test
./scripts/deploy.sh demo

# Test individual components
./scripts/deploy.sh test

output:

# Test individual components
./scripts/deploy.sh test
[ERROR] 19:27:54 Unknown command: demo
[INFO] 19:27:54 Run './scripts/deploy.sh help' for usage information
[STEP] 19:27:54 ๐Ÿ” Testing deployed services...
[INFO] 19:27:54 Testing service endpoints...
[ERROR] 19:27:54 โŒ AI Gateway: FAILED
[WARN] 19:27:54 โš ๏ธ Ollama: Not ready (may still be starting)
[WARN] 19:27:54 โš ๏ธ ONNX Runtime: Not ready
[INFO] 19:27:54 โœ… Prometheus: OK
[INFO] 19:27:54 โœ… Grafana: OK
[INFO] 19:27:54 Testing AI functionality...
[WARN] 19:27:54 โš ๏ธ AI Generation: Model may still be downloading
[WARN] 19:27:54 โš ๏ธ Some services need more time to start

Run a diagnosis to check your system:

./scripts/deploy.sh diagnose

output:

...
- context:
    cluster: kind-wronai_edge
    user: kind-wronai_edge
[STEP] 19:32:14 ๐Ÿ” Testing service connectivity...
//localhost:30080/health:AI Gateway: โŒ NOT RESPONDING
//localhost:30090/-/healthy:Prometheus: โŒ NOT RESPONDING
//localhost:30030/api/health:Grafana: โŒ NOT RESPONDING
//localhost:11435/api/tags:Ollama Direct: โŒ NOT RESPONDING
//localhost:8001/v1/models:ONNX Direct: โŒ NOT RESPONDING

[STEP] 19:32:14 ๐Ÿ” Diagnosis complete!

Fix and deploy the services:

./scripts/deploy.sh fix

Test the services after deployment:

./scripts/deploy.sh test

๐Ÿ“Š Monitoring & Observability

Grafana Dashboard

  • URL: http://localhost:30030
  • Login: admin/admin
  • Features:
    • Real-time AI inference metrics
    • Resource utilization monitoring
    • Request latency distribution
    • Error rate tracking
    • Pod health status

Prometheus Metrics

  • URL: http://localhost:30090
  • Key Metrics:
    • http_requests_total - Request counters
    • http_request_duration_seconds - Latency histograms
    • container_memory_usage_bytes - Memory consumption
    • container_cpu_usage_seconds_total - CPU utilization

Health Monitoring

# Comprehensive health check
./scripts/deploy.sh health

# Check specific components
kubectl get pods -A
kubectl top nodes
kubectl top pods -A

๐Ÿ› ๏ธ Operations & Maintenance

Common Operations

# Check deployment status
./scripts/deploy.sh info

# View live logs
kubectl logs -f deployment/ollama-llm -n ai-inference
kubectl logs -f deployment/onnx-inference -n ai-inference

# Scale AI services
kubectl scale deployment onnx-inference --replicas=3 -n ai-inference

# Update configurations
kubectl apply -f k8s/ai-platform.yaml

Troubleshooting

Common Issues and Solutions

1. Disk Space Issues If the deployment fails with eviction errors or the cluster won't start:

# Check disk space
df -h

# Clean up Docker system
docker system prune -a -f --volumes

# Remove unused containers, networks, and images
docker container prune -f
docker image prune -a -f
docker network prune -f
docker volume prune -f

# Clean up old logs and temporary files
sudo journalctl --vacuum-time=3d
sudo find /var/log -type f -name "*.gz" -delete
sudo find /var/log -type f -name "*.1" -delete

2. Debugging K3s Cluster

# Check K3s server logs
docker logs k3s-server

# Check cluster status
docker exec k3s-server kubectl get nodes
docker exec k3s-server kubectl get pods -A

3. Port Conflicts If you see port binding errors, check and free up required ports (80, 443, 6443, 30030, 30090, 30080):

# Check port usage
sudo lsof -i :8080  # Replace with your port number

4. Debugging Pods

# Debug pod issues
kubectl describe pod <pod-name> -n ai-inference

# Check resource usage
kubectl top pods -n ai-inference --sort-by=memory

# View events
kubectl get events -n ai-inference --sort-by='.lastTimestamp'

# Restart services
kubectl rollout restart deployment/ollama-llm -n ai-inference

5. Reset Everything If you need to start fresh:

# Clean up all resources
./scripts/deploy.sh cleanup

# Remove all Docker resources
docker system prune -a --volumes --force

# Remove K3s data
sudo rm -rf terraform/kubeconfig/*
sudo rm -rf terraform/k3s-data/*
sudo rm -rf terraform/registry-data/*

Cleanup

# Complete cleanup
./scripts/deploy.sh cleanup

# Partial cleanup (keep infrastructure)
kubectl delete -f k8s/monitoring.yaml
kubectl delete -f k8s/ai-platform.yaml

๐Ÿ“ Project Structure

wronai_edge-portfolio/
โ”œโ”€โ”€ terraform/
โ”‚   โ””โ”€โ”€ main.tf                 # Complete infrastructure as code
โ”œโ”€โ”€ k8s/
โ”‚   โ”œโ”€โ”€ ai-platform.yaml       # AI workloads (ONNX + Ollama + Gateway)
โ”‚   โ””โ”€โ”€ monitoring.yaml         # Observability stack (Prometheus + Grafana)
โ”œโ”€โ”€ configs/
โ”‚   โ””โ”€โ”€ Modelfile              # Custom LLM configuration
โ”œโ”€โ”€ scripts/
โ”‚   โ””โ”€โ”€ deploy.sh              # Automation script (8 commands)
โ””โ”€โ”€ README.md                  # This documentation

Total Files: 6 core files + documentation = Minimal complexity, maximum demonstration

๐ŸŽฏ Skills Demonstrated

DevOps Excellence

  • โœ… Infrastructure as Code - Pure Terraform configuration
  • โœ… Container Orchestration - Kubernetes/K3s with proper manifests
  • โœ… Declarative Automation - YAML-driven deployments
  • โœ… Monitoring & Observability - Production-ready metrics
  • โœ… Security Best Practices - RBAC, network policies, resource limits
  • โœ… Scalability Patterns - HPA, resource management
  • โœ… GitOps Ready - Declarative configuration management

AI/ML Integration

  • โœ… Model Serving - ONNX Runtime for optimized inference
  • โœ… LLM Deployment - Ollama with custom model configuration
  • โœ… Edge Computing - Resource-constrained deployment patterns
  • โœ… Load Balancing - Intelligent traffic routing for AI services
  • โœ… Performance Monitoring - AI-specific metrics and alerting

Modern Patterns

  • โœ… Microservices Architecture - Service mesh ready
  • โœ… Cloud Native - CNCF-aligned tools and patterns
  • โœ… Edge Computing - Lightweight, distributed deployments
  • โœ… Observability - Three pillars (metrics, logs, traces)
  • โœ… Automation - Zero-touch deployment and operations

๐Ÿ”ง Customization & Extensions

Add Custom Models

# Add new ONNX model
kubectl create configmap wronai --from-file=model.onnx -n ai-inference
# Update deployment to mount the model

# Create custom Ollama model
kubectl exec -n ai-inference deployment/ollama-llm -- \
  ollama create my-custom-model -f /path/to/Modelfile

Scale for Production

# Multi-node cluster
# Update terraform/main.tf to add worker nodes

# Persistent storage
# Add PVC configurations for model storage

# External load balancer
# Configure LoadBalancer service type

# TLS termination
# Add cert-manager and ingress controller

Advanced Monitoring

# Add custom metrics
# Extend Prometheus configuration

# Custom dashboards
# Add Grafana dashboard JSON files

# Alerting rules
# Configure AlertManager for notifications

๐Ÿ“ˆ Performance & Benchmarks

Resource Usage (Default Configuration)

  • Total Memory: ~4GB (K3s + AI services + monitoring)
  • CPU Usage: ~2 cores (under load)
  • Storage: ~2GB (container images + models)
  • Network: Minimal (edge-optimized)

Performance Metrics

  • Deployment Time: 3-5 minutes (cold start)
  • AI Response Time: <2s (LLM inference)
  • Monitoring Latency: <100ms (metrics collection)
  • Scaling Time: <30s (pod autoscaling)

Optimization Opportunities

  • Model Quantization: 4x memory reduction with ONNX INT8
  • Caching: Redis for frequently accessed inference results
  • Batching: Group inference requests for better throughput
  • GPU Acceleration: CUDA/ROCm support for faster inference

๐ŸŒŸ Why This Project Stands Out

For Hiring Managers

  • Practical Skills: Real-world DevOps patterns, not toy examples
  • Modern Stack: Current best practices and CNCF-aligned tools
  • AI Integration: Demonstrates understanding of ML deployment challenges
  • Production Ready: Monitoring, scaling, security considerations
  • Time Efficient: Complete demo in under 5 minutes

For Technical Teams

  • Minimal Complexity: 6 core files, maximum clarity
  • Declarative Approach: Infrastructure and workloads as code
  • Extensible Architecture: Easy to add features and scale
  • Edge Optimized: Real-world resource constraints considered
  • Documentation: Clear instructions and troubleshooting guides

For Business Value

  • Fast Deployment: Rapid prototyping and development cycles
  • Cost Effective: Efficient resource utilization
  • Scalable Design: Grows from demo to production
  • Risk Mitigation: Proven patterns and reliable automation
  • Innovation Ready: Foundation for AI/ML initiatives

๐Ÿค About the Author

Tom Sapletta - DevOps Engineer & AI Integration Specialist

  • ๐Ÿ”ง 15+ years enterprise DevOps experience
  • ๐Ÿค– AI/LLM deployment expertise with edge computing focus
  • ๐Ÿ—๏ธ Infrastructure as Code advocate and practitioner
  • ๐Ÿ“Š Monitoring & Observability specialist
  • ๐Ÿš€ Kubernetes & Cloud Native architect

Current Focus: Telemonit - Edge AI power supply systems with integrated LLM capabilities


This project demonstrates practical DevOps skills through minimal, production-ready code that showcases Infrastructure as Code, AI integration, and modern container orchestration patterns. Perfect for demonstrating technical competency to potential employers in the DevOps and AI engineering space.

๐Ÿ“„ License

This project is open source and available under the Apache License.


๐ŸŽฏ Ready to deploy? Run ./scripts/deploy.sh and see it in action!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wronai_edge-0.1.6.tar.gz (28.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wronai_edge-0.1.6-py3-none-any.whl (24.1 kB view details)

Uploaded Python 3

File details

Details for the file wronai_edge-0.1.6.tar.gz.

File metadata

  • Download URL: wronai_edge-0.1.6.tar.gz
  • Upload date:
  • Size: 28.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.14.11-300.fc42.x86_64

File hashes

Hashes for wronai_edge-0.1.6.tar.gz
Algorithm Hash digest
SHA256 c1aef7a0f01a4b01d73cebde96a1bb62732a03cb6129e1f159c2528e35a2ad5a
MD5 43db02d5848d7687ff939d49bb119513
BLAKE2b-256 51c5b1108bf600040d7b2eafb482ddd2355ba014c5b64957d4a52d7c07e6c440

See more details on using hashes here.

File details

Details for the file wronai_edge-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: wronai_edge-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 24.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.11.12 Linux/6.14.11-300.fc42.x86_64

File hashes

Hashes for wronai_edge-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 595315278f625166136f49e6896a491455edc6a597d168d6c235d57baed00c08
MD5 09d4e47110c3aa2889079fc167cbc4f7
BLAKE2b-256 df1fc55c0a96a558009e9ef267b5644fcd2542afda351f81736d60ea86d18689

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page