
Intelligent hardware detection and optimal LLM inference engine recommendations with Pydantic schemas

Project description

InferenceUtils - Intelligent Hardware Detection for LLM Inference

A comprehensive Python library that automatically detects your hardware capabilities and provides optimal recommendations for LLM inference engines and build configurations.

🚀 Quick Start

from inferenceutils import systeminfo, optimal_inference_engine, llama_cpp_build_args

# Get comprehensive hardware information
hw = systeminfo()
print(f"CPU: {hw.cpu.brand_raw}")
print(f"GPU: {hw.gpu.detected_vendor}")
print(f"RAM: {hw.ram.total_gb} GB")

# Get optimal inference engine recommendation
engine = optimal_inference_engine()
print(f"Recommended: {engine.name}")
print(f"Install: pip install {' '.join(engine.dependencies)}")

# Get optimal build arguments for llama-cpp-python
args = llama_cpp_build_args()
print(f"CMAKE_ARGS: {' '.join(args)}")

✨ Key Features

🔍 Intelligent Hardware Detection

  • Cross-platform: macOS, Linux, Windows
  • Comprehensive: CPU, GPU, RAM, storage, instruction sets
  • Type-safe: All data validated with Pydantic schemas
  • Pure Python: No external command execution required

🎯 Optimal Engine Recommendations

  • Hardware-aware: Automatically selects best engine for your system
  • Dependencies included: Provides exact pip install commands
  • Detailed reasoning: Explains why each engine was chosen
  • Performance-focused: Prioritizes fastest available hardware

⚡ Build Optimization

  • llama-cpp-python: Optimal CMAKE arguments for your hardware
  • GPU acceleration: CUDA, Metal, ROCm, Vulkan, SYCL
  • CPU optimization: AVX-512, AVX2, OpenMP, KleidiAI
  • Platform-specific: Different optimizations per OS

📦 Installation

# Install from source
git clone <repository-url>
cd InferenceUtils
pip install -e .

# Or install dependencies manually
pip install py-cpuinfo psutil nvidia-ml-py amdsmi openvino mlx pyobjc vulkan pydantic

🛠️ API Reference

Core Functions

systeminfo() -> HardwareProfile

Get comprehensive hardware information as a validated Pydantic BaseModel.

from inferenceutils import systeminfo

hw = systeminfo()

# Access typed data
print(f"OS: {hw.os.platform}")
print(f"CPU: {hw.cpu.brand_raw}")
print(f"RAM: {hw.ram.total_gb} GB")

# GPU information
if hw.gpu.detected_vendor == "NVIDIA":
    for gpu in hw.gpu.nvidia:
        print(f"NVIDIA: {gpu.model} ({gpu.vram_gb} GB)")
elif hw.gpu.detected_vendor == "Apple":
    print(f"Apple: {hw.gpu.apple.model}")

# Convert to JSON
json_data = hw.model_dump_json(indent=2)

optimal_inference_engine() -> OptimalInferenceEngine

Get the optimal inference engine recommendation with dependencies.

from inferenceutils import optimal_inference_engine

engine = optimal_inference_engine()

print(f"Engine: {engine.name}")
print(f"Dependencies: {engine.dependencies}")
print(f"Reason: {engine.reason}")

# Install the recommended engine
install_cmd = f"pip install {' '.join(engine.dependencies)}"
print(f"Run: {install_cmd}")

llama_cpp_build_args() -> List[str]

Get optimal CMAKE build arguments for llama-cpp-python.

from inferenceutils import llama_cpp_build_args, get_llama_cpp_install_command

# Get build arguments
args = llama_cpp_build_args()
print(f"CMAKE arguments: {' '.join(args)}")

# Get complete install command
install_cmd = get_llama_cpp_install_command()
print(f"Install command: {install_cmd}")

Pydantic Schemas

HardwareProfile

Complete hardware profile with all detected components.

from pydantic import ValidationError

from inferenceutils import HardwareProfile

# Validate hardware data (e.g. a dict loaded from JSON)
try:
    profile = HardwareProfile(**hardware_data)
    print("✅ Data is valid")
except ValidationError as e:
    print(f"❌ Validation failed: {e}")

OptimalInferenceEngine

Inference engine recommendation with dependencies.

from inferenceutils import OptimalInferenceEngine

# Create recommendation
recommendation = OptimalInferenceEngine(
    name="MLX",
    dependencies=["mlx-lm"],
    reason="Optimized for Apple Silicon"
)

🎯 Supported Inference Engines

| Engine       | Best For                          | Dependencies       |
|--------------|-----------------------------------|--------------------|
| TensorRT-LLM | High-end NVIDIA GPUs (Ampere+)    | tensorrt-llm       |
| vLLM         | NVIDIA GPUs (Turing/Volta+)       | vllm               |
| MLX          | Apple Silicon                     | mlx-lm             |
| OpenVINO     | Intel GPUs/NPUs                   | openvino           |
| llama.cpp    | AMD GPUs, high-performance CPUs   | llama-cpp-python   |
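As a rough illustration of the selection logic implied by this table, a cascade over the detected vendor might look like the sketch below. The function and its inputs (`vendor`, `compute_capability`) are hypothetical, not the library's real API; InferenceUtils derives this information from `systeminfo()` internally.

```python
def recommend_engine(vendor, compute_capability=None):
    """Toy selection cascade mirroring the table above (illustrative only)."""
    if vendor == "NVIDIA":
        # Ampere and newer GPUs report compute capability 8.0 or higher.
        if compute_capability is not None and compute_capability >= (8, 0):
            return "TensorRT-LLM"
        return "vLLM"
    if vendor == "Apple":
        return "MLX"
    if vendor == "Intel":
        return "OpenVINO"
    return "llama.cpp"  # AMD GPUs and CPU-only systems


print(recommend_engine("NVIDIA", (8, 9)))  # → TensorRT-LLM
```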

🔧 Hardware Acceleration Support

GPU Backends

  • NVIDIA CUDA: Automatic compute capability detection
  • Apple Metal: Native Apple Silicon optimization
  • AMD ROCm: HIP acceleration for AMD GPUs
  • Intel SYCL: oneAPI for Intel accelerators
  • Vulkan: Cross-platform GPU acceleration

CPU Optimizations

  • Intel oneMKL: AVX-512 optimization
  • OpenBLAS: AVX2 acceleration
  • OpenMP: Multi-core parallelism
  • KleidiAI: ARM CPU optimization

📋 Example Output

Hardware Detection

{
  "os": {
    "platform": "Darwin",
    "version": "23.0.0",
    "architecture": "arm64"
  },
  "cpu": {
    "brand_raw": "Apple M2 Pro",
    "physical_cores": 10,
    "logical_cores": 10,
    "instruction_sets": ["neon"]
  },
  "ram": {
    "total_gb": 32.0,
    "available_gb": 24.5
  },
  "gpu": {
    "detected_vendor": "Apple",
    "apple": {
      "model": "Apple Silicon GPU",
      "vram_gb": 32.0,
      "metal_supported": true
    }
  }
}

Engine Recommendation

{
  "name": "MLX",
  "dependencies": ["mlx-lm"],
  "reason": "Natively designed for Apple Silicon. The system's unified memory architecture is best exploited by Apple's own MLX framework, which leverages the CPU, GPU, and Neural Engine."
}

Build Arguments

# Apple Silicon
-DGGML_METAL=ON -DGGML_SVE=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Accelerate

# NVIDIA GPU
-DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
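These arguments are consumed by llama-cpp-python via the `CMAKE_ARGS` environment variable at build time. A small sketch of turning the argument list into a one-line install command (the `args` value is a hypothetical return of `llama_cpp_build_args()` on Apple Silicon):

```python
import shlex

# Hypothetical output of llama_cpp_build_args() on Apple Silicon.
args = ["-DGGML_METAL=ON", "-DGGML_BLAS=ON", "-DGGML_BLAS_VENDOR=Accelerate"]

# Join and shell-quote the arguments into a single CMAKE_ARGS value.
cmake_args = shlex.quote(" ".join(args))
install_cmd = f"CMAKE_ARGS={cmake_args} pip install --no-cache-dir llama-cpp-python"
print(install_cmd)
```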

🚀 Use Cases

Development Setup

from inferenceutils import systeminfo, optimal_inference_engine

# Quick hardware overview
hw = systeminfo()
print(f"Setting up development environment for {hw.cpu.brand_raw}")

# Get recommended engine
engine = optimal_inference_engine()
print(f"Installing {engine.name}...")

CI/CD Pipelines

from inferenceutils import llama_cpp_build_args

# Generate build args for different runners
args = llama_cpp_build_args()
print(f"Building with: {' '.join(args)}")

User Documentation

from inferenceutils import optimal_inference_engine, get_llama_cpp_install_command

# Generate user-specific instructions
engine = optimal_inference_engine()
if engine.name == "llama.cpp":
    install_cmd = get_llama_cpp_install_command()
    print(f"Run: {install_cmd}")
else:
    print(f"Run: pip install {' '.join(engine.dependencies)}")

🔍 Hardware Detection Capabilities

CPU Detection

  • Model and architecture
  • Core count (physical/logical)
  • Instruction sets (AVX-512, AVX2, NEON, AMX)
  • Performance characteristics

GPU Detection

  • NVIDIA: Model, VRAM, compute capability, driver version
  • AMD: Model, VRAM, ROCm compatibility, compute units
  • Intel: Model, type (dGPU/iGPU/NPU), execution units
  • Apple: Model, unified memory, Metal support, GPU cores

Memory & Storage

  • Total and available RAM
  • Primary storage type (SSD/HDD)
  • Memory bandwidth considerations

NPU Detection

  • Apple Neural Engine: Core count, availability
  • Intel AI Boost: NPU detection and capabilities
  • AMD Ryzen AI: CPU-based detection
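Some of these checks need only the standard library. For example, a minimal Apple Silicon probe (a far simpler check than the library's own detection, which also inspects GPU cores and Neural Engine availability) can be written as:

```python
import platform


def is_apple_silicon():
    # True on macOS running natively on an ARM (M-series) chip.
    return platform.system() == "Darwin" and platform.machine() == "arm64"


print(is_apple_silicon())
```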

🛠️ Dependencies

Core Dependencies

  • py-cpuinfo: CPU information
  • psutil: System and process utilities
  • nvidia-ml-py: NVIDIA GPU monitoring
  • openvino: Intel accelerator support
  • mlx: Apple Silicon support
  • pyobjc: macOS system integration
  • vulkan: Vulkan API support
  • pydantic: Data validation and serialization
  • amdsmi: AMD GPU monitoring (install with pip install inferenceutils[amd])

📄 License

MIT License - see LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
