Intelligent hardware detection and optimal LLM inference engine recommendations with Pydantic schemas
InferenceUtils - Intelligent Hardware Detection for LLM Inference
A comprehensive Python library that automatically detects your hardware capabilities and provides optimal recommendations for LLM inference engines and build configurations.
🚀 Quick Start
```python
from inferenceutils import systeminfo, optimal_inference_engine, llama_cpp_build_args

# Get comprehensive hardware information
hw = systeminfo()
print(f"CPU: {hw.cpu.brand_raw}")
print(f"GPU: {hw.gpu.detected_vendor}")
print(f"RAM: {hw.ram.total_gb} GB")

# Get optimal inference engine recommendation
engine = optimal_inference_engine()
print(f"Recommended: {engine.name}")
print(f"Install: pip install {' '.join(engine.dependencies)}")

# Get optimal build arguments for llama-cpp-python
args = llama_cpp_build_args()
print(f"CMAKE_ARGS: {' '.join(args)}")
```
✨ Key Features
🔍 Intelligent Hardware Detection
- Cross-platform: macOS, Linux, Windows
- Comprehensive: CPU, GPU, RAM, storage, instruction sets
- Type-safe: All data validated with Pydantic schemas
- Pure Python: No external command execution required
🎯 Optimal Engine Recommendations
- Hardware-aware: Automatically selects best engine for your system
- Dependencies included: Provides exact pip install commands
- Detailed reasoning: Explains why each engine was chosen
- Performance-focused: Prioritizes fastest available hardware
⚡ Build Optimization
- llama-cpp-python: Optimal CMAKE arguments for your hardware
- GPU acceleration: CUDA, Metal, ROCm, Vulkan, SYCL
- CPU optimization: AVX-512, AVX2, OpenMP, KleidiAI
- Platform-specific: Different optimizations per OS
📦 Installation
```bash
# Install from source
git clone <repository-url>
cd InferenceUtils
pip install -e .

# Or install dependencies manually
pip install py-cpuinfo psutil nvidia-ml-py amdsmi openvino mlx pyobjc vulkan pydantic
```
🛠️ API Reference
Core Functions
`systeminfo() -> HardwareProfile`
Get comprehensive hardware information as a validated Pydantic BaseModel.
```python
from inferenceutils import systeminfo

hw = systeminfo()

# Access typed data
print(f"OS: {hw.os.platform}")
print(f"CPU: {hw.cpu.brand_raw}")
print(f"RAM: {hw.ram.total_gb} GB")

# GPU information
if hw.gpu.detected_vendor == "NVIDIA":
    for gpu in hw.gpu.nvidia:
        print(f"NVIDIA: {gpu.model} ({gpu.vram_gb} GB)")
elif hw.gpu.detected_vendor == "Apple":
    print(f"Apple: {hw.gpu.apple.model}")

# Convert to JSON
json_data = hw.model_dump_json(indent=2)
```
`optimal_inference_engine() -> OptimalInferenceEngine`
Get the optimal inference engine recommendation with dependencies.
```python
from inferenceutils import optimal_inference_engine

engine = optimal_inference_engine()
print(f"Engine: {engine.name}")
print(f"Dependencies: {engine.dependencies}")
print(f"Reason: {engine.reason}")

# Install the recommended engine
install_cmd = f"pip install {' '.join(engine.dependencies)}"
print(f"Run: {install_cmd}")
```
`llama_cpp_build_args() -> List[str]`
Get optimal CMAKE build arguments for llama-cpp-python.
```python
from inferenceutils import llama_cpp_build_args, get_llama_cpp_install_command

# Get build arguments
args = llama_cpp_build_args()
print(f"CMAKE arguments: {' '.join(args)}")

# Get complete install command
install_cmd = get_llama_cpp_install_command()
print(f"Install command: {install_cmd}")
```
Pydantic Schemas
HardwareProfile
Complete hardware profile with all detected components.
```python
from pydantic import ValidationError

from inferenceutils import HardwareProfile

# Validate hardware data
try:
    profile = HardwareProfile(**hardware_data)
    print("✅ Data is valid")
except ValidationError as e:
    print(f"❌ Validation failed: {e}")
```
OptimalInferenceEngine
Inference engine recommendation with dependencies.
```python
from inferenceutils import OptimalInferenceEngine

# Create recommendation
recommendation = OptimalInferenceEngine(
    name="MLX",
    dependencies=["mlx-lm"],
    reason="Optimized for Apple Silicon",
)
```
🎯 Supported Inference Engines
| Engine | Best For | Dependencies |
|---|---|---|
| TensorRT-LLM | High-end NVIDIA GPUs (Ampere+) | tensorrt-llm |
| vLLM | NVIDIA GPUs (Turing/Volta+) | vllm |
| MLX | Apple Silicon | mlx-lm |
| OpenVINO | Intel GPUs/NPUs | openvino |
| llama.cpp | AMD GPUs, high-performance CPUs | llama-cpp-python |
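The selection logic implied by the table can be sketched as a simple dispatch on the detected vendor. This is only an illustration of the heuristic, not the library's actual implementation; `pick_engine` and its parameters are hypothetical names, and the real rules live inside `optimal_inference_engine()`.

```python
def pick_engine(vendor: str, compute_capability: float = 0.0) -> str:
    """Map a detected GPU vendor to the engine suggested in the table above."""
    if vendor == "NVIDIA":
        # Ampere (compute capability 8.0+) unlocks TensorRT-LLM;
        # Volta/Turing-class GPUs fall back to vLLM.
        return "TensorRT-LLM" if compute_capability >= 8.0 else "vLLM"
    if vendor == "Apple":
        return "MLX"
    if vendor == "Intel":
        return "OpenVINO"
    # AMD GPUs and CPU-only systems are served by llama.cpp.
    return "llama.cpp"

print(pick_engine("NVIDIA", compute_capability=8.9))  # TensorRT-LLM
print(pick_engine("Apple"))                           # MLX
```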
🔧 Hardware Acceleration Support
GPU Backends
- NVIDIA CUDA: Automatic compute capability detection
- Apple Metal: Native Apple Silicon optimization
- AMD ROCm: HIP acceleration for AMD GPUs
- Intel SYCL: oneAPI for Intel accelerators
- Vulkan: Cross-platform GPU acceleration
CPU Optimizations
- Intel oneMKL: AVX-512 optimization
- OpenBLAS: AVX2 acceleration
- OpenMP: Multi-core parallelism
- KleidiAI: ARM CPU optimization
📋 Example Output
Hardware Detection
```json
{
  "os": {
    "platform": "Darwin",
    "version": "23.0.0",
    "architecture": "arm64"
  },
  "cpu": {
    "brand_raw": "Apple M2 Pro",
    "physical_cores": 10,
    "logical_cores": 10,
    "instruction_sets": ["neon"]
  },
  "ram": {
    "total_gb": 32.0,
    "available_gb": 24.5
  },
  "gpu": {
    "detected_vendor": "Apple",
    "apple": {
      "model": "Apple Silicon GPU",
      "vram_gb": 32.0,
      "metal_supported": true
    }
  }
}
```
Engine Recommendation
```json
{
  "name": "MLX",
  "dependencies": ["mlx-lm"],
  "reason": "Natively designed for Apple Silicon. The system's unified memory architecture is best exploited by Apple's own MLX framework, which leverages the CPU, GPU, and Neural Engine."
}
```
Build Arguments
```text
# Apple Silicon
-DGGML_METAL=ON -DGGML_SVE=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Accelerate

# NVIDIA GPU
-DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89 -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
```
🚀 Use Cases
Development Setup
```python
from inferenceutils import systeminfo, optimal_inference_engine

# Quick hardware overview
hw = systeminfo()
print(f"Setting up development environment for {hw.cpu.brand_raw}")

# Get recommended engine
engine = optimal_inference_engine()
print(f"Installing {engine.name}...")
```
CI/CD Pipelines
```python
from inferenceutils import llama_cpp_build_args

# Generate build args for different runners
args = llama_cpp_build_args()
print(f"Building with: {' '.join(args)}")
```
User Documentation
```python
from inferenceutils import optimal_inference_engine, get_llama_cpp_install_command

# Generate user-specific instructions
engine = optimal_inference_engine()
if engine.name == "llama.cpp":
    install_cmd = get_llama_cpp_install_command()
    print(f"Run: {install_cmd}")
else:
    print(f"Run: pip install {' '.join(engine.dependencies)}")
```
🔍 Hardware Detection Capabilities
CPU Detection
- Model and architecture
- Core count (physical/logical)
- Instruction sets (AVX-512, AVX2, NEON, AMX)
- Performance characteristics
GPU Detection
- NVIDIA: Model, VRAM, compute capability, driver version
- AMD: Model, VRAM, ROCm compatibility, compute units
- Intel: Model, type (dGPU/iGPU/NPU), execution units
- Apple: Model, unified memory, Metal support, GPU cores
Memory & Storage
- Total and available RAM
- Primary storage type (SSD/HDD)
- Memory bandwidth considerations
NPU Detection
- Apple Neural Engine: Core count, availability
- Intel AI Boost: NPU detection and capabilities
- AMD Ryzen AI: CPU-based detection
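Since all detection results are surfaced as validated Pydantic models, the NPU section above might be modelled along the following lines. This is a hypothetical sketch for illustration only: the actual schema and field names in InferenceUtils may differ.

```python
# Hypothetical schema sketch for the NPU capabilities listed above;
# field names are assumptions, not the library's actual API.
from typing import Optional

from pydantic import BaseModel

class AppleNeuralEngine(BaseModel):
    available: bool = False
    core_count: Optional[int] = None

class NPUInfo(BaseModel):
    apple_neural_engine: Optional[AppleNeuralEngine] = None
    intel_ai_boost: bool = False
    amd_ryzen_ai: bool = False

# Nested dicts are validated and coerced into typed models.
npu = NPUInfo(apple_neural_engine={"available": True, "core_count": 16})
print(npu.apple_neural_engine.core_count)  # 16
```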
🛠️ Dependencies
Core Dependencies
- py-cpuinfo: CPU information
- psutil: System and process utilities
- nvidia-ml-py: NVIDIA GPU monitoring
- openvino: Intel accelerator support
- mlx: Apple Silicon support
- pyobjc: macOS system integration
- vulkan: Vulkan API support
- pydantic: Data validation and serialization
- amdsmi: AMD GPU monitoring (install with `pip install inferenceutils[amd]`)
📄 License
MIT License - see LICENSE file for details.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Download files
File details
Details for the file inferenceutils-0.1.0.tar.gz.
File metadata
- Download URL: inferenceutils-0.1.0.tar.gz
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `76e6a420199a2209ac35a76de4ebe70d908289140414ec29696bcd2bea945c78` |
| MD5 | `9d18b8c6e2276249694decbaa31ec34f` |
| BLAKE2b-256 | `0851e94fd1c8e8c0548ce55d3fe23d7ef2bc5449f7eac9a206494334582e4136` |
File details
Details for the file inferenceutils-0.1.0-py3-none-any.whl.
File metadata
- Download URL: inferenceutils-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `730026fe044062539688ca764400b47c166bc3305191fb43e3b8072e0176c393` |
| MD5 | `9809851fef6a2d7a19ec8020ed222a11` |
| BLAKE2b-256 | `14eb656f2de8b88a50f264398bc329d0a317e1fe3c1c5a3311036b7d071ab111` |