High-performance Python acceleration engine — CPU, threads, virtual threads, multi-GPU, NPU, ARM/Android/Termux, IoT/SBC and virtualization.

PyAccelerate

High-performance Python acceleration engine — CPU, threads, virtual threads, multi-GPU, NPU, ARM/Android/Termux, OS priority, energy profiles and maximum optimization mode.

Python 3.10+ | License: MIT

Features

Module     Description
cpu        CPU detection, topology, NUMA, affinity, ISA flags, ARM big.LITTLE/DynamIQ, dynamic worker recommendations
threads    Persistent virtual-thread pool, sliding-window executor, async bridge, process pool
gpu        Multi-vendor GPU detection (NVIDIA/CUDA, AMD/OpenCL, Intel oneAPI, ARM Adreno/Mali/Immortalis), ranking, multi-GPU dispatch
npu        NPU detection & inference (OpenVINO, ONNX Runtime, DirectML, CoreML, ARM Hexagon/Samsung NPU/Tensor TPU/MediaTek APU)
virt       Virtualization detection (Hyper-V, VT-x/AMD-V, KVM, WSL2, Docker, container detection)
memory     Memory pressure monitoring, automatic worker clamping, reusable buffer pool
profiler   @timed, @profile_memory decorators, Timer context manager, Tracker statistics
benchmark  Built-in micro-benchmarks (CPU, threads, memory bandwidth, GPU compute)
priority   OS-level task priority (IDLE → REALTIME) & energy profiles (POWER_SAVER → ULTRA_PERFORMANCE)
max_mode   Maximum optimization mode: activates all resources simultaneously with OS tuning
android    Android/Termux platform detection, ARM SoC database (25+ chipsets), big.LITTLE, thermal & battery
engine     Unified orchestrator: auto-detects everything and provides a single API

Quick Start

pip install pyaccelerate

from pyaccelerate import Engine

engine = Engine()
print(engine.summary())

# Submit I/O-bound tasks to the virtual thread pool
future = engine.submit(my_io_func, arg1, arg2)

# Run many tasks with auto-tuned concurrency
engine.run_parallel(process_file, [(f,) for f in files])

# GPU dispatch (auto-fallback to CPU)
results = engine.gpu_dispatch(my_kernel, data_chunks)

Maximum Optimization Mode

Activates all available hardware resources in parallel with OS-level tuning:

from pyaccelerate.max_mode import MaxMode

with MaxMode() as m:
    print(m.summary())  # hardware manifest

    # Run CPU + I/O simultaneously
    results = m.run_all(
        cpu_fn=cpu_heavy_task, cpu_items=cpu_data,
        io_fn=io_heavy_task, io_items=io_data,
    )

    # I/O only (thread pool)
    downloaded = m.run_io(download, [(url,) for url in urls])

    # CPU only (process pool)
    computed = m.run_cpu(crunch, [(n,) for n in numbers])

    # Multi-stage pipeline
    results = m.run_pipeline([
        ("download", download_fn, urls),
        ("transform", transform_fn, data),
        ("save", save_fn, output),
    ])

Or via the Engine:

engine = Engine()
with engine.max_mode() as m:
    results = m.run_all(...)
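
To illustrate the idea behind running CPU-bound and I/O-bound work at the same time, here is a minimal standard-library sketch. It is not PyAccelerate's implementation: the function name `run_all_sketch`, the `cpu_pool_cls` parameter and the returned dictionary shape are assumptions made for this example only.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_all_sketch(cpu_fn, cpu_items, io_fn, io_items,
                   cpu_pool_cls=ProcessPoolExecutor):
    """Run cpu_fn over cpu_items and io_fn over io_items concurrently.

    CPU-bound items go to a process pool (sidesteps the GIL); I/O-bound
    items go to a thread pool. Both batches are submitted before any
    result is awaited, so the two pools work at the same time.
    """
    with cpu_pool_cls() as cpu_pool, ThreadPoolExecutor() as io_pool:
        cpu_futures = [cpu_pool.submit(cpu_fn, item) for item in cpu_items]
        io_futures = [io_pool.submit(io_fn, item) for item in io_items]
        # Collect in submission order; the first exception propagates.
        return {
            "cpu": [f.result() for f in cpu_futures],
            "io": [f.result() for f in io_futures],
        }
```

With the default `ProcessPoolExecutor`, `cpu_fn` must be picklable (a top-level function), and scripts should call the sketch under an `if __name__ == "__main__":` guard. Passing `cpu_pool_cls=ThreadPoolExecutor` is handy for quick experiments.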

OS Priority & Energy Management

Control process scheduling and power profiles across Windows, Linux & macOS:

from pyaccelerate.priority import (
    TaskPriority, EnergyProfile,
    set_task_priority, set_energy_profile,
    max_performance, balanced, power_saver,
)

# Quick presets
max_performance()   # HIGH priority + ULTRA_PERFORMANCE energy
balanced()          # Restore defaults
power_saver()       # BELOW_NORMAL + POWER_SAVER

# Fine-grained control
set_task_priority(TaskPriority.ABOVE_NORMAL)
set_energy_profile(EnergyProfile.PERFORMANCE)
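
For readers curious what a priority change can look like at the OS level, the following is a rough sketch for Unix-like systems using only the standard library. The `Level` enum, its nice values and the helper names are hypothetical and chosen for illustration; the mapping PyAccelerate actually uses may differ, and Windows uses priority classes rather than nice values.

```python
import os
from enum import Enum


class Level(Enum):
    """Hypothetical levels mapped to Unix nice values (lower = more CPU time)."""
    IDLE = 19
    BELOW_NORMAL = 10
    NORMAL = 0
    ABOVE_NORMAL = -5
    HIGH = -10


def current_nice():
    """Return this process's nice value, or None where the call is unavailable."""
    if not hasattr(os, "getpriority"):
        return None
    return os.getpriority(os.PRIO_PROCESS, 0)


def apply_level(level):
    """Try to set this process's nice value; return True on success."""
    if not hasattr(os, "setpriority"):
        return False  # e.g. Windows, where os.setpriority does not exist
    try:
        os.setpriority(os.PRIO_PROCESS, 0, level.value)
    except OSError:
        # Usually PermissionError: raising priority normally needs privileges.
        return False
    return True
```

Lowering priority (a higher nice value) generally works for any user, while raising it typically requires elevated privileges, which is why the sketch reports failure instead of raising.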

CLI

pyaccelerate info          # Full hardware report
pyaccelerate benchmark     # Run micro-benchmarks
pyaccelerate gpu           # GPU details
pyaccelerate cpu           # CPU details
pyaccelerate npu           # NPU details
pyaccelerate android       # ARM/Android device details (SoC, clusters, thermal)
pyaccelerate virt          # Virtualization info
pyaccelerate memory        # Memory stats
pyaccelerate status        # One-liner
pyaccelerate priority      # Show current priority/energy
pyaccelerate priority --preset max     # Apply max performance preset
pyaccelerate priority --set high       # Set task priority
pyaccelerate priority --energy performance  # Set energy profile
pyaccelerate max-mode      # Show max-mode hardware manifest

ARM / Android / Termux Support

Full hardware detection for ARM devices — phones (Termux, Pydroid), tablets, Raspberry Pi, ARM laptops (Snapdragon X Elite), and ARM servers:

from pyaccelerate.android import (
    is_android, is_termux, is_arm,
    get_device_info, get_soc_info,
    detect_big_little, get_arm_features,
    get_thermal_zones, get_battery_info,
)

if is_arm():
    soc = get_soc_info()
    if soc:
        print(f"{soc.name} ({soc.vendor})")   # Snapdragon 8 Gen 3 (Qualcomm)
        print(f"GPU: {soc.gpu_name}")           # Adreno 750
        print(f"NPU: {soc.npu_name} ({soc.npu_tops} TOPS)")  # Hexagon NPU (73.0 TOPS)

    clusters = detect_big_little()
    # {"Cortex-X4": [0], "Cortex-A720": [1,2,3], "Cortex-A520": [4,5,6,7]}

    features = get_arm_features()
    # ["aes", "asimd", "bf16", "crc32", "neon", "sve", "sve2", ...]

Supported SoC families (25+ chipsets in database):

  • Qualcomm — Snapdragon 8 Elite, 8/7/6 Gen 1-3, 888, 865, X Elite
  • Samsung — Exynos 2500, 2200, 2100, 1380, 990
  • Google — Tensor G1–G4
  • MediaTek — Dimensity 9300, 9200, 9000, 8300, 1200, 1100, 900
  • HiSilicon — Kirin 9010, 9000
  • Unisoc — T616

ARM GPU detection — Adreno, Mali, Immortalis, Xclipse, PowerVR, Maleoon (via SoC DB, sysfs, Vulkan, OpenCL)

ARM NPU detection — Hexagon, Samsung NPU, Google TPU, MediaTek APU, Da Vinci NPU (via SoC DB, NNAPI, TFLite)
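
One common heuristic for finding big.LITTLE clusters on Linux and Android is to group cores by their maximum frequency as reported by cpufreq in sysfs. The sketch below shows that heuristic only; it is an assumption about how such detection can work, not a description of what `detect_big_little()` does internally, and the helper names are made up for this example.

```python
from collections import defaultdict
from pathlib import Path


def read_max_freqs(root="/sys/devices/system/cpu"):
    """Return {core_index: max_freq_khz} from cpufreq sysfs (empty if unavailable)."""
    freqs = {}
    for path in Path(root).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        core = int(path.parts[-3][3:])  # "cpu7" -> 7
        try:
            freqs[core] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip this core
    return freqs


def group_by_max_freq(freqs):
    """Group cores that share a max frequency, fastest cluster first."""
    clusters = defaultdict(list)
    for core, khz in freqs.items():
        clusters[khz].append(core)
    return [(khz, sorted(cores))
            for khz, cores in sorted(clusters.items(), reverse=True)]
```

Calling `group_by_max_freq(read_max_freqs())` on a heterogeneous SoC would typically yield one entry per cluster. On machines without cpufreq (many containers and VMs) the result is simply an empty list.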

Modules in Depth

Virtual Thread Pool

Inspired by Java's virtual threads: a persistent ThreadPoolExecutor sized for I/O-bound work (3 × CPU cores, capped at 32 workers). All I/O-bound tasks share this pool instead of creating and destroying threads for each operation.

from pyaccelerate.threads import get_pool, run_parallel, submit

# Single task
fut = submit(download_file, url)

# Bounded concurrency (sliding window)
run_parallel(process, [(item,) for item in items], max_concurrent=8)
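
The sizing rule and the sliding-window idea can be sketched with the standard library alone. This is an illustrative approximation, not the library's code: `io_pool_size` and `run_sliding_window` are hypothetical names, and returning results in input order is an assumption made here.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def io_pool_size(cores=None, factor=3, cap=32):
    """Worker count for an I/O-bound pool: cores * factor, capped."""
    cores = cores or os.cpu_count() or 1
    return max(1, min(cap, cores * factor))


def run_sliding_window(pool, fn, arg_tuples, max_concurrent=8):
    """Call fn(*args) for each tuple with at most max_concurrent in flight.

    Results come back in input order; the first exception propagates.
    """
    arg_tuples = list(arg_tuples)
    max_concurrent = max(1, max_concurrent)
    results = [None] * len(arg_tuples)
    in_flight = {}  # future -> index into results
    next_index = 0
    while next_index < len(arg_tuples) or in_flight:
        # Top up the window until it is full or the input is exhausted.
        while next_index < len(arg_tuples) and len(in_flight) < max_concurrent:
            future = pool.submit(fn, *arg_tuples[next_index])
            in_flight[future] = next_index
            next_index += 1
        # Block until at least one task finishes, then free its slot.
        done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
        for future in done:
            results[in_flight.pop(future)] = future.result()
    return results
```

Unlike submitting everything up front, the window keeps the number of pending futures bounded, which helps when the input list is large or each task holds a scarce resource such as a socket.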

Multi-GPU Dispatch

Auto-detects GPUs across CUDA, OpenCL and Intel oneAPI. Distributes workloads with configurable strategies.

from pyaccelerate.gpu import detect_all, dispatch

gpus = detect_all()
results = dispatch(my_kernel, data_chunks, strategy="score-weighted")
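
As a rough illustration of what a score-weighted strategy could mean, the sketch below splits a list of chunks across devices in proportion to a per-device score. The function name and the largest-remainder rounding are assumptions for this example; the README does not specify how PyAccelerate computes scores or assigns chunks.

```python
def split_by_score(chunks, scores):
    """Assign chunks to devices in proportion to each device's score.

    Returns one list of chunks per device. Largest-remainder rounding
    guarantees the per-device counts add up to len(chunks).
    """
    chunks = list(chunks)
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one device needs a positive score")
    exact = [len(chunks) * s / total for s in scores]
    counts = [int(x) for x in exact]
    # Give leftover chunks to the devices with the largest fractional share.
    leftover = len(chunks) - sum(counts)
    by_fraction = sorted(range(len(scores)),
                         key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    assignments, start = [], 0
    for count in counts:
        assignments.append(chunks[start:start + count])
        start += count
    return assignments
```

Each returned sub-list would then be handed to the kernel on its device, with faster devices receiving proportionally more work.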

Profiling

Zero-config decorators for timing and memory tracking:

import logging

from pyaccelerate.profiler import timed, profile_memory, Tracker

@timed(level=logging.INFO)
def heavy_computation():
    ...

tracker = Tracker("db_queries")
for batch in batches:
    with tracker.measure():
        run_query(batch)
print(tracker.summary())
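
For a sense of how a timing decorator and a tracker of this kind can be built, here is a small standard-library sketch. The names `timed_sketch` and `TrackerSketch`, the logger name and the summary format are all assumptions for illustration and are not taken from `pyaccelerate.profiler`.

```python
import functools
import logging
import statistics
import time
from contextlib import contextmanager

log = logging.getLogger("profiling_sketch")


def timed_sketch(level=logging.DEBUG):
    """Decorator factory: log how long each call to the wrapped function takes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on exception, so failures are timed too.
                elapsed = time.perf_counter() - start
                log.log(level, "%s took %.6f s", fn.__qualname__, elapsed)
        return wrapper
    return decorator


class TrackerSketch:
    """Collect durations of repeated measurements under one name."""

    def __init__(self, name):
        self.name = name
        self.samples = []

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples.append(time.perf_counter() - start)

    def summary(self):
        if not self.samples:
            return f"{self.name}: no samples"
        return (f"{self.name}: n={len(self.samples)} "
                f"mean={statistics.fmean(self.samples):.6f}s "
                f"max={max(self.samples):.6f}s")
```

`time.perf_counter()` is used because it is monotonic and has the highest available resolution, which matters for short measurements.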

Installation Options

# Core (CPU + threads + memory + virt)
pip install pyaccelerate

# With NVIDIA GPU support
pip install pyaccelerate[cuda]

# With OpenCL support (AMD/Intel/NVIDIA)
pip install pyaccelerate[opencl]

# With Intel oneAPI support
pip install pyaccelerate[intel]

# All GPU backends
pip install pyaccelerate[all-gpu]

# Development
pip install pyaccelerate[dev]

Docker

# CPU-only
docker build -t pyaccelerate .
docker run --rm pyaccelerate info

# With NVIDIA GPU
docker build -f Dockerfile.gpu -t pyaccelerate:gpu .
docker run --rm --gpus all pyaccelerate:gpu info

# Docker Compose
docker compose up pyaccelerate    # CPU
docker compose up gpu             # GPU

Development

git clone https://github.com/GuilhermeP96/pyaccelerate.git
cd pyaccelerate
pip install -e ".[dev]"

# Run tests
pytest -v

# Lint + format
ruff check src/ tests/
ruff format src/ tests/

# Type check
mypy src/

# Build wheel
python -m build

Architecture

pyaccelerate/
├── cpu.py          # CPU detection & topology
├── threads.py      # Virtual thread pool & executors
├── gpu/
│   ├── detector.py # Multi-vendor GPU enumeration
│   ├── cuda.py     # CUDA/CuPy helpers
│   ├── opencl.py   # PyOpenCL helpers
│   ├── intel.py    # Intel oneAPI helpers
│   └── dispatch.py # Multi-GPU load balancer
├── npu/
│   ├── detector.py # NPU detection (Intel, Qualcomm, Apple)
│   ├── onnx_rt.py  # ONNX Runtime inference
│   ├── openvino.py # OpenVINO inference
│   └── inference.py# Unified inference API
├── virt.py         # Virtualization detection
├── memory.py       # Memory monitoring & buffer pool
├── profiler.py     # Timing & profiling utilities
├── benchmark.py    # Built-in micro-benchmarks
├── priority.py     # OS task priority & energy profiles
├── max_mode.py     # Maximum optimization mode
├── engine.py       # Unified orchestrator
└── cli.py          # Command-line interface

Examples

The examples/ directory contains runnable scripts demonstrating all features:

Example                 Description
example_basic.py        Engine creation, summary, submit, run_parallel, batch
example_parallel_io.py  Parallel download/process/write with public UCI ML datasets
example_cpu_bound.py    Sequential vs thread pool vs process pool comparison
example_max_mode.py     MaxMode context manager, run_all, run_io, run_cpu, pipeline
example_pipeline.py     Multi-stage data pipeline (download → analyze → report)
example_priority.py     TaskPriority levels, EnergyProfile, presets, benchmarking

cd examples
python example_basic.py
python example_max_mode.py
python example_priority.py

Roadmap

  • npm package (Node.js bindings via pybind11/napi)
  • gRPC server mode for multi-language integration
  • Kubernetes operator for auto-scaling GPU workloads
  • Prometheus metrics exporter
  • Auto-tuning feedback loop (benchmark → config → re-tune)

Origin

Evolved from the acceleration & virtual-thread systems built for:

License

MIT — see LICENSE.
