sgnl-cpu-interp
Fast polyphase resampling for multichannel data with multi-architecture SIMD support.
Features
- Multi-Architecture SIMD: Automatic runtime CPU detection with optimized kernels for:
- x86_64: AVX-512, AVX2+FMA, AVX, SSE4.1, SSE2
- ARM64: NEON (Apple Silicon, AWS Graviton, etc.)
- Fallback: Optimized scalar implementation
- No External Dependencies: Only requires NumPy (removed GSL and FFTW dependencies)
- High Performance: significantly faster than the previous GSL-based implementation (~25x on the included benchmark; see Performance)
- Multichannel: Optimized for processing many channels simultaneously (tested with 1024+ channels)
- Quality: Lanczos-windowed sinc interpolation for high-quality upsampling
- Two Memory Layouts: Supports both (time, channels) and (channels, time) layouts
- Simple API: Easy-to-use NumPy-based interface
Installation
From PyPI (when available)
pip install sgnl-cpu-interp
From source
git clone https://github.com/yourusername/sgnl-cpu-interp.git
cd sgnl-cpu-interp
pip install .
No external dependencies are required beyond NumPy. The build system automatically detects your CPU architecture and compiles the appropriate SIMD kernels.
Quick Start
import numpy as np
from sgnl_cpu_interp import upsample, get_simd_info
# Check which SIMD implementation is being used
print(get_simd_info())
# {'implementation': 'NEON', 'available': ['NEON', 'Scalar'], 'cpu_features': 'NEON+FMA'}
# Upsample a 50 Hz sine wave from 128 Hz to 2048 Hz
fs_in = 128
fs_out = 2048
factor = fs_out // fs_in # 16x upsampling
# Generate test signal
t = np.arange(0, 0.5, 1/fs_in)
signal = np.sin(2 * np.pi * 50 * t).astype(np.float32)
# Upsample
upsampled = upsample(signal, factor=factor, half_length=8)
print(f"Input: {len(signal)} samples at {fs_in} Hz")
print(f"Output: {len(upsampled)} samples at {fs_out} Hz")
Usage Examples
Single channel upsampling
import numpy as np
from sgnl_cpu_interp import upsample
# 1D signal (single channel)
signal = np.random.randn(1024).astype(np.float32)
upsampled = upsample(signal, factor=2)
Multichannel upsampling
# 2D array: (n_samples, n_channels)
n_samples, n_channels = 1024, 128
data = np.random.randn(n_samples, n_channels).astype(np.float32)
# Upsample by factor of 2
upsampled = upsample(data, factor=2)
print(upsampled.shape) # (2016, 128) - note: loses 2*half_length samples
# Upsample by factor of 4 with longer kernel for better quality
upsampled = upsample(data, factor=4, half_length=16)
print(upsampled.shape) # (3968, 128)
Transposed layout
from sgnl_cpu_interp import upsample_transposed
# Transposed layout: (n_channels, n_samples)
data = np.random.randn(128, 1024).astype(np.float32)
upsampled = upsample_transposed(data, factor=2)
print(upsampled.shape) # (128, 2016)
API Reference
upsample(data, factor=2, half_length=8)
Upsample multichannel data using polyphase filtering (standard layout).
Parameters:
- `data` (ndarray): Input array of shape `(n_samples,)` for a single channel or `(n_samples, n_channels)` for multichannel. Will be converted to float32 if necessary.
- `factor` (int, optional): Upsampling factor (default: 2). Must be >= 2.
- `half_length` (int, optional): Half-length of the sinc kernel (default: 8). Larger values provide better quality but are slower. Total kernel length = `2 * half_length + 1`.
Returns:
- `output` (ndarray): Upsampled array of shape `((n_samples - kernel_len + 1) * factor,)` or `((n_samples - kernel_len + 1) * factor, n_channels)`, where `kernel_len = 2 * half_length + 1`.
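The shape arithmetic above can be checked with a small helper (`expected_output_len` is a name introduced here for illustration, not part of the package API):

```python
def expected_output_len(n_samples, factor=2, half_length=8):
    # kernel_len = 2 * half_length + 1; a valid convolution drops
    # kernel_len - 1 input samples before the factor-fold expansion
    kernel_len = 2 * half_length + 1
    return (n_samples - kernel_len + 1) * factor

print(expected_output_len(1024, factor=2, half_length=8))   # 2016
print(expected_output_len(1024, factor=4, half_length=16))  # 3968
```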
upsample_transposed(data, factor=2, half_length=8)
Upsample multichannel data using polyphase filtering (transposed layout).
Same as upsample() but expects input in (n_channels, n_samples) layout. Use this when your data is already in channels-first format to avoid transpose overhead.
Parameters:
- `data` (ndarray): Input array of shape `(n_channels, n_samples)`. Must be 2D and float32.
- `factor` (int, optional): Upsampling factor (default: 2). Must be >= 2.
- `half_length` (int, optional): Half-length of the sinc kernel (default: 8).
Returns:
- `output` (ndarray): Upsampled array of shape `(n_channels, (n_samples - kernel_len + 1) * factor)`.
get_simd_info()
Get information about the current SIMD implementation.
Returns:
`dict` with keys:
- `implementation`: Name of the current implementation (e.g., 'AVX2+FMA', 'NEON', 'Scalar')
- `available`: List of all implementations available on this CPU
- `cpu_features`: Detected CPU SIMD features
set_implementation(name)
Manually select a SIMD implementation. Useful for testing and benchmarking.
Parameters:
- `name` (str): Implementation name from `get_simd_info()['available']`
Can also be set via the SGNL_CPU_IMPL environment variable:
SGNL_CPU_IMPL=Scalar python my_script.py
Important Notes
- Edge loss: The convolution loses `kernel_len - 1` samples from the edges. For `half_length=8`, you lose 16 input samples.
- Time alignment: The output has a delay of `(kernel_len - 1) / 2` samples at the input sample rate.
- Minimum length: Input must have at least `kernel_len` samples.
- Kernel: Uses a Lanczos-windowed sinc kernel: `h(x) = sinc(x/factor) * sinc(x/kernel_length)`
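For reference, the stated kernel can be sampled directly in NumPy. This is a sketch only; the library's exact sampling grid and normalization may differ, and it assumes `np.sinc`'s normalized convention (sin(πx)/(πx)) matches the formula:

```python
import numpy as np

factor, half_length = 2, 8
kernel_len = 2 * half_length + 1

# Sample h(x) = sinc(x/factor) * sinc(x/kernel_length) on the output grid
x = np.arange(-half_length * factor, half_length * factor + 1, dtype=np.float64)
h = np.sinc(x / factor) * np.sinc(x / kernel_len)

# Interpolation-kernel properties: unity at the center,
# (near-)zeros at the other input-sample positions
center = half_length * factor
print(h[center])                        # 1.0
print(abs(h[center + factor]) < 1e-12)  # True
```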
Performance
Benchmark on 1024 channels, 1024 samples:
| Platform | Implementation | Time | Notes |
|---|---|---|---|
| Apple Silicon (M-series) | NEON | ~0.4 ms | Auto-selected |
| Apple Silicon | Scalar | ~0.4 ms | Compiler auto-vectorizes well |
| x86_64 (Haswell+) | AVX2+FMA | ~0.3 ms | Expected |
| x86_64 (older) | SSE2 | ~0.8 ms | Baseline x86_64 |
Comparison with previous GSL-based implementation:
| Implementation | Time | Speedup |
|---|---|---|
| GSL BLAS (old) | 9.8 ms | 1.0x |
| This package | 0.4 ms | ~25x |
Architecture
The package automatically detects CPU features at module load time and selects the best available implementation:
┌─────────────────────────────────────────────────────────┐
│ Python API │
│ upsample() / upsample_transposed() │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Runtime Dispatch │
│ cpu_detect() → select best implementation │
└─────────────────────────────────────────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ AVX-512 │ │ NEON │ │ Scalar │
│ AVX2 │ │ (ARM) │ │(fallback)│
│ AVX │ └──────────┘ └──────────┘
│ SSE4.1 │
│ SSE2 │
│ (x86) │
└──────────┘
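The selection logic amounts to walking a preference-ordered table and taking the first implementation the CPU supports. A Python sketch of that pattern for illustration (the real table lives in src/dispatch.c and holds C function pointers; the detection there queries CPUID/hwcaps, stubbed here as a plain set of feature names):

```python
# Preference order mirrors the diagram: widest vectors first, scalar last
PREFERENCE = ["AVX-512", "AVX2+FMA", "AVX", "SSE4.1", "SSE2", "NEON", "Scalar"]

def select_implementation(detected_features):
    # "Scalar" is always available as the fallback
    for name in PREFERENCE:
        if name in detected_features or name == "Scalar":
            return name

print(select_implementation({"SSE2", "SSE4.1"}))  # SSE4.1
print(select_implementation(set()))               # Scalar
```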
Development
Building from source
# Install in development mode
pip install -e .
# Run tests
pytest tests/ -v
Project structure
sgnl-cpu-interp/
├── src/
│ ├── cpu_detect.c # Runtime CPU feature detection
│ ├── dispatch.c # Function pointer dispatch table
│ ├── resample_ext_simd.c # Python extension wrapper
│ └── kernels/
│ ├── convolve_scalar.c # Baseline implementation
│ ├── convolve_sse2.c # x86 SSE2
│ ├── convolve_sse4.c # x86 SSE4.1
│ ├── convolve_avx.c # x86 AVX
│ ├── convolve_avx2.c # x86 AVX2+FMA
│ ├── convolve_avx512.c # x86 AVX-512
│ └── convolve_neon.c # ARM NEON
├── sgnl_cpu_interp.py # Python API
├── setup.py # Build configuration
└── tests/
└── test_simd.py # Test suite
Adding a new SIMD implementation
- Create `src/kernels/convolve_<name>.c` implementing `convolve_<name>()` and `convolve_transposed_<name>()`
- Add the implementation to the dispatch table in `src/dispatch.c`
- Add build flags to `setup.py` in `SIMD_FLAGS_UNIX` / `SIMD_FLAGS_MSVC`
- Add CPU feature detection if needed in `src/cpu_detect.c`
Algorithm
This implementation uses polyphase filtering for efficient upsampling:
- Kernel generation: Creates a Lanczos-windowed sinc kernel and splits it into `factor` polyphase components
- SIMD convolution: Vectorized dot product across channels (standard layout) or time samples (transposed layout)
- Phase-blocked upsampling: Processes all output samples with the same phase together to maximize kernel data reuse in cache
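The three steps above can be sketched end-to-end in pure NumPy. This is an illustration, not the library's code: `upsample_reference` is a name introduced here, the C kernels are optimized equivalents, and details such as tap ordering and rounding may differ:

```python
import numpy as np

def upsample_reference(data, factor=2, half_length=8):
    # Pure-NumPy sketch; data is (n_samples, n_channels) float32
    data = np.asarray(data, dtype=np.float32)
    n_samples, n_channels = data.shape
    kernel_len = 2 * half_length + 1
    n_blocks = n_samples - kernel_len + 1            # valid positions per phase

    i = np.arange(-half_length, half_length + 1)     # input-sample offsets
    out = np.empty((n_blocks * factor, n_channels), dtype=np.float32)
    for p in range(factor):                          # phase-blocked: one phase at a time
        # Steps 1-2: polyphase branch p of h(x) = sinc(x/factor) * sinc(x/kernel_len)
        x = i * factor + p
        h_p = (np.sinc(x / factor) * np.sinc(x / kernel_len))[::-1].astype(np.float32)
        # Step 3: dot product over the kernel window, all channels at once
        for n in range(n_blocks):
            out[n * factor + p] = h_p @ data[n:n + kernel_len]
    return out

sig = np.sin(np.linspace(0, 8 * np.pi, 256))[:, None].astype(np.float32)
up = upsample_reference(sig, factor=2, half_length=8)
print(up.shape)  # (480, 1)
```

Phase 0 of the kernel reduces to a unit impulse (the sinc zeros land on the input samples), so every `factor`-th output sample reproduces an input sample exactly, which makes a convenient sanity check.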
The approach is specifically optimized for:
- Many channels (100+)
- Small to moderate upsampling factors (2-16x)
- Short to medium input lengths (100s to 1000s of samples)
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please open an issue or pull request on GitHub.
Citation
If you use this in research, please cite:
@software{sgnl_cpu_interp,
title = {sgnl-cpu-interp: Fast polyphase resampling with multi-architecture SIMD},
url = {https://github.com/yourusername/sgnl-cpu-interp},
year = {2025}
}