sgnl-cpu-interp

Fast polyphase resampling for multichannel data with multi-architecture SIMD support.

Features

  • Multi-Architecture SIMD: Automatic runtime CPU detection with optimized kernels for:
    • x86_64: AVX-512, AVX2+FMA, AVX, SSE4.1, SSE2
    • ARM64: NEON (Apple Silicon, AWS Graviton, etc.)
    • Fallback: Optimized scalar implementation
  • No External Dependencies: Only requires NumPy (removed GSL and FFTW dependencies)
  • High Performance: ~5x faster than GSL-based implementations
  • Multichannel: Optimized for processing many channels simultaneously (tested with 1024+ channels)
  • Quality: Lanczos-windowed sinc interpolation for high-quality upsampling
  • Two Memory Layouts: Supports both (time, channels) and (channels, time) layouts
  • Simple API: Easy-to-use NumPy-based interface

Installation

From PyPI (when available)

pip install sgnl-cpu-interp

From source

git clone https://github.com/yourusername/sgnl-cpu-interp.git
cd sgnl-cpu-interp
pip install .

No external dependencies are required beyond NumPy. The build system automatically detects your CPU architecture and compiles the appropriate SIMD kernels.

Quick Start

import numpy as np
from sgnl_cpu_interp import upsample, get_simd_info

# Check which SIMD implementation is being used
print(get_simd_info())
# {'implementation': 'NEON', 'available': ['NEON', 'Scalar'], 'cpu_features': 'NEON+FMA'}

# Upsample a 50 Hz sine wave from 128 Hz to 2048 Hz
fs_in = 128
fs_out = 2048
factor = fs_out // fs_in  # 16x upsampling

# Generate test signal
t = np.arange(0, 0.5, 1/fs_in)
signal = np.sin(2 * np.pi * 50 * t).astype(np.float32)

# Upsample
upsampled = upsample(signal, factor=factor, half_length=8)
print(f"Input: {len(signal)} samples at {fs_in} Hz")
print(f"Output: {len(upsampled)} samples at {fs_out} Hz")

Usage Examples

Single channel upsampling

import numpy as np
from sgnl_cpu_interp import upsample

# 1D signal (single channel)
signal = np.random.randn(1024).astype(np.float32)
upsampled = upsample(signal, factor=2)

Multichannel upsampling

# 2D array: (n_samples, n_channels)
n_samples, n_channels = 1024, 128
data = np.random.randn(n_samples, n_channels).astype(np.float32)

# Upsample by factor of 2
upsampled = upsample(data, factor=2)
print(upsampled.shape)  # (2016, 128) - note: loses 2*half_length samples

# Upsample by factor of 4 with longer kernel for better quality
upsampled = upsample(data, factor=4, half_length=16)
print(upsampled.shape)  # (3968, 128)

Transposed layout

from sgnl_cpu_interp import upsample_transposed

# Transposed layout: (n_channels, n_samples)
data = np.random.randn(128, 1024).astype(np.float32)
upsampled = upsample_transposed(data, factor=2)
print(upsampled.shape)  # (128, 2016)

API Reference

upsample(data, factor=2, half_length=8)

Upsample multichannel data using polyphase filtering (standard layout).

Parameters:

  • data (ndarray): Input array of shape (n_samples,) for single channel or (n_samples, n_channels) for multichannel. Will be converted to float32 if necessary.
  • factor (int, optional): Upsampling factor (default: 2). Must be >= 2.
  • half_length (int, optional): Half-length of the sinc kernel (default: 8). Larger values provide better quality but are slower. Total kernel length = 2 * half_length + 1.

Returns:

  • output (ndarray): Upsampled array of shape ((n_samples - kernel_len + 1) * factor,) or ((n_samples - kernel_len + 1) * factor, n_channels) where kernel_len = 2 * half_length + 1.

upsample_transposed(data, factor=2, half_length=8)

Upsample multichannel data using polyphase filtering (transposed layout).

Same as upsample() but expects input in (n_channels, n_samples) layout. Use this when your data is already in channels-first format to avoid transpose overhead.

Parameters:

  • data (ndarray): Input array of shape (n_channels, n_samples). Must be 2D and float32.
  • factor (int, optional): Upsampling factor (default: 2). Must be >= 2.
  • half_length (int, optional): Half-length of the sinc kernel (default: 8).

Returns:

  • output (ndarray): Upsampled array of shape (n_channels, (n_samples - kernel_len + 1) * factor).

get_simd_info()

Get information about the current SIMD implementation.

Returns:

  • dict with keys:
    • implementation: Name of current implementation (e.g., 'AVX2+FMA', 'NEON', 'Scalar')
    • available: List of all available implementations for this CPU
    • cpu_features: Detected CPU SIMD features

set_implementation(name)

Manually select a SIMD implementation. Useful for testing and benchmarking.

Parameters:

  • name (str): Implementation name from get_simd_info()['available']

Can also be set via the SGNL_CPU_IMPL environment variable:

SGNL_CPU_IMPL=Scalar python my_script.py
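
For example, forcing the scalar fallback from Python (a minimal sketch using only the functions documented above; the available list and default selection depend on your CPU):

from sgnl_cpu_interp import get_simd_info, set_implementation

print(get_simd_info()['available'])         # e.g. ['AVX2+FMA', 'AVX', 'SSE4.1', 'SSE2', 'Scalar']
set_implementation('Scalar')                # force the portable fallback implementation
print(get_simd_info()['implementation'])    # now reports 'Scalar'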

Important Notes

  • Edge loss: The convolution loses kernel_len - 1 samples from the edges. For half_length=8, you lose 16 input samples.
  • Time alignment: The output has a delay of (kernel_len - 1) / 2 samples at the input sample rate.
  • Minimum length: Input must have at least kernel_len samples.
  • Uses Lanczos-windowed sinc kernel: h(x) = sinc(x/factor) * sinc(x/kernel_length)
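
For example, with the defaults (factor=2, half_length=8) and 1024 input samples, the formulas above work out as follows (a quick sanity check, not new behavior):

import numpy as np
from sgnl_cpu_interp import upsample

n_samples, factor, half_length = 1024, 2, 8
kernel_len = 2 * half_length + 1                   # 17 taps
expected = (n_samples - kernel_len + 1) * factor   # (1024 - 16) * 2 = 2016 output samples
delay = (kernel_len - 1) / 2                       # 8 samples of delay at the input rate

signal = np.random.randn(n_samples).astype(np.float32)
out = upsample(signal, factor=factor, half_length=half_length)
assert len(out) == expected                        # 16 input samples lost at the edges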

Performance

Benchmark on 1024 channels, 1024 samples:

Platform                   Implementation   Time      Notes
Apple Silicon (M-series)   NEON             ~0.4 ms   Auto-selected
Apple Silicon              Scalar           ~0.4 ms   Compiler auto-vectorizes well
x86_64 (Haswell+)          AVX2+FMA         ~0.3 ms   Expected
x86_64 (older)             SSE2             ~0.8 ms   Baseline x86_64

Comparison with previous GSL-based implementation:

Implementation    Time     Speedup
GSL BLAS (old)    9.8 ms   1.0x
This package      0.4 ms   ~25x
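
These numbers can be reproduced approximately with a script along the following lines (a rough sketch using the public API described above; absolute timings depend on your machine):

import time
import numpy as np
from sgnl_cpu_interp import upsample, get_simd_info, set_implementation

# 1024 samples x 1024 channels, matching the benchmark above
data = np.random.randn(1024, 1024).astype(np.float32)

for name in get_simd_info()['available']:
    set_implementation(name)
    upsample(data, factor=2)                  # warm-up run
    times = []
    for _ in range(10):
        t0 = time.perf_counter()
        upsample(data, factor=2)
        times.append(time.perf_counter() - t0)
    print(f"{name}: {min(times) * 1e3:.2f} ms")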

Architecture

The package automatically detects CPU features at module load time and selects the best available implementation:

┌─────────────────────────────────────────────────────────┐
│                    Python API                           │
│         upsample() / upsample_transposed()              │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                  Runtime Dispatch                        │
│         cpu_detect() → select best implementation        │
└─────────────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │ AVX-512  │    │   NEON   │    │  Scalar  │
    │  AVX2    │    │  (ARM)   │    │(fallback)│
    │   AVX    │    └──────────┘    └──────────┘
    │  SSE4.1  │
    │  SSE2    │
    │  (x86)   │
    └──────────┘

Development

Building from source

# Install in development mode
pip install -e .

# Run tests
pytest tests/ -v

Project structure

sgnl-cpu-interp/
├── src/
│   ├── cpu_detect.c      # Runtime CPU feature detection
│   ├── dispatch.c        # Function pointer dispatch table
│   ├── resample_ext_simd.c  # Python extension wrapper
│   └── kernels/
│       ├── convolve_scalar.c   # Baseline implementation
│       ├── convolve_sse2.c     # x86 SSE2
│       ├── convolve_sse4.c     # x86 SSE4.1
│       ├── convolve_avx.c      # x86 AVX
│       ├── convolve_avx2.c     # x86 AVX2+FMA
│       ├── convolve_avx512.c   # x86 AVX-512
│       └── convolve_neon.c     # ARM NEON
├── sgnl_cpu_interp.py    # Python API
├── setup.py              # Build configuration
└── tests/
    └── test_simd.py      # Test suite

Adding a new SIMD implementation

  1. Create src/kernels/convolve_<name>.c implementing convolve_<name>() and convolve_transposed_<name>()
  2. Add the implementation to the dispatch table in src/dispatch.c
  3. Add build flags to setup.py in SIMD_FLAGS_UNIX / SIMD_FLAGS_MSVC
  4. Add CPU feature detection if needed in src/cpu_detect.c

Algorithm

This implementation uses polyphase filtering for efficient upsampling:

  1. Kernel generation: Creates a Lanczos-windowed sinc kernel and splits it into factor polyphase components
  2. SIMD convolution: Vectorized dot product across channels (standard layout) or time samples (transposed layout)
  3. Phase-blocked upsampling: Processes all output samples with the same phase together to maximize kernel data reuse in cache
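
A minimal single-channel sketch of these three steps in plain NumPy is shown below. It is illustrative only, not the package's SIMD C code, and it assumes a Lanczos window scaled by half_length * factor at the output rate, which may differ slightly from the exact kernel used internally.

import numpy as np

def reference_upsample(x, factor=2, half_length=8):
    """Plain-NumPy reference of the three steps above (single channel, no SIMD)."""
    kernel_len = 2 * half_length + 1
    n_out_per_phase = len(x) - kernel_len + 1

    # 1. Kernel generation: windowed sinc evaluated at output-rate offsets,
    #    split into `factor` polyphase components of `kernel_len` taps each.
    taps = np.empty((factor, kernel_len), dtype=np.float32)
    offsets = np.arange(-half_length, half_length + 1)   # offsets in input samples
    for p in range(factor):
        u = offsets * factor - p                          # offsets in output samples
        taps[p] = np.sinc(u / factor) * np.sinc(u / (half_length * factor))

    # 2 + 3. Phase-blocked convolution: all output samples sharing a phase are
    #        computed together as one sliding dot product over the input.
    y = np.empty(n_out_per_phase * factor, dtype=np.float32)
    for p in range(factor):
        y[p::factor] = np.correlate(x, taps[p], mode='valid')
    return y

x = np.random.randn(1024).astype(np.float32)
y = reference_upsample(x, factor=2, half_length=8)
print(len(y))   # 2016, the same length upsample(x, factor=2) returns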

The approach is specifically optimized for:

  • Many channels (100+)
  • Small to moderate upsampling factors (2-16x)
  • Short to medium input lengths (100s to 1000s of samples)

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

Citation

If you use this in research, please cite:

@software{sgnl_cpu_interp,
  title = {sgnl-cpu-interp: Fast polyphase resampling with multi-architecture SIMD},
  url = {https://github.com/yourusername/sgnl-cpu-interp},
  year = {2025}
}
