sgnl-cpu-interp

Fast polyphase resampling for multichannel data with multi-architecture SIMD support.

Features

  • Multi-Architecture SIMD: Automatic runtime CPU detection with optimized kernels for:
    • x86_64: AVX-512, AVX2+FMA, AVX, SSE4.1, SSE2
    • ARM64: NEON (Apple Silicon, AWS Graviton, etc.)
    • Fallback: Optimized scalar implementation
  • No External Dependencies: Only requires NumPy (removed GSL and FFTW dependencies)
  • High Performance: ~5x faster than GSL-based implementations
  • Multichannel: Optimized for processing many channels simultaneously (tested with 1024+ channels)
  • Quality: Lanczos-windowed sinc interpolation for high-quality upsampling
  • Two Memory Layouts: Supports both (time, channels) and (channels, time) layouts
  • Simple API: Easy-to-use NumPy-based interface

Installation

From PyPI (when available)

pip install sgnl-cpu-interp

From source

git clone https://github.com/yourusername/sgnl-cpu-interp.git
cd sgnl-cpu-interp
pip install .

No external dependencies are required beyond NumPy. The build system automatically detects your CPU architecture and compiles the appropriate SIMD kernels.

Quick Start

import numpy as np
from sgnl_cpu_interp import upsample, get_simd_info

# Check which SIMD implementation is being used
print(get_simd_info())
# {'implementation': 'NEON', 'available': ['NEON', 'Scalar'], 'cpu_features': 'NEON+FMA'}

# Upsample a 50 Hz sine wave from 128 Hz to 2048 Hz
fs_in = 128
fs_out = 2048
factor = fs_out // fs_in  # 16x upsampling

# Generate test signal
t = np.arange(0, 0.5, 1/fs_in)
signal = np.sin(2 * np.pi * 50 * t).astype(np.float32)

# Upsample
upsampled = upsample(signal, factor=factor, half_length=8)
print(f"Input: {len(signal)} samples at {fs_in} Hz")
print(f"Output: {len(upsampled)} samples at {fs_out} Hz")

Usage Examples

Single channel upsampling

import numpy as np
from sgnl_cpu_interp import upsample

# 1D signal (single channel)
signal = np.random.randn(1024).astype(np.float32)
upsampled = upsample(signal, factor=2)

Multichannel upsampling

# 2D array: (n_samples, n_channels)
n_samples, n_channels = 1024, 128
data = np.random.randn(n_samples, n_channels).astype(np.float32)

# Upsample by factor of 2
upsampled = upsample(data, factor=2)
print(upsampled.shape)  # (2016, 128) - note: loses 2*half_length samples

# Upsample by factor of 4 with longer kernel for better quality
upsampled = upsample(data, factor=4, half_length=16)
print(upsampled.shape)  # (3968, 128)

Transposed layout

from sgnl_cpu_interp import upsample_transposed

# Transposed layout: (n_channels, n_samples)
data = np.random.randn(128, 1024).astype(np.float32)
upsampled = upsample_transposed(data, factor=2)
print(upsampled.shape)  # (128, 2016)

API Reference

upsample(data, factor=2, half_length=8)

Upsample multichannel data using polyphase filtering (standard layout).

Parameters:

  • data (ndarray): Input array of shape (n_samples,) for single channel or (n_samples, n_channels) for multichannel. Will be converted to float32 if necessary.
  • factor (int, optional): Upsampling factor (default: 2). Must be >= 2.
  • half_length (int, optional): Half-length of the sinc kernel (default: 8). Larger values provide better quality but are slower. Total kernel length = 2 * half_length + 1.

Returns:

  • output (ndarray): Upsampled array of shape ((n_samples - kernel_len + 1) * factor,) or ((n_samples - kernel_len + 1) * factor, n_channels) where kernel_len = 2 * half_length + 1.

upsample_transposed(data, factor=2, half_length=8)

Upsample multichannel data using polyphase filtering (transposed layout).

Same as upsample() but expects input in (n_channels, n_samples) layout. Use this when your data is already in channels-first format to avoid transpose overhead.

Parameters:

  • data (ndarray): Input array of shape (n_channels, n_samples). Must be 2D and float32.
  • factor (int, optional): Upsampling factor (default: 2). Must be >= 2.
  • half_length (int, optional): Half-length of the sinc kernel (default: 8).

Returns:

  • output (ndarray): Upsampled array of shape (n_channels, (n_samples - kernel_len + 1) * factor).

get_simd_info()

Get information about the current SIMD implementation.

Returns:

  • dict with keys:
    • implementation: Name of current implementation (e.g., 'AVX2+FMA', 'NEON', 'Scalar')
    • available: List of all available implementations for this CPU
    • cpu_features: Detected CPU SIMD features

set_implementation(name)

Manually select a SIMD implementation. Useful for testing and benchmarking.

Parameters:

  • name (str): Implementation name from get_simd_info()['available']

Can also be set via the SGNL_CPU_IMPL environment variable:

SGNL_CPU_IMPL=Scalar python my_script.py
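
For example, forcing the scalar fallback from Python (a minimal sketch using only the functions documented above; the available list and default selection depend on your CPU):

from sgnl_cpu_interp import get_simd_info, set_implementation

print(get_simd_info()['available'])         # e.g. ['AVX2+FMA', 'AVX', 'SSE4.1', 'SSE2', 'Scalar']
set_implementation('Scalar')                # force the portable fallback implementation
print(get_simd_info()['implementation'])    # now reports 'Scalar'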

Important Notes

  • Edge loss: The convolution loses kernel_len - 1 samples from the edges. For half_length=8, you lose 16 input samples.
  • Time alignment: The output has a delay of (kernel_len - 1) / 2 samples at the input sample rate.
  • Minimum length: Input must have at least kernel_len samples.
  • Uses Lanczos-windowed sinc kernel: h(x) = sinc(x/factor) * sinc(x/kernel_length)
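
For example, with the defaults (factor=2, half_length=8) and 1024 input samples, the formulas above work out as follows (a quick sanity check, not new behavior):

import numpy as np
from sgnl_cpu_interp import upsample

n_samples, factor, half_length = 1024, 2, 8
kernel_len = 2 * half_length + 1                   # 17 taps
expected = (n_samples - kernel_len + 1) * factor   # (1024 - 16) * 2 = 2016 output samples
delay = (kernel_len - 1) / 2                       # 8 samples of delay at the input rate

signal = np.random.randn(n_samples).astype(np.float32)
out = upsample(signal, factor=factor, half_length=half_length)
assert len(out) == expected                        # 16 input samples lost at the edges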

Performance

Benchmark on 1024 channels, 1024 samples:

Platform                   Implementation   Time      Notes
Apple Silicon (M-series)   NEON             ~0.4 ms   Auto-selected
Apple Silicon              Scalar           ~0.4 ms   Compiler auto-vectorizes well
x86_64 (Haswell+)          AVX2+FMA         ~0.3 ms   Expected
x86_64 (older)             SSE2             ~0.8 ms   Baseline x86_64

Comparison with previous GSL-based implementation:

Implementation    Time     Speedup
GSL BLAS (old)    9.8 ms   1.0x
This package      0.4 ms   ~25x
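
These numbers can be reproduced approximately with a script along the following lines (a rough sketch using the public API described above; absolute timings depend on your machine):

import time
import numpy as np
from sgnl_cpu_interp import upsample, get_simd_info, set_implementation

# 1024 samples x 1024 channels, matching the benchmark above
data = np.random.randn(1024, 1024).astype(np.float32)

for name in get_simd_info()['available']:
    set_implementation(name)
    upsample(data, factor=2)                  # warm-up run
    times = []
    for _ in range(10):
        t0 = time.perf_counter()
        upsample(data, factor=2)
        times.append(time.perf_counter() - t0)
    print(f"{name}: {min(times) * 1e3:.2f} ms")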

Architecture

The package automatically detects CPU features at module load time and selects the best available implementation:

┌─────────────────────────────────────────────────────────┐
│                    Python API                           │
│         upsample() / upsample_transposed()              │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│                  Runtime Dispatch                        │
│         cpu_detect() → select best implementation        │
└─────────────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │ AVX-512  │    │   NEON   │    │  Scalar  │
    │  AVX2    │    │  (ARM)   │    │(fallback)│
    │   AVX    │    └──────────┘    └──────────┘
    │  SSE4.1  │
    │  SSE2    │
    │  (x86)   │
    └──────────┘

Development

Building from source

# Install in development mode
pip install -e .

# Run tests
pytest tests/ -v

Project structure

sgnl-cpu-interp/
├── src/
│   ├── cpu_detect.c      # Runtime CPU feature detection
│   ├── dispatch.c        # Function pointer dispatch table
│   ├── resample_ext_simd.c  # Python extension wrapper
│   └── kernels/
│       ├── convolve_scalar.c   # Baseline implementation
│       ├── convolve_sse2.c     # x86 SSE2
│       ├── convolve_sse4.c     # x86 SSE4.1
│       ├── convolve_avx.c      # x86 AVX
│       ├── convolve_avx2.c     # x86 AVX2+FMA
│       ├── convolve_avx512.c   # x86 AVX-512
│       └── convolve_neon.c     # ARM NEON
├── sgnl_cpu_interp.py    # Python API
├── setup.py              # Build configuration
└── tests/
    └── test_simd.py      # Test suite

Adding a new SIMD implementation

  1. Create src/kernels/convolve_<name>.c implementing convolve_<name>() and convolve_transposed_<name>()
  2. Add the implementation to the dispatch table in src/dispatch.c
  3. Add build flags to setup.py in SIMD_FLAGS_UNIX / SIMD_FLAGS_MSVC
  4. Add CPU feature detection if needed in src/cpu_detect.c

Algorithm

This implementation uses polyphase filtering for efficient upsampling:

  1. Kernel generation: Creates a Lanczos-windowed sinc kernel and splits it into factor polyphase components
  2. SIMD convolution: Vectorized dot product across channels (standard layout) or time samples (transposed layout)
  3. Phase-blocked upsampling: Processes all output samples with the same phase together to maximize kernel data reuse in cache
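
A minimal single-channel sketch of these three steps in plain NumPy is shown below. It is illustrative only, not the package's SIMD C code, and it assumes a Lanczos window scaled by half_length * factor at the output rate, which may differ slightly from the exact kernel used internally.

import numpy as np

def reference_upsample(x, factor=2, half_length=8):
    """Plain-NumPy reference of the three steps above (single channel, no SIMD)."""
    kernel_len = 2 * half_length + 1
    n_out_per_phase = len(x) - kernel_len + 1

    # 1. Kernel generation: windowed sinc evaluated at output-rate offsets,
    #    split into `factor` polyphase components of `kernel_len` taps each.
    taps = np.empty((factor, kernel_len), dtype=np.float32)
    offsets = np.arange(-half_length, half_length + 1)   # offsets in input samples
    for p in range(factor):
        u = offsets * factor - p                          # offsets in output samples
        taps[p] = np.sinc(u / factor) * np.sinc(u / (half_length * factor))

    # 2 + 3. Phase-blocked convolution: all output samples sharing a phase are
    #        computed together as one sliding dot product over the input.
    y = np.empty(n_out_per_phase * factor, dtype=np.float32)
    for p in range(factor):
        y[p::factor] = np.correlate(x, taps[p], mode='valid')
    return y

x = np.random.randn(1024).astype(np.float32)
y = reference_upsample(x, factor=2, half_length=8)
print(len(y))   # 2016, the same length upsample(x, factor=2) returns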

The approach is specifically optimized for:

  • Many channels (100+)
  • Small to moderate upsampling factors (2-16x)
  • Short to medium input lengths (100s to 1000s of samples)

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

Citation

If you use this in research, please cite:

@software{sgnl_cpu_interp,
  title = {sgnl-cpu-interp: Fast polyphase resampling with multi-architecture SIMD},
  url = {https://github.com/yourusername/sgnl-cpu-interp},
  year = {2025}
}
