PyTorch bucket-based farthest point sampling (CPU + CUDA).
PyTorch QuickFPS
Efficient farthest point sampling (FPS) for PyTorch, adapted from fpsample.
This project provides bucket-based FPS on both CPU and GPU. The GPU path is optimized for high-dimensional sampling (e.g., feature embeddings).
Installation
1) Install PyTorch (required)
Install PyTorch by following the official instructions for your platform and CUDA version.
2) Install torch_quickfps
Option A: prebuilt wheels from pip
# CPU-only
pip install torch-quickfps
# CUDA 12.8
pip install torch-quickfps-cu128
# CUDA 13.0
pip install torch-quickfps-cu130
Notes:
- The CUDA wheel you choose should match the CUDA-enabled PyTorch you installed (e.g., cu128 wheel with a cu128 PyTorch build).
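To check which wheel matches an existing PyTorch install, the note above can be sketched as a small lookup. This is only an illustration: `wheel_for_cuda` and `suggest_wheel` are hypothetical helper names, not part of torch_quickfps, and the table contains only the wheel names listed in this README.

```python
# Hypothetical helper (not part of torch_quickfps): maps the CUDA version
# reported by PyTorch to the wheel names listed in this README.
KNOWN_WHEELS = {
    None: "torch-quickfps",           # PyTorch build without CUDA
    "12.8": "torch-quickfps-cu128",
    "13.0": "torch-quickfps-cu130",
}


def wheel_for_cuda(cuda_version):
    """Return the matching wheel name, or None if no prebuilt wheel is listed."""
    return KNOWN_WHEELS.get(cuda_version)


def suggest_wheel():
    """Look up the wheel for the PyTorch build installed in this environment."""
    import torch  # imported lazily so the lookup table works without PyTorch

    # torch.version.cuda is a string such as "12.8", or None without CUDA.
    return wheel_for_cuda(torch.version.cuda)
```

If `suggest_wheel()` returns `None`, no prebuilt wheel in the list above matches the installed PyTorch, and installing from source (Option B) is the likely fallback.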
Option B: install from source (GitHub)
pip install --no-build-isolation git+https://github.com/Astro-85/torch_quickfps
Usage
import torch
import torch_quickfps
x = torch.rand(64, 2048, 256)
# Random sample
sampled_points, indices = torch_quickfps.sample(x, 1024)
# Random sample with specific tree height
sampled_points, indices = torch_quickfps.sample(x, 1024, h=3)
# Random sample with start point index (int)
sampled_points, indices = torch_quickfps.sample(x, 1024, start_idx=0)
# For high-dimensional embeddings on CUDA, set low_d for faster bucketing
sampled_points, indices = torch_quickfps.sample(x, 1024, h=8, low_d=8)
# Indices-only
indices = torch_quickfps.sample(x, 1024, return_points=False)
# (equivalently)
indices = torch_quickfps.sample_idx(x, 1024)
# Masked sampling: only sample from valid points (mask shape [B, N])
mask = torch.ones(x.shape[:-1], dtype=torch.bool)
mask[:, 1000:] = False # e.g. padding
sampled_points, indices = torch_quickfps.sample(x, 512, mask=mask)
print(sampled_points.size(), indices.size())
# torch.Size([64, 512, 256]) torch.Size([64, 512])
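To make the roles of `start_idx` and `mask` concrete, here is a minimal plain-Python sketch of vanilla (non-bucketed) FPS on a single point cloud. It is not the library's bucket-based implementation. `fps_reference` is a hypothetical name, squared Euclidean distance is an assumption, and torch_quickfps may differ in tie-breaking and in how it picks the default start point.

```python
def fps_reference(points, k, start_idx=0, mask=None):
    """Vanilla O(N*K) farthest point sampling over one point cloud.

    points: list of equal-length coordinate lists.
    mask:   optional list of bools; False entries are never selected.
    Returns the indices of the k selected points, in selection order.
    """
    n = len(points)
    valid = [True] * n if mask is None else list(mask)
    if not valid[start_idx]:
        raise ValueError("start_idx must refer to a valid (unmasked) point")
    if k > sum(valid):
        raise ValueError("k exceeds the number of valid points")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neg_inf = float("-inf")
    # min_dist[i]: squared distance from point i to its nearest selected point.
    # Masked-out and already-selected points are held at -inf.
    min_dist = [float("inf") if v else neg_inf for v in valid]
    selected = []
    current = start_idx
    for _ in range(k):
        selected.append(current)
        min_dist[current] = neg_inf  # never pick the same point twice
        for i in range(n):
            if min_dist[i] != neg_inf:
                min_dist[i] = min(min_dist[i], sq_dist(points[i], points[current]))
        # Next pick: the remaining point farthest from the selected set.
        current = max(range(n), key=min_dist.__getitem__)
    return selected


pts = [[0, 0], [1, 0], [10, 0], [5, 0]]
print(fps_reference(pts, 3))                                  # [0, 2, 3]
print(fps_reference(pts, 2, mask=[True, True, False, True]))  # [0, 3]
```

A reference like this is only practical for small inputs, but it can serve as a sanity check when comparing against the faster implementations.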
Performance comparison
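The table below compares the CPU implementation, a vanilla (non-bucketed) GPU FPS baseline, and the bucketed GPU implementation. Times are in milliseconds. The last two columns are speedup ratios, where values above 1x mean the bucketed GPU path is faster. At D = 8 the bucketed GPU path is slower than both alternatives; its advantage appears at high dimensions (D = 1024 and above).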
- N: number of input points
- D: point dimension
- K: number of sampled points
- CPU vs GPU (bucketed): CPU_ms / GPU_bucketed_ms
- GPU baseline vs bucketed: GPU_baseline_ms / GPU_bucketed_ms
| N | D | K | CPU (ms) | GPU baseline (ms) | GPU bucketed (ms) | CPU vs GPU (bucketed) | GPU baseline vs bucketed |
|---|---|---|---|---|---|---|---|
| 1000 | 8 | 250 | 0.271 | 0.404 | 2.671 | 0.10x | 0.15x |
| 1000 | 1024 | 250 | 69.697 | 94.144 | 4.867 | 14.32x | 19.34x |
| 1000 | 4096 | 250 | 248.521 | 378.458 | 10.614 | 23.41x | 35.65x |
| 2000 | 8 | 500 | 1.578 | 1.299 | 5.432 | 0.29x | 0.24x |
| 2000 | 1024 | 500 | 213.804 | 399.292 | 11.018 | 19.41x | 36.24x |
| 2000 | 4096 | 500 | 869.318 | 1585.913 | 33.974 | 25.59x | 46.68x |
| 5000 | 8 | 1250 | 6.151 | 7.156 | 16.970 | 0.36x | 0.42x |
| 5000 | 1024 | 1250 | 1075.742 | 2483.299 | 47.459 | 22.67x | 52.33x |
| 5000 | 4096 | 1250 | 4547.318 | 10027.665 | 154.874 | 29.36x | 64.75x |
| 10000 | 8 | 2500 | 22.135 | 26.152 | 43.379 | 0.51x | 0.60x |
| 10000 | 1024 | 2500 | 4503.257 | 9959.041 | 186.622 | 24.13x | 53.36x |
| 10000 | 4096 | 2500 | 21699.598 | 40439.047 | 645.883 | 33.60x | 62.61x |
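The README does not say how these timings were collected. The sketch below shows one plausible way to measure them and how the speedup columns follow from the raw times. `time_ms` and `speedup` are hypothetical helpers, and passing `torch.cuda.synchronize` as `sync` is an assumption about how GPU timings would be taken, since CUDA kernels launch asynchronously.

```python
import time


def time_ms(fn, sync=None, warmup=3, iters=10):
    """Average wall-clock time of fn() in milliseconds.

    sync: optional zero-argument callable run before each clock read. For
    CUDA work this would typically be torch.cuda.synchronize, so that
    pending kernels finish before the clock is read.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters


def speedup(baseline_ms, bucketed_ms):
    """Ratio used in the last two table columns (above 1 means bucketed is faster)."""
    return baseline_ms / bucketed_ms


# Row N=1000, D=1024, K=250 from the table above.
print(f"{speedup(69.697, 4.867):.2f}x")  # 14.32x (CPU vs GPU bucketed)
print(f"{speedup(94.144, 4.867):.2f}x")  # 19.34x (GPU baseline vs bucketed)
```

With PyTorch and a GPU available, a call such as `time_ms(lambda: torch_quickfps.sample_idx(x, 250), sync=torch.cuda.synchronize)` would time the bucketed path for a CUDA tensor `x`.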
Reference
Bucket-based FPS (QuickFPS) was proposed in the following paper:
@article{han2023quickfps,
title={QuickFPS: Architecture and Algorithm Co-Design for Farthest Point Sampling in Large-Scale Point Clouds},
author={Han, Meng and Wang, Liang and Xiao, Limin and Zhang, Hao and Zhang, Chenhao and Xu, Xiangrong and Zhu, Jianfeng},
journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
year={2023},
publisher={IEEE}
}
Thanks to the authors for their great work.