
PyTorch QuickFPS

Efficient farthest point sampling (FPS) for PyTorch, adapted from fpsample.

This project provides bucket-based FPS on both CPU and GPU. The GPU path is optimized for high-dimensional sampling (e.g., feature embeddings).
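For intuition, farthest point sampling greedily selects the point that is farthest from the already-selected set. The sketch below is a naive O(N·K·D) reference in pure Python, not the bucketed kd-tree algorithm this package implements; it only illustrates what FPS computes:

```python
import math

def naive_fps(points, k, start_idx=0):
    """Greedy farthest point sampling: naive O(N*K*D) reference,
    not the bucketed QuickFPS algorithm used by torch_quickfps."""
    n = len(points)
    selected = [start_idx]
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[start_idx]) for p in points]
    for _ in range(k - 1):
        # pick the point farthest from the current selected set
        next_idx = max(range(n), key=lambda i: min_dist[i])
        selected.append(next_idx)
        # selecting next_idx can only shrink each point's nearest distance
        for i in range(n):
            d = math.dist(points[i], points[next_idx])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
print(naive_fps(pts, 2))  # → [0, 3]
```

Bucketed (QuickFPS-style) implementations avoid the full N-point distance update per iteration by partitioning points into kd-tree buckets and skipping buckets that cannot contain the farthest point.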


Installation

1) Install PyTorch (required)

Install PyTorch first, following the official instructions at pytorch.org for your platform and CUDA version.
2) Install torch_quickfps

Option A: prebuilt wheels from pip

# CPU-only
pip install torch-quickfps

# CUDA 12.8
pip install torch-quickfps-cu128

# CUDA 13.0
pip install torch-quickfps-cu130

Notes:

  • The CUDA wheel you choose should match the CUDA-enabled PyTorch you installed (e.g., cu128 wheel with a cu128 PyTorch build).

Option B: install from source (GitHub)

pip install --no-build-isolation git+https://github.com/Astro-85/torch_quickfps

Usage

import torch
import torch_quickfps

x = torch.rand(64, 2048, 256)

# Sample 1024 points (random start point by default)
sampled_points, indices = torch_quickfps.sample(x, 1024)

# Random sample with specific tree height
sampled_points, indices = torch_quickfps.sample(x, 1024, h=3)

# Fix the start point index (int) for deterministic output
sampled_points, indices = torch_quickfps.sample(x, 1024, start_idx=0)

# For high-dimensional embeddings on CUDA, set low_d for faster bucketing
sampled_points, indices = torch_quickfps.sample(x, 1024, h=8, low_d=8)

# Indices-only
indices = torch_quickfps.sample(x, 1024, return_points=False)
# (equivalently)
indices = torch_quickfps.sample_idx(x, 1024)

# Masked sampling: only sample from valid points (mask shape [B, N])
mask = torch.ones(x.shape[:-1], dtype=torch.bool)
mask[:, 1000:] = False  # e.g. padding
sampled_points, indices = torch_quickfps.sample(x, 512, mask=mask)

print(sampled_points.size(), indices.size())
# torch.Size([64, 1024, 256]) torch.Size([64, 1024])
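Whatever backend runs, the returned indices should be unique, in range, and, when a mask is given, drawn only from valid positions. A small plain-Python sanity-check sketch (the helper name `check_fps_indices` is hypothetical, not part of this package's API); it works on `indices[b].tolist()` for one batch element:

```python
def check_fps_indices(indices, n, valid=None):
    """Sanity-check FPS output indices for one batch element
    (hypothetical helper, not part of torch_quickfps).

    indices: list of sampled indices
    n:       number of input points N
    valid:   optional set of allowed indices (e.g. from a padding mask)
    """
    assert len(set(indices)) == len(indices), "indices must be unique"
    assert all(0 <= i < n for i in indices), "indices must be in [0, N)"
    if valid is not None:
        assert all(i in valid for i in indices), "index outside mask"
    return True

# e.g. N=2048 points where positions 1000+ are padding
valid = set(range(1000))
print(check_fps_indices([0, 517, 999], 2048, valid))  # → True
```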

Performance comparison

The comparison covers a CPU implementation, a vanilla (non-bucketed) GPU FPS baseline, and this package's bucketed GPU implementation. Speedups are runtime ratios:

  • N: number of input points
  • D: point dimension
  • K: number of sampled points
  • CPU vs GPU (bucketed): CPU_ms / GPU_bucketed_ms
  • GPU baseline vs bucketed: GPU_baseline_ms / GPU_bucketed_ms
| N | D | K | CPU (ms) | GPU baseline (ms) | GPU bucketed (ms) | CPU vs GPU (bucketed) | GPU baseline vs bucketed |
|---|---|---|---|---|---|---|---|
| 1000 | 8 | 250 | 0.271 | 0.404 | 2.671 | 0.10x | 0.15x |
| 1000 | 1024 | 250 | 69.697 | 94.144 | 4.867 | 14.32x | 19.34x |
| 1000 | 4096 | 250 | 248.521 | 378.458 | 10.614 | 23.41x | 35.65x |
| 2000 | 8 | 500 | 1.578 | 1.299 | 5.432 | 0.29x | 0.24x |
| 2000 | 1024 | 500 | 213.804 | 399.292 | 11.018 | 19.41x | 36.24x |
| 2000 | 4096 | 500 | 869.318 | 1585.913 | 33.974 | 25.59x | 46.68x |
| 5000 | 8 | 1250 | 6.151 | 7.156 | 16.970 | 0.36x | 0.42x |
| 5000 | 1024 | 1250 | 1075.742 | 2483.299 | 47.459 | 22.67x | 52.33x |
| 5000 | 4096 | 1250 | 4547.318 | 10027.665 | 154.874 | 29.36x | 64.75x |
| 10000 | 8 | 2500 | 22.135 | 26.152 | 43.379 | 0.51x | 0.60x |
| 10000 | 1024 | 2500 | 4503.257 | 9959.041 | 186.622 | 24.13x | 53.36x |
| 10000 | 4096 | 2500 | 21699.598 | 40439.047 | 645.883 | 33.60x | 62.61x |
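These numbers are hardware-dependent. To measure on your own machine, a minimal warmup-plus-average timing harness along these lines can be used (when timing CUDA kernels, call `torch.cuda.synchronize()` inside or right after `fn` so the kernel actually finishes before the clock stops):

```python
import time

def bench(fn, warmup=3, iters=10):
    """Average wall-clock time of fn() in milliseconds.

    Sketch only: for GPU workloads, fn must synchronize
    (e.g. torch.cuda.synchronize()) or the times are meaningless.
    """
    for _ in range(warmup):  # warm caches / JIT / CUDA context
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0

# example with a cheap CPU stand-in workload
ms = bench(lambda: sum(i * i for i in range(10_000)))
print(f"{ms:.3f} ms per call")
```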

Reference

Bucket-based FPS (QuickFPS) is proposed in the following paper:

@article{han2023quickfps,
  title={QuickFPS: Architecture and Algorithm Co-Design for Farthest Point Sampling in Large-Scale Point Clouds},
  author={Han, Meng and Wang, Liang and Xiao, Limin and Zhang, Hao and Zhang, Chenhao and Xu, Xiangrong and Zhu, Jianfeng},
  journal={IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
  year={2023},
  publisher={IEEE}
}

Thanks to the authors for their great work.
