Correlation module for pytorch
Project description
Correlation module
this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC
This tutorial was used as a basis for implementation, as well as NVIDIA's cuda code
- Build and Install C++ and CUDA extensions by executing
python setup.py install
, - Benchmark C++ vs. CUDA by running
python benchmark.py {cpu, cuda}
, - Run gradient checks on the code by running
python grad_check.py --backend {cpu, cuda}
.
Requirements
This module is expected to compile for Pytorch 0.4.1
, on Python > 3.5
and Python 2.7
.
Installation
this module is available on pip
pip install spatial-correlation-sampler
Usage
API has a few difference with NVIDIA's module
- output is now a 5D tensor, which reflects the shifts horizontal and vertical.
input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
- Output sizes
oH
andoW
are no longer dependant of patch size, but only of kernel size and padding - Patch size
patch_size
is now the whole patch, and not only the radii. stride1
is nowstride
andstride2
isdilation_patch
, which behave like dilated convolutions- equivalent
max_displacement
is thendilation_patch * (patch_size - 1) / 2
. - to get the right parameters for FlowNetC, you would have
kernel_size=1
patch_size=21,
stride=1,
padding=0,
dilation_patch=2
Benchmark
- default parameters are from
benchmark.py
, FlowNetC parameters are same as use inFlowNetC
with a batch size of 4, described in this paper, implemented here and here. - Feel free to file an issue to add entries to this with your hardware !
CUDA Benchmark
- See here for a benchmark script working with NVIDIA's code, and Pytorch
0.3
. - Benchmark are launched with environment variable
CUDA_LAUNCH_BLOCKING
set to1
. - Only
float32
is benchmarked.
implementation | Correlation parameters | device | pass | min time | avg time |
---|---|---|---|---|---|
ours | default | 980 GTX | forward | 5.313 ms | 5.339 ms |
ours | default | 980 GTX | backward | 103.500 ms | 103.685 ms |
NVIDIA | default | 980 GTX | forward | 12.763 ms | 12.844 ms |
NVIDIA | default | 980 GTX | backward | 74.043 ms | 74.323 ms |
ours | FlowNetC | 980 GTX | forward | 5.600 ms | 5.694 ms |
ours | FlowNetC | 980 GTX | backward | 74.719 ms | 75.122 ms |
NVIDIA | FlowNetC | 980 GTX | forward | 8.640 ms | 8.805 ms |
NVIDIA | FlowNetC | 980 GTX | backward | 75.757 ms | 76.873 ms |
Notes
- The large overhead of our implementation regarding
kernel_size
> 1 needs some investigation, feel free to dive in the code to improve it ! - The backward pass of NVIDIA is not entirely correct when stride1 > 1 and kernel_size > 1, because not everything is computed, see here.
CPU Benchmark
- No other implementation is avalaible on CPU.
Correlation parameters | device | pass | min time | avg time |
---|---|---|---|---|
default | E5-2630 v3 @ 2.40GHz | forward | 159.616 ms | 188.727 ms |
default | E5-2630 v3 @ 2.40GHz | backward | 282.641 ms | 294.194 ms |
FlowNetC | E5-2630 v3 @ 2.40GHz | forward | 576.716 ms | 582.069 ms |
FlowNetC | E5-2630 v3 @ 2.40GHz | backward | 1663.429 ms | 1663.429 ms |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Close
Hashes for spatial_correlation_sampler-0.0.8.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | c4f1b72cb47c6160d24971ffcb69a8649c7af0b341b3069455924d3364abc5e2 |
|
MD5 | 3cb51ee53bb8271f027265af761bffdc |
|
BLAKE2b-256 | 745de51bd2126527d7339025febcda13630f1423afc0d9c945806879d242eb80 |