
🌀 Faster ModernGL Buffer inter-process data transfers

Reason this release was yanked:

Broken Windows CI builds

Project description

[!IMPORTANT] Also check out ShaderFlow, where TurboPipe shines! 😉

TurboPipe


Faster ModernGL inter-process data transfers

🔥 Description

TurboPipe speeds up sending raw bytes from moderngl.Buffer objects, primarily to an FFmpeg subprocess.

The optimizations involved are:

  • Zero-copy: Avoids unnecessary memory copies and allocations (no intermediate buffer.read())
  • C++: The core of TurboPipe is written in C++ for speed, efficiency and low-level control
  • Chunks: Writes in chunks of 4096 bytes (RAM page size), so the hardware is happy
  • Threaded:
    • Doesn't block Python code execution, so the next frame can be rendered in the meantime
    • Decouples the main thread from the I/O thread for performance
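
To make the chunking and threading ideas concrete, here is a minimal pure-Python sketch. It is not TurboPipe's C++ implementation; the names write_chunked and write_in_background are hypothetical, and the sketch only assumes a writable file descriptor and any bytes-like object.

```python
import os
import threading

CHUNK = 4096  # bytes per write call, the chunk size named in the list above


def write_chunked(fd: int, data) -> int:
    """Write a bytes-like object to fd in CHUNK-sized slices without copying it."""
    view = memoryview(data).cast("B")
    written = 0
    while written < len(view):
        # Slicing a memoryview creates another view, not a copy of the bytes.
        # os.write may write fewer bytes than requested, so advance by its result.
        written += os.write(fd, view[written:written + CHUNK])
    return written


def write_in_background(fd: int, data) -> threading.Thread:
    """Start the chunked write on a separate thread so the caller can keep working."""
    thread = threading.Thread(target=write_chunked, args=(fd, data))
    thread.start()
    return thread
```

The caller has to keep the data alive and unmodified until the returned thread is joined, which is the kind of bookkeeping the safety guarantees described next take care of.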

✅ Don't worry, there's proper safety in place. TurboPipe will block Python if a memory address is already queued for writing, and guarantees order of writes per file-descriptor. Just call .sync() when done 😉
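
The guarantees above can be illustrated with a small pure-Python model. This is only a sketch under assumptions, not how TurboPipe is implemented: the class OrderedWriter is hypothetical, it tracks buffers by id() rather than by memory address, and it uses a single worker thread so that writes happen in the order they were queued.

```python
import os
import queue
import threading


class OrderedWriter:
    """Toy model: FIFO writes on one worker thread, blocking on in-flight buffers."""

    def __init__(self):
        self._queue = queue.Queue()
        self._in_flight = set()  # id() of buffers queued but not yet fully written
        self._cond = threading.Condition()
        threading.Thread(target=self._worker, daemon=True).start()

    def pipe(self, data, fd: int) -> None:
        key = id(data)
        with self._cond:
            # Block the caller while this same buffer is still waiting to be written
            self._cond.wait_for(lambda: key not in self._in_flight)
            self._in_flight.add(key)
        self._queue.put((key, data, fd))

    def _worker(self) -> None:
        while True:
            key, data, fd = self._queue.get()
            view = memoryview(data)
            offset = 0
            while offset < len(view):
                offset += os.write(fd, view[offset:])
            with self._cond:
                self._in_flight.discard(key)
                self._cond.notify_all()
            self._queue.task_done()

    def sync(self) -> None:
        """Wait until every queued write has been completed."""
        self._queue.join()
```

Calling pipe() twice with the same object makes the second call wait for the first write to finish, and sync() returns only once the queue is drained, mirroring the behaviour described above.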


📦 Installation

It couldn't be easier! Just install it with your package manager:

pip install turbopipe
poetry add turbopipe
pdm add turbopipe
rye add turbopipe

🚀 Usage

See also the Examples folder for more controlled usage, and ShaderFlow's usage of it!

import subprocess
import moderngl
import turbopipe

# Create ModernGL objects
ctx = moderngl.create_standalone_context()
buffer = ctx.buffer(reserve=1920*1080*3)

# Make sure resolution, pixel format matches!
ffmpeg = subprocess.Popen(
    'ffmpeg -f rawvideo -pix_fmt rgb24 -s 1920x1080 -i - -f null -'.split(),
    stdin=subprocess.PIPE
)

# Rendering loop of yours
for _ in range(100):
    turbopipe.pipe(buffer, ffmpeg.stdin.fileno())

# Finalize writing: wait for all queued writes before closing the pipe
turbopipe.sync()
ffmpeg.stdin.close()
ffmpeg.wait()

โญ๏ธ Benchmarks

[!NOTE] The tests conditions are as follows:

  • The tests are the average of 3 runs to ensure consistency, with 3 GB of the same data being piped
  • The data is a random noise per-buffer between 128-135. So, multi-buffers runs are a noise video
  • All resolutions are wide-screen (16:9) and have 3 components (RGB) with 3 bytes per pixel (SDR)
  • Multi-buffer cycles through a list of buffer (eg. 1, 2, 3, 1, 2, 3... for 3-buffers)
  • All FFmpeg outputs are scrapped with -f null - to avoid any disk I/O bottlenecks
  • The gain column is the percentage increase over the standard method
  • When x264 is Null, no encoding took place (passthrough)
  • The test cases emoji signifies:
    • ๐Ÿข: Standard ffmpeg.stdin.write(buffer.read()) on just the main thread, pure Python
    • ๐Ÿš€: Threaded ffmpeg.stdin.write(buffer.read()) with a queue (similar to turbopipe)
    • ๐ŸŒ€: The magic of turbopipe.pipe(buffer, ffmpeg.stdin.fileno())

Also see benchmark.py for the implementation
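
As a rough guide to reading the result columns, the sketch below shows how bandwidth and gain relate to the framerate. It is not taken from benchmark.py; the function names are hypothetical, and it assumes 1 GB means 10^9 bytes, which is consistent with the reported figures.

```python
def bandwidth_gb_per_s(fps: float, width: int, height: int, components: int = 3) -> float:
    """Bytes piped per second for raw frames of one byte per component, in GB/s (1 GB = 1e9 bytes)."""
    return fps * width * height * components / 1e9


def gain_percent(fps: float, baseline_fps: float) -> float:
    """Percentage increase of a framerate over the standard (baseline) method."""
    return (fps / baseline_fps - 1) * 100


# 720p, no encoding, single buffer: standard method at 882 fps, TurboPipe at 1911 fps
print(round(bandwidth_gb_per_s(882, 1280, 720), 2))   # 2.44
print(round(bandwidth_gb_per_s(1911, 1280, 720), 2))  # 5.28
print(round(gain_percent(1911, 882), 2))              # 116.67
```

The computed gain differs slightly from the reported 116.70%, presumably because the reported value was derived from unrounded framerates.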

✅ Check out benchmarks on a couple of systems below:

Desktop • (AMD Ryzen 9 5900x) • (NVIDIA RTX 3060 12 GB) • (DDR4 2x32 GB 3200 MT/s) • (Arch Linux)

| 720p | x264 | Buffers | Framerate | Bandwidth | Gain |
|:----:|:---------:|:-:|:--------:|:---------:|:-------:|
| 🐢 | Null | 1 | 882 fps | 2.44 GB/s | |
| 🚀 | Null | 1 | 793 fps | 2.19 GB/s | -10.04% |
| 🌀 | Null | 1 | 1911 fps | 5.28 GB/s | 116.70% |
| 🐢 | Null | 4 | 818 fps | 2.26 GB/s | |
| 🚀 | Null | 4 | 684 fps | 1.89 GB/s | -16.35% |
| 🌀 | Null | 4 | 1494 fps | 4.13 GB/s | 82.73% |
| 🐢 | ultrafast | 4 | 664 fps | 1.84 GB/s | |
| 🚀 | ultrafast | 4 | 635 fps | 1.76 GB/s | -4.33% |
| 🌀 | ultrafast | 4 | 869 fps | 2.40 GB/s | 31.00% |
| 🐢 | slow | 4 | 204 fps | 0.57 GB/s | |
| 🚀 | slow | 4 | 205 fps | 0.57 GB/s | 0.58% |
| 🌀 | slow | 4 | 208 fps | 0.58 GB/s | 2.22% |

| 1080p | x264 | Buffers | Framerate | Bandwidth | Gain |
|:----:|:---------:|:-:|:--------:|:---------:|:-------:|
| 🐢 | Null | 1 | 385 fps | 2.40 GB/s | |
| 🚀 | Null | 1 | 369 fps | 2.30 GB/s | -3.91% |
| 🌀 | Null | 1 | 641 fps | 3.99 GB/s | 66.54% |
| 🐢 | Null | 4 | 387 fps | 2.41 GB/s | |
| 🚀 | Null | 4 | 359 fps | 2.23 GB/s | -7.21% |
| 🌀 | Null | 4 | 632 fps | 3.93 GB/s | 63.40% |
| 🐢 | ultrafast | 4 | 272 fps | 1.70 GB/s | |
| 🚀 | ultrafast | 4 | 266 fps | 1.66 GB/s | -2.14% |
| 🌀 | ultrafast | 4 | 405 fps | 2.53 GB/s | 49.24% |
| 🐢 | slow | 4 | 117 fps | 0.73 GB/s | |
| 🚀 | slow | 4 | 122 fps | 0.76 GB/s | 4.43% |
| 🌀 | slow | 4 | 124 fps | 0.77 GB/s | 6.48% |

| 1440p | x264 | Buffers | Framerate | Bandwidth | Gain |
|:----:|:---------:|:-:|:--------:|:---------:|:-------:|
| 🐢 | Null | 1 | 204 fps | 2.26 GB/s | |
| 🚀 | Null | 1 | 241 fps | 2.67 GB/s | 18.49% |
| 🌀 | Null | 1 | 297 fps | 3.29 GB/s | 45.67% |
| 🐢 | Null | 4 | 230 fps | 2.54 GB/s | |
| 🚀 | Null | 4 | 235 fps | 2.61 GB/s | 2.52% |
| 🌀 | Null | 4 | 411 fps | 4.55 GB/s | 78.97% |
| 🐢 | ultrafast | 4 | 146 fps | 1.62 GB/s | |
| 🚀 | ultrafast | 4 | 153 fps | 1.70 GB/s | 5.21% |
| 🌀 | ultrafast | 4 | 216 fps | 2.39 GB/s | 47.96% |
| 🐢 | slow | 4 | 73 fps | 0.82 GB/s | |
| 🚀 | slow | 4 | 78 fps | 0.86 GB/s | 7.06% |
| 🌀 | slow | 4 | 79 fps | 0.88 GB/s | 9.27% |

| 2160p | x264 | Buffers | Framerate | Bandwidth | Gain |
|:----:|:---------:|:-:|:--------:|:---------:|:-------:|
| 🐢 | Null | 1 | 81 fps | 2.03 GB/s | |
| 🚀 | Null | 1 | 107 fps | 2.67 GB/s | 32.26% |
| 🌀 | Null | 1 | 213 fps | 5.31 GB/s | 163.47% |
| 🐢 | Null | 4 | 87 fps | 2.18 GB/s | |
| 🚀 | Null | 4 | 109 fps | 2.72 GB/s | 25.43% |
| 🌀 | Null | 4 | 212 fps | 5.28 GB/s | 143.72% |
| 🐢 | ultrafast | 4 | 59 fps | 1.48 GB/s | |
| 🚀 | ultrafast | 4 | 67 fps | 1.68 GB/s | 14.46% |
| 🌀 | ultrafast | 4 | 95 fps | 2.39 GB/s | 62.66% |
| 🐢 | slow | 4 | 37 fps | 0.94 GB/s | |
| 🚀 | slow | 4 | 43 fps | 1.07 GB/s | 16.22% |
| 🌀 | slow | 4 | 44 fps | 1.11 GB/s | 20.65% |

Desktop • (AMD Ryzen 9 5900x) • (NVIDIA RTX 3060 12 GB) • (DDR4 2x32 GB 3200 MT/s) • (Windows 11)

🌀 Conclusion

TurboPipe significantly increases the speed of feeding data to FFmpeg, especially at higher resolutions. However, if little CPU compute is available, or the video is too hard to encode (slow preset), the gains over the other methods are insignificant, since encoding becomes the bottleneck. Multi-buffering didn't prove to have an advantage: debugging shows that the TurboPipe C++ side is often starved of data to write (most likely because the file stream is buffered by the OS), and context switching or cache misses might be the cause of the slowdown.

Interestingly, due to either Linux's scheduler on AMD Ryzen CPUs or their operating philosophy, it was experimentally seen that Ryzen's frenetic thread switching slightly degrades single-thread performance compared to Windows on the same system (Linux 🚀 = Windows 🐢). This can be "fixed" by prepending the command with taskset --cpu 0,2 (not recommended at all). It may also be due to the topology of the tested CPUs, which have more than one Core Complex Die (CCD). Intel CPUs seem to stick to the same thread for longer, which makes the Python threaded method an unnecessary overhead.
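
For experimentation only, the same kind of pinning can be done from inside Python on Linux with the standard library. This is a sketch, not part of TurboPipe; pin_to_cpus is a hypothetical helper, and the caveat above (not recommended at all) applies to it just as much as to taskset.

```python
import os


def pin_to_cpus(wanted):
    """Restrict the current process to those wanted CPUs it is allowed to use.

    Returns the resulting affinity set, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # not available on this platform (e.g. Windows, macOS)
    allowed = os.sched_getaffinity(0)
    target = allowed & set(wanted)
    if not target:
        return allowed  # none of the wanted CPUs are usable, leave affinity unchanged
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


# Mirrors the CPU list used with taskset above
print(pin_to_cpus({0, 2}))
```

Threads started after the call inherit the restricted affinity, so both the main thread and an I/O thread would stay on the selected cores.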

Personal experience

On realistic loads, like ShaderFlow's default lightweight shader export, TurboPipe increases rendering speed from 1080p260 to 1080p330 on my system, with CPU usage in the mid 80% range rather than the low 60% range. For DepthFlow's default depth video export, no gains are seen, as the CPU is almost saturated encoding at 1080p130.


📚 Future work

  • Add support for NumPy arrays, memoryviews, and bytes-like objects
  • Improve the thread synchronization and/or use a ThreadPool
  • Maybe use mmap instead of chunked writes
  • Test on macOS 🙈

Download files

Download the file for your platform.

Source Distribution

  • turbopipe-1.0.1.tar.gz (99.6 kB), Source

Built Distributions

  • turbopipe-1.0.1-cp312-cp312-win_amd64.whl (44.5 kB), CPython 3.12, Windows x86-64
  • turbopipe-1.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.2 kB), CPython 3.12, manylinux: glibc 2.17+ x86-64
  • turbopipe-1.0.1-cp312-cp312-macosx_11_0_arm64.whl (30.8 kB), CPython 3.12, macOS 11.0+ ARM64
  • turbopipe-1.0.1-cp311-cp311-win_amd64.whl (44.5 kB), CPython 3.11, Windows x86-64
  • turbopipe-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.2 kB), CPython 3.11, manylinux: glibc 2.17+ x86-64
  • turbopipe-1.0.1-cp311-cp311-macosx_11_0_arm64.whl (30.8 kB), CPython 3.11, macOS 11.0+ ARM64
  • turbopipe-1.0.1-cp310-cp310-win_amd64.whl (44.5 kB), CPython 3.10, Windows x86-64
  • turbopipe-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.2 kB), CPython 3.10, manylinux: glibc 2.17+ x86-64
  • turbopipe-1.0.1-cp310-cp310-macosx_11_0_arm64.whl (30.8 kB), CPython 3.10, macOS 11.0+ ARM64
  • turbopipe-1.0.1-cp39-cp39-win_amd64.whl (44.5 kB), CPython 3.9, Windows x86-64
  • turbopipe-1.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.2 kB), CPython 3.9, manylinux: glibc 2.17+ x86-64
  • turbopipe-1.0.1-cp39-cp39-macosx_11_0_arm64.whl (30.8 kB), CPython 3.9, macOS 11.0+ ARM64