
Rust-backed Windows DXGI Desktop Duplication API screen capture for Python.


rustcam


Fast DXGI Desktop Duplication screen capture for Windows, in Rust.

I made this because every "fast" screen capture package on PyPI runs its hot loop in Python. bettercam is a fork of dxcam; dxcam calls AcquireNextFrame through comtypes on every frame while holding the GIL; and the GDI-based ones (mss, PIL.ImageGrab) aren't using DDA at all. They all top out around 130-140 fps on a 180 Hz monitor for the same reason: per-frame Python overhead misses compositor ticks. rustcam runs the whole AcquireNextFrame -> CopyResource -> Map -> memcpy cycle in native Rust with the GIL released, so it actually rides the refresh rate.

import rustcam

cap = rustcam.Capturer(output=0, cursor=True)
frame = cap.grab()        # numpy ndarray (H, W, 4) BGRA, or None on timeout

Prebuilt Windows wheels for Python 3.9 through 3.13 (a single abi3 wheel that covers them all). pip install rustcam never compiles anything.

Install

pip install rustcam

Windows only. DDA is exposed through the IDXGIOutputDuplication interface, which requires Windows 8 or later. There is no Linux or macOS equivalent. If you need cross-platform capture, look at mss (slower, GDI-based).

Performance


benches/flip_demo_bench.py runs each library against two stimuli, on a 1920x1080 / 180 Hz monitor backed by a GTX 1660 Ti:

  • flip_demo is a native Rust D3D11 app from the zentape project. It presents a flip-model swapchain with a unique full-screen colour every refresh, so the source emits ~180 unique fps and any honest capturer should be able to read at panel rate. This is the controlled benchmark.
  • mover.py is a borderless PyQt window that orbits across the screen continuously. This is the realistic benchmark, what users actually capture, dragging a window around or recording a moving UI.

The metric is unique frames per second, measured by md5-hashing a sparse sample of each returned frame and counting distinct hashes. Container fps lies: a library can return the same buffer over and over and look fast. Unique fps can't be faked.
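
The unique-fps idea can be sketched in a few lines of standard-library Python. This is an illustration only, not the code in benches/flip_demo_bench.py: the names sparse_hash and unique_fps and the stride of 4099 bytes are assumptions chosen for the example.

```python
import hashlib


def sparse_hash(frame, stride=4099):
    """MD5 of every `stride`-th byte of a frame buffer.

    `frame` is any C-contiguous bytes-like object. The stride is an
    arbitrary prime (an assumption), so samples do not all land on the
    same colour channel of a 4-byte BGRA pixel.
    """
    flat = memoryview(frame).cast("B")      # view the buffer as flat bytes
    return hashlib.md5(bytes(flat[::stride])).hexdigest()


def unique_fps(frames, elapsed_s, stride=4099):
    """Distinct sparse hashes per second; None entries (timeouts) are skipped."""
    hashes = {sparse_hash(f, stride) for f in frames if f is not None}
    return len(hashes) / elapsed_s
```

Hashing a sparse sample keeps the measurement cheap enough that it does not itself slow the capture loop much, at the cost of missing changes that fall entirely between sampled bytes.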

There are two ways to invoke bettercam and dxcam, and they perform very differently:

  • .grab() mode: each Python call goes straight to AcquireNextFrame. This is the SHOWY mode, what the lib's README screenshot uses. Fast on a controlled source. Almost nobody writes their actual code this way because you have to handle the polling yourself.
  • .start() + .get_latest_frame() mode: the typical pattern. The lib spawns a bg thread that captures into a ring buffer; you pull from the buffer. This is what every bettercam / dxcam tutorial uses, what real recording code uses, and what shows up in their advertised "FPS" number. It's also significantly slower than the showy mode because the bg thread is a Python loop.

The table shows both, in valid fps (non-None returns per second). %changed annotates how many consecutive returned frames actually differed (the freshness check):

capturer                                flip_demo (valid fps)  mover.py (valid fps)        mover.py (% changed)
rustcam grab(cursor=False)              ~180                   ~150                        99 %
rustcam grab(cursor=True)               ~165                   ~70                         99 %
rustcam start/get_latest_frame          ~170                   (varies)                    100 %
rustcam grab_gpu (sustained)            ~180                   ~280 calls/s, GPU-resident  100 %
bettercam .grab()                       ~180                   ~50                         51 %
bettercam .start()/.get_latest_frame()  ~125                   ~60                         60 %
dxcam .grab()                           ~180                   ~65                         39 %
dxcam .start()/.get_latest_frame()      ~155                   ~3                          93 %
mss                                     ~8                     ~50                         21 %
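
For readers who want to reproduce the two table columns on their own data, here is a minimal sketch of how "valid fps" and "% changed" could be derived from a log of returned frames. The function name run_stats and the idea of logging one digest per call are assumptions for illustration, not the benchmark script's actual code.

```python
def run_stats(returns, elapsed_s):
    """Compute (valid_fps, pct_changed) for one benchmark run.

    `returns` holds one entry per capture call: None for a timeout,
    otherwise any comparable digest of the frame (e.g. an md5 string).
    """
    valid = [r for r in returns if r is not None]
    valid_fps = len(valid) / elapsed_s

    # Compare each valid frame with the valid frame returned just before it.
    pairs = list(zip(valid, valid[1:]))
    changed = sum(1 for prev, cur in pairs if prev != cur)
    pct_changed = 100.0 * changed / len(pairs) if pairs else 0.0
    return valid_fps, pct_changed
```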

On the controlled flip-model source, all three DDA-based libraries hit the panel rate in .grab() mode. In the typical .start()-mode usage, bettercam falls to ~125 and dxcam to ~155, while rustcam stays at ~170. Their background threads are Python loops; rustcam's is native.

The moment the stimulus is realistic content (mover.py is a borderless PyQt window that orbits across the screen), the gap blows wide open:

  • rustcam delivers ~150 valid fps, 99% of consecutive frames are different
  • bettercam in .start() mode delivers ~60, but only 60% of those are fresh, so effective ~36 fps
  • dxcam in .start() mode delivers ~3 fps, basically broken on this stimulus
  • mss delivers ~50, only 21% fresh
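
The "effective" figure quoted for bettercam is just the valid rate scaled by the fresh fraction. As a worked example with the table's numbers (the helper name effective_fps is made up for this sketch):

```python
def effective_fps(valid_fps, pct_changed):
    """Valid frames per second that actually differ from the previous one."""
    return valid_fps * pct_changed / 100.0


# Approximate mover.py figures from the table above.
print(effective_fps(150, 99))   # rustcam grab(cursor=False)
print(effective_fps(60, 60))    # bettercam .start() mode
print(effective_fps(50, 21))    # mss
```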

bettercam still advertises "fastest in the world" in its banner; the flip_demo .grab() number agrees, but in the real .start()-mode it's ~125 fps on the controlled source and ~60 (with 40% of those duplicates) on realistic content.

The cursor=True cell on mover.py is currently ~70 fps. The GDI cursor compositing path (DrawIconEx on the GDI-compatible BGRA texture) syncs with the GPU every frame, and that sync costs more when the desktop is also recomposing under it. Use cursor=False for the fastest hot path; turn cursor on when you actually need it composited.

Why this is faster

Every existing PyPI screen-cap library does the DDA loop FROM PYTHON. They acquire each frame through comtypes proxies, allocate a numpy array per call, do format conversion through cv2.cvtColor (bettercam pulls OpenCV in just for that), and hold the GIL the whole time. The native rate the OS can give you (one frame per compositor tick) gets eaten by all of that.

rustcam does the entire AcquireNextFrame -> CopyResource -> Map -> RowPitch-aware memcpy in a single Rust function call, releases the GIL around it, and reuses the same BGRA + staging textures across calls. Format conversion (BGR / RGB / RGBA / grayscale) is a tight scalar Rust loop that LLVM auto-vectorizes, with no OpenCV dependency. There's nothing clever; it's just doing the same DXGI calls bettercam does without the per-frame Python overhead.
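
To make "RowPitch-aware memcpy" concrete: a mapped staging texture may pad each row, so the row stride in bytes (RowPitch) can be larger than width * 4. The sketch below shows the idea in plain Python on a byte buffer, including an optional crop and a BGRA to RGB channel shuffle. It is a didactic stand-in, not rustcam's Rust code; copy_rows and bgra_to_rgb are hypothetical names.

```python
def copy_rows(mapped, row_pitch, width, height, region=None, bpp=4):
    """Copy a tightly packed image out of a row-padded buffer.

    `mapped` is the pitched source buffer, `row_pitch` its row stride in
    bytes, and `region` an optional (left, top, right, bottom) crop.
    """
    left, top, right, bottom = region or (0, 0, width, height)
    row_bytes = (right - left) * bpp
    out = bytearray(row_bytes * (bottom - top))
    src = memoryview(mapped)
    for i, y in enumerate(range(top, bottom)):
        start = y * row_pitch + left * bpp       # skip padding and cropped pixels
        out[i * row_bytes:(i + 1) * row_bytes] = src[start:start + row_bytes]
    return bytes(out)


def bgra_to_rgb(bgra):
    """Drop alpha and swap B/R using strided slices (length must be a multiple of 4)."""
    rgb = bytearray(len(bgra) // 4 * 3)
    rgb[0::3] = bgra[2::4]   # R
    rgb[1::3] = bgra[1::4]   # G
    rgb[2::3] = bgra[0::4]   # B
    return bytes(rgb)
```

Cropping during the row copy is what lets a region be applied without a second allocation: only the requested bytes of each row are ever copied out.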

Additions vs bettercam:

  • proper cursor compositing via IDXGISurface1::GetDC + DrawIconEx(DI_NORMAL), which handles the inverting I-beam over text correctly (DrawIconEx does mask + XOR blending natively)
  • a region argument that crops on the way out of the staging-texture map (no extra alloc)
  • five output formats (bgra/bgr/rgba/rgb/gray) with no cv2 dependency
  • a paced CFR frames(fps=N) iterator that yields (ndarray, slot_wallclock_ts) for video recording, slot-clock pacing in native Rust, no Python-side timer drift
  • a start()/stop()/get_latest_frame() background-thread mode (bettercam-parity API), Rust capture loop, mailbox, blocking pull
  • grab_gpu() returns a GpuTexture wrapping a shared NT handle around a BGRA D3D11 texture, zero CPU readback, downstream code (CUDA, Vulkan, custom D3D11) can open the handle on its own device
  • a context manager so with rustcam.Capturer(...) as cap: releases COM state on exit
  • structured exceptions (AccessLost, DeviceError, DuplicationError, CaptureTimeout, CaptureError) carrying the underlying HRESULT
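
The slot-clock pacing behind frames(fps=N) can be illustrated in Python, even though rustcam does it in native Rust. The sketch below is an assumption about the general technique, not a port of the library's internals: each slot's deadline is computed from the start time and the slot index, so a late wake-up in one slot does not push every later slot back. The injectable clock and sleep parameters exist only to make the example testable.

```python
import time


def slot_clock(fps, clock=time.perf_counter, sleep=time.sleep):
    """Yield (slot_index, slot_timestamp_s) with exact 1/fps spacing.

    Timestamps come from the slot index, not from when the loop
    happened to wake up, so they never accumulate timer drift.
    """
    start = clock()
    k = 0
    while True:
        deadline = start + k / fps        # absolute deadline for slot k
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
        yield k, k / fps
        k += 1
```

A naive loop that sleeps 1/fps after each frame adds every oversleep to all later frames; anchoring each deadline to the start time keeps the error bounded to a single slot.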

API

import rustcam

cap = rustcam.Capturer(
    output=0,            # IDXGIOutput index, 0 = primary on single-GPU systems
    cursor=True,         # composite the OS cursor into each captured frame
    region=None,         # persistent (l, t, r, b) crop; None = full output
    device=0,            # IDXGIAdapter index, 0 = first adapter
)

# state
cap.width, cap.height            # output resolution
cap.region                       # current persistent region (full if None)
cap.output_idx, cap.device_idx
cap.cursor, cap.format, cap.rotation
cap.is_capturing                 # True between start() and stop(), or during frames() iteration

# one-shot capture
frame = cap.grab(
    timeout_ms=1000,                  # wait up to this long; 0 = poll
    fmt="bgra",                       # bgra / bgr / rgba / rgb / gray
    region=None,                      # per-call crop, doesn't mutate cap.region
)
# returns numpy ndarray (H, W, C) uint8, or None on DXGI_ERROR_WAIT_TIMEOUT

# background capture (bettercam-parity)
cap.start(target_fps=60, video_mode=True)
frame = cap.get_latest_frame(timeout_ms=500)   # blocks until new frame; raises CaptureTimeout on deadline
cap.stop()

# paced CFR stream: yields (ndarray, slot_wallclock_seconds), exact 1/fps spacing
for frame, ts in cap.frames(fps=60, fmt="bgr"):
    encoder.write(frame, pts=ts)    # encoder stands in for your own video writer
    if ts > 10.0:
        break

# zero-copy GPU handle (consumer opens with OpenSharedResource1 on its own D3D11 device)
tex = cap.grab_gpu(timeout_ms=200)
if tex is not None:
    print(tex.width, tex.height, hex(tex.shared_handle), tex.luid)
    tex.close()    # release the duplicated NT handle

# context manager
with rustcam.Capturer(output=0) as cap:
    frame = cap.grab()

# module helpers
rustcam.list_outputs()               # list of dicts (one per output across all adapters)
rustcam.device_info()                # bettercam-style multi-line string
rustcam.output_info()                # same
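
Because grab() returns None on timeout rather than raising, a polling loop has to handle misses itself. Here is a small sketch of one way to do that; grab_frames and its max_misses cutoff are hypothetical helpers, and the only library behaviour relied on is grab(timeout_ms=...) returning a frame or None as documented above.

```python
def grab_frames(cap, count, timeout_ms=100, max_misses=50):
    """Collect up to `count` frames from cap.grab(), skipping timeouts.

    Gives up after `max_misses` None results so a static desktop
    cannot stall the caller forever.
    """
    frames = []
    misses = 0
    while len(frames) < count and misses < max_misses:
        frame = cap.grab(timeout_ms=timeout_ms)
        if frame is None:          # no new frame arrived within the timeout
            misses += 1
            continue
        frames.append(frame)
    return frames
```

With a real capturer this would be called as grab_frames(rustcam.Capturer(output=0), 10).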

Exceptions (all subclasses of rustcam.CaptureError):

  • CaptureError - base; catches every DXGI-origin failure
  • DeviceError - device removed / reset
  • DuplicationError - DuplicateOutput failed (often: another process already capturing this output)
  • AccessLost - exclusive fullscreen took over the display; rustcam retries duplication once internally
  • CaptureTimeout - raised by the streaming APIs (get_latest_frame, frames) when their deadline expires; grab() returns None on timeout instead

Each carries a .hresult attribute with the raw HRESULT when relevant.
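
When logging these exceptions it helps to print the HRESULT in the usual 0x8xxxxxxx form. The helper below is a sketch: describe_failure is a made-up name, and whether .hresult arrives as a signed or unsigned integer (or as None when not relevant) is an assumption, so the code masks to 32 bits and tolerates a missing value.

```python
def describe_failure(exc):
    """Format an exception together with its HRESULT, if it carries one."""
    hr = getattr(exc, "hresult", None)
    if hr is None:
        return f"{type(exc).__name__}: {exc}"
    # Mask so a signed (negative) HRESULT prints as 0x8xxxxxxx.
    return f"{type(exc).__name__}: {exc} (HRESULT 0x{hr & 0xFFFFFFFF:08X})"
```

A typical call site would wrap the capture call in try / except rustcam.CaptureError as exc and pass exc to the helper.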

Compatibility notes

A Capturer is bound to the OS thread that created it. Use one per thread. The Rust extension is #[pyclass(unsendable)], so passing a Capturer between threads raises a RuntimeError.
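
In practice this means constructing the Capturer inside the thread that will use it, rather than building it on the main thread and handing it over. A minimal sketch, assuming a zero-argument factory (make_capturer is a hypothetical parameter) and relying only on grab() returning a frame or None:

```python
import threading


def capture_in_worker(make_capturer, n_calls):
    """Run a grab loop on a worker thread that owns its own capturer."""
    result = {}

    def worker():
        cap = make_capturer()          # created on the thread that uses it
        frames = []
        for _ in range(n_calls):
            frame = cap.grab()
            if frame is not None:
                frames.append(frame)
        result["frames"] = frames      # only the frames leave the thread

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result.get("frames", [])
```

Usage would look like capture_in_worker(lambda: rustcam.Capturer(output=0), 100); the capturer itself never crosses a thread boundary.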

grab(), start(), frames(), and grab_gpu() are mutually exclusive while a long-running operation is active. While the Capturer is in background mode (start()) or iterating frames, calling grab() or grab_gpu() raises RuntimeError. Call stop() (or close the frames() iterator) first.
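
One way to respect this rule in calling code is to check is_capturing before a one-shot grab. The helper below is a sketch under a stated assumption: it only handles background mode started with start(). If is_capturing is True because a frames() iterator is open, calling stop() may not be the right fix, and the iterator should be closed instead.

```python
def grab_exclusive(cap, **grab_kwargs):
    """Stop background capture if it is running, then take a one-shot grab.

    Assumes any active long-running operation was started with start().
    """
    if cap.is_capturing:
        cap.stop()
    return cap.grab(**grab_kwargs)
```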

The first DDA frame after construction is sometimes black. rustcam discards two warmup frames internally so the first user-visible grab() returns real content.

DDA can't see HDCP-protected content (Netflix, Disney+, etc.); that's the DRM working as designed, and you get a black texture. UWP apps with the protected-content flag set behave the same way. There is no way around this without going through different APIs (WGC + ContentDeliveryManager), which are out of scope here.

grab_gpu() returns a shared NT handle. The consumer opens it on its own D3D11 device via OpenSharedResource1. v0.0.3 ships without a strict keyed-mutex protocol on the shared texture; the consumer should copy the texture into its own resource before issuing GPU work that depends on the content. A future release will add an opt-in strict mode.
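
Since every GpuTexture holds a duplicated NT handle until close() is called, it is worth making the release automatic. The sketch below uses contextlib.closing, which works with any object that has a close() method; with_gpu_texture and the consume callback are hypothetical, and whether GpuTexture is itself a context manager is not something this example relies on.

```python
from contextlib import closing


def with_gpu_texture(cap, consume, timeout_ms=200):
    """Grab a GPU texture, hand it to `consume`, and always close it.

    Returns whatever `consume` returns, or None if grab_gpu() timed out.
    """
    tex = cap.grab_gpu(timeout_ms=timeout_ms)
    if tex is None:
        return None
    with closing(tex):                 # close() runs even if consume raises
        return consume(tex)
```

The consume callback is where downstream code would open tex.shared_handle on its own device and copy the texture into its own resource before doing further GPU work.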

Future work

  • Optional strict keyed-mutex mode for grab_gpu() so a single-writer single-reader pipeline can rely on producer/consumer ordering.
  • WGC (Windows.Graphics.Capture) backend as a fallback for per-window capture and HDR sources where DDA can't help.
  • 10-bit / HDR backbuffer support.
  • ARM64 Windows wheels.

License

MIT. See LICENSE.
