Telemed ultrasound interop: extract .tvd recordings to HDF5 + per-panel mp4.
Project description
telemed
Direct-read pipeline for Telemed ultrasound .tvd recordings — HDF5
metadata + per-panel mp4 with inner-image autocrop.
Extracts .tvd recordings to HDF5 sidecars via the AutoInt1 COM API,
then encodes to mp4 for downstream DLC / DUSTrack consumption. This
document is a reference for the design decisions baked into the
package; the per-function docstrings cover the surface in detail.
Install
pip install telemed
The Windows-only pywin32 dependency (required by the .tvd → .tvd.h5
extract path, which talks to EchoWave via COM) installs automatically
on Windows and is skipped on macOS / Linux. Non-Windows installs still
get the encode + analyze paths (export_video, process() over Set B
.tvd.h5 inputs, Log), they just can't run COM extraction.
ffmpeg must be on PATH (used for the h265 encode).
Quickstart
import telemed
# Start EchoWave II as Administrator first; start your Python
# session as Administrator too. (COM ROT is per-elevation.)
telemed.process(r"M:/data/pia02")
# Equivalent to:
# telemed.export_h5(r"M:/data/pia02") # tvd -> tvd.h5 (Admin + EchoWave)
# telemed.export_video(r"M:/data/pia02") # tvd.h5 -> mp4(s) (offline)
# Inspect a single recording:
lf = telemed.Log("M:/data/pia02/scan.tvd.h5")
lf.view() # matplotlib browser, with scale bar
lf.to_video() # single-recording mp4 encode
Pipeline
.tvd --[COM via AutoInt1]--> .tvd.h5 --[ffmpeg]--> .mp4(s)
export_h5 export_video
Admin EchoWave required offline (no EchoWave)
process() chains both stages on the same source. All three accept a
file, folder, or iterable; recursive=True by default; idempotent
under skip_existing=True (default).
HDF5 schema (v1)
Composite suffix <stem>.tvd.h5 so downstream glob walks (*.tvd.h5)
catch them without picking up unrelated HDF5 data.
Root attributes (flat -- no nested groups):
n_frames,full_frame_width,full_frame_heightn_b_images-- count of active B-mode panels (1 = single probe; 2 = dual probe; up to 4)source_tvd_path,extracted_at_iso,schema_version="v1"image_dx_cm_per_px,image_dy_cm_per_px-- display scale derived fromb_depth_mm / 10 / panel_height_px(Telemed support's "trust the depth setting" calibration). Skipped ifb_depthwasn't captured.- Per active img_id N ∈ {1, 2, 3, 4} (1=B, 2=B2, 3=B3, 4=B4):
roi{N}_x1,roi{N}_x2,roi{N}_y1,roi{N}_y2(1-based pixel coords matching AutoInt1's convention)roi{N}_width,roi{N}_height(inclusive pixel counts)physical_dx{N}_cm_per_px,physical_dy{N}_cm_per_px(beamformer-native scale -- see "scale" note below)
param_*-- opportunistic ParamGet sweep (~36 fields per recording on real EchoWave acquisitions): probe / beamformer identity, cine end timestamp, B-mode acquisition (depth, frequency, gain, dynamic range, focus, THI, frame averaging, ...), geometry / orientation (scan-direction-changed, rotate, view-area, scan-type, ...), sanity probes (file-opened, scanning-state, probe-active).
Inner-image autocrop bounds are NOT in the schema. The encoder detects the inner ultrasound image (depth ruler / margins / tick row stripped) from frame pixels at encode time -- see "Inner-image autocrop" below. Keeping the bounds out of the sidecar means a detector tweak only requires a re-encode (offline), not a re-extract (Admin EchoWave).
Datasets:
/timing/frame_idx_1n-- int32 (N,)/timing/time_ms-- float64 (N,), frame 0 anchored at 0.0 ms/timing/ifi_ms-- float64 (N,), inter-frame intervals/frames/gray-- uint8 (N, H, W), full-frame display capture (omit by passingframes=Falsefor a fast timing-only extract)
Schema history
| version | date | change |
|---|---|---|
| v1 | 2026-05-24 | initial public release. Consolidates the |
in-development v1a{1..5} series (single ROI / params sweep / |
||
| multi-ROI / display-scale capture) into one labelled baseline. |
Log reads both the public v1 label and the legacy in-development
labels (v1a{1..5}) transparently, so on-disk sidecars produced by
pre-graduation pipelines keep loading. Production extracts always
write v1. The inner-image autocrop is computed at encode time and
doesn't bump the schema -- existing on-disk sidecars get autocropped
mp4s "for free" the next time export_video runs over them.
Inner-image autocrop
The panel ROI from AutoInt1's GetUltrasoundX{1,2}/Y{1,2} is the
full B-mode panel: depth ruler + side margins + inner ultrasound
image + bottom-tick row. The depth ruler alone eats ~25% of the
panel width on single-probe acquisitions, and the bottom-tick row
pollutes the bottom with sharp non-anatomical contrast that confuses
DLC. We don't want either in the per-panel mp4.
export_video runs a content-based detector against an aggregate of
16 evenly-spaced frames per panel and crops each mp4 to the inner
ultrasound image. Detection failures fall back to the outer panel
ROI with a warning. crop="panel" opts out explicitly.
Why at encode time, not extract time? Re-extraction requires
Admin EchoWave + Admin Python. The detector is offline (just needs
/frames/gray), and encoding is the slow stage anyway -- ~50 ms of
detection vs. minutes of h265 lossless. Putting detection here
means detector tweaks only need a re-encode, not a re-extract.
Algorithm
- Cols. Estimate the EchoWave UI gray level from the panel's leftmost+rightmost five cols (taking the median to ignore the depth-ruler "0" digit's bright pixels). A column is "margin" when its mean is within ~12 of that gray AND its vertical std is low (~12); the longest contiguous run of non-margin cols is the inner image width. Tick rows are pre-trimmed for this pass so a saturated bright stripe doesn't poison the per-col std.
- Rows. Walk up from the last panel row, peeling off rows whose
col-restricted mean exceeds the tick threshold
(
max(60, 2*median + 20)). Top edge of the inner image is the panel top (probed Telemed configurations place the depth-ruler "0" digit above the panel ROI'sy1).
Why content-based detection (not a probe-aperture lookup)
The probe-table approach (predict inner width from
probe_name + aperture_mm + image_d{x,y}_cm_per_px) was
sketched + dismissed 2026-05-24: predicted width was ~4-5% off
observed (670 vs 700 px on the LF9-5N60-A3 probe), meaning a
per-probe empirical correction table would be needed anyway. The
content-based detector self-calibrates from the actual pixels and
generalises to unfamiliar probes without a maintenance table.
For a fixed (probe, depth, view_area, panel_dims), the detected inner ROI is deterministic across recordings of the same acquisition config -- a useful invariant for cross-file consistency audits.
Failure mode: fully-black / degenerate recordings
_detect_image_roi returns None when the gray-margin step isn't
clear (no detectable inner-image bounds, or the resulting box is
< 20% of the panel area on either axis). The encoder falls back to
the panel ROI with a warning so the regression is visible.
Multi-probe auto-detection
A dual-probe acquisition lights up two B-mode panels side-by-side
(B + B2 in AutoInt1's enum, img_id=1 + img_id=2). The authoritative
detection signal is the ROI enumeration itself: _collect_b_mode_rois
probes img_ids 1-4 and keeps every panel that returns a positive-area
rectangle. The number of populated ROIs = number of physical probes
in use.
Why not scanning_state (id 200)? It's a useful sanity-check but
has undocumented sub-states. The 2026-05-24 pia02 probe reported
state=25, which isn't in any documented id_state_bb_* constant.
The ROI count never ambiguates.
export_video follows: single-probe -> <stem>.mp4; dual-probe ->
<stem>_b1.mp4 + <stem>_b2.mp4.
Inactive-panel sentinel: (0, 0, 0, 0)
AutoInt1 returns the zero-rect sentinel (x1, x2, y1, y2) = (0, 0, 0, 0)
for inactive img_ids rather than raising. The 2026-05-24 metadata
probes on usl02 (single-probe) revealed B2/B3/B4 all coming back as
zero-rect. The TelemedRoi.from_cmd validator rejects anything that
isn't a strict positive-area rectangle (x2 > x1 AND y2 > y1),
catching both the sentinel and any inverted/negative rect. Without
that fix, single-probe recordings would have been mis-classified as
quad-probe and produced four _b{N}.mp4 files (three of them
degenerate).
Encode pipeline: lossless h265 mono, ultrafast preset
Why lossless
The source /frames/gray is uint8 grayscale straight off the
beamformer -- no upstream lossy step to reclaim quality from. A
CRF-tuned encode is buying file size at the cost of DLC accuracy that
the device gave us for free. The cropped per-panel ROI is small
enough (~700x550) that lossless h265 lands at ~3 GB per 20k-frame
recording, tolerable at corpus scale.
export_video(..., lossless=False, crf=N) is available as an opt-in
for users who'd rather trade a few percent of accuracy for ~50x
smaller files.
Why preset="ultrafast"
For lossless h265 the preset only trades file size against encode
- decode speed -- reconstructed pixels are bit-exact regardless.
ultrafastwins on every time axis (encode + linear decode + random seek + TOC build) at +6% size overfast, ~+15% overslow. Full bench table + methodology + raw numbers inBENCHMARKING.md. DLC inference is GPU-bound (~175 fps ceiling), so encoder preset is effectively neutral for inference; the decode + seek wins are what matter for interactive use (DUSTrack labeling, frame scrubbing).
Accuracy invariance: all four lossless presets produce DLC
predictions identical within float32 noise (max |delta| < 1e-4 px).
The lossy legacy h264/yuv420p path costs ~0.58 px median DLC keypoint
error and outliers up to 41 px -- the lossless h265 mono pipeline
closes that gap. Parity bench data in
BENCHMARKING.md.
Dustrack UI fps: lossless h265 mono is ~22% faster in the
dustrack UI than legacy h264 yuv420p (42 fps vs 34 fps on the
encoding-axis adapter probe). Within the lossless presets the choice
is noise (~5% spread). Stacks with the existing fast_render Tier 2
architectural 3.94x gain from dustrack/BENCHMARKING.md.
Power-user overrides:
preset="slow"for the smallest lossless files (at ~7x slower encode + ~2x slower decode; accuracy-equivalent).lossless=False, crf=22for the smallest files at the ~0.6 px accuracy penalty (~50x smaller than lossless).
CPU vs GPU
We stick with CPU encoding (libx265). GPU options were investigated 2026-05-24 and rejected:
- NVENC h264 has
-preset lossless/losslesshp, but h264 lossless of monochrome content runs 2-3x larger than h265 lossless, and NVENC typically convertsgraytoyuv420pwith synthetic chroma planes (defeats the chroma-free pipeline). - NVENC h265 lossless support is inconsistent across drivers and often not bit-exact.
- Other GPU paths (AMD VCN, Intel QSV) have similar limitations.
h264_nvenc -preset slow -rc constqp -qp 0 is fast and visually
indistinguishable from lossless for spot-checks, but it's not
bit-exact and uses yuv420p -- not a production output.
TOC creation (datanavigator + PyAV)
datanavigator.VideoReader (the PyAV+TOC backend used by DUSTrack /
datanavigator) builds a packet-level index the first time it opens an
mp4 and caches it as <mp4>.dnav-toc next to the video. TOC build
time follows the same preset curve as decode speed -- ultrafast ~2x
faster than slow. See BENCHMARKING.md for raw
numbers.
Orientation normalisation
EchoWave operators can toggle scan-direction (L/R flip) and rotation per machine. If different machines in a cohort save with different orientations, the same anatomy appears mirrored/rotated across recordings -- catastrophic for cohort-wide DLC training.
The schema captures the L/R-flip state via b_is_scan_direction_changed
(AutoInt1 id 133, bool) and rotation via b_rotate (id 132, int).
export_video applies -vf hflip when the flip flag is True, so
every cohort mp4 lands in a canonical orientation regardless of
which operator toggled what on which machine. normalize_orientation=False
disables this for power-user inspection.
Known limitation: U/D flip
AutoInt1's id_b_flip_up_down (105) is action-only -- there's no
companion getter. So U/D flip cannot be detected from the sidecar.
Mitigation: lock down the acquisition SOP ("never toggle U/D flip") and visually spot-check representative frames per machine during cohort onboarding. If a U/D mismatch is found, the operator needs to either re-acquire or manually flag the affected recordings.
Known limitation: rotation enum
b_rotate returns an int (0/1/2/3? or actual degrees?) -- the
AutoInt1 docs don't specify the mapping. Both probed cohorts
(usl02, pia02) report 0. When a non-zero b_rotate is encountered,
_orientation_vf warns + leaves the pixels untouched; the user
should investigate and update the function once the mapping is
verified against a deliberately-rotated recording.
Spatial scale: physical (beamformer) vs image (display)
There are two scales to be aware of, and they differ by ~2% on typical Telemed acquisitions:
| Attribute | Returns | Use for... |
|---|---|---|
Log.physical_dx_cm_per_px |
beamformer-native sample spacing (from GetUltrasoundPhysicalDeltaX) |
hardware provenance; not measurement |
Log.image_dx_cm_per_px |
display scale -- b_depth_mm / 10 / panel_height_px |
cm conversions on tracked-point coords |
The reported physical_dx is the beamformer's native radial sample
spacing -- a function of the ADC clock and the assumed speed of sound.
But EchoWave renders the resulting image onto a display frame whose
height is laid out to match the operator-selected depth setting (the
depth ruler IS the trusted calibration). For a 50 mm depth setting
on a 558 px panel:
physical_dy_cm_per_px= 0.009166 cm/px (=> 558 × dy = 5.11 cm)image_dy_cm_per_px= 0.05 cm / 558 px = 0.00896 cm/px (=> 5.00 cm)
Per Telemed support: trust the depth setting. So image_d{x,y}
is what you want for spatial measurements on DLC keypoints. Both
attributes return None for v1 sidecars that lack params["b_depth"].
The display x scale is assumed equal to the y scale (image_dx == image_dy)
because Telemed renders with square display pixels (1:1 aspect so
anatomy doesn't squish) and AutoInt1 reports physical_dx == physical_dy
on every probed acquisition. If a future probe breaks this
assumption it'll surface as anatomy rendered with a non-1:1 aspect in
Log.view(); revisit then.
Timing
/timing/time_ms carries true per-frame timestamps at the device's
native ~100 ns precision (Telemed acquisitions are VFR -- the inter-
frame interval varies frame to frame around the mean fps; 50% of
frames land more than 1 ms off the mean). DLC and cv2 index by
frame number, so the encoded mp4 declares CFR at mean_fps -- but
the .tvd.h5 is the source of truth for real time. Downstream
analysis converting tracked points back into the OT clock should
round-trip via Log.time_ms[frame_idx], not the mp4's frame rate.
There is no timing CSV sidecar; an earlier design considered one, but it duplicates data already in the .tvd.h5 (and would risk desyncing).
Last-frame outlier
Both DICOM FrameTimeVector and the COM-extracted IFI show a ~0.078
ms inter-frame interval as the final entry on every recording --
recording-termination artifact (compound sub-frame). Harmless if
downstream sync work either drops the last frame or weights by IFI.
Per-session setup (Administrator EchoWave + Administrator Python)
The h5 stage (export_h5 / process) requires:
-
One-time per machine: register the COM ProgID. From an Administrator PowerShell:
cd "C:\Program Files\Telemed\Echo Wave II Application\EchoWave II\Config\Plugins" .\AutoInt1_regasm.batYou should see "Types registered successfully". Without this,
GetActiveObject('EchoWave2.CmdInt1')raises "Invalid class string". -
Per session: start EchoWave II as Administrator (right-click -> "Run as administrator").
-
Per session: start your Python (or terminal) as Administrator. COM ROT publication is per-elevation; a non-elevated client can't see an elevated server.
The video stage (export_video) is offline and runs in any Python.
Network drives
EchoWave's OpenFile fails on UNC / mapped network paths in our
setup. export_h5 auto-detects non-C: drives + UNC prefixes, copies
each .tvd to a local temp dir, processes there, and copies the
resulting .h5 back to the source folder. A ThreadPoolExecutor
overlaps the network copies with the COM extraction
(stage N+1 + unstage N-1 run on workers while the COM thread handles
N).
Implementation notes
Pipe deadlock (libx265 + Popen)
libx265 prints per-frame statistics to stderr by default. Reading
frames from stdin via subprocess.Popen(..., stderr=PIPE) deadlocks
once the stderr buffer (~64 KB) fills -- the encoder blocks waiting
for the reader, the Python frame loop blocks waiting for the encoder.
The cmd builder mandates -hide_banner -loglevel error so stderr
stays small enough to never fill.
pywin32 + .NET CCW: zero-arg methods are properties
AutoInt1's zero-argument calls (GetFramesCount, GetCurrentFrameTime,
etc.) expose as properties under pywin32's late binding, not
callables. Drop the parens:
n = cmd.GetFramesCount # works
n = cmd.GetFramesCount() # TypeError
Methods with arguments use normal parens. This bit the initial probe script; it's wrapped inside the module now.
Related files
BENCHMARKING.md-- preset bench tables (encode / decode / seek), DLC accuracy parity data (lossless h265 mono vs legacy h264 yuv420p), and TOC build numbers._metadata_probe.py-- parsesAutoInt1Client.txtto classify all ~378 documented ids by extraction strategy; the source-of-truth for the_PARAM_SPECScuration.
Acknowledgments
This package was developed as part of the ImmersionToolbox initiative at the MIT.nano Immersion Lab. Thanks to NCSOFT for supporting this initiative.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file telemed-0.1.0.tar.gz.
File metadata
- Download URL: telemed-0.1.0.tar.gz
- Upload date:
- Size: 109.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.31.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5ef53680bbd32b6d228559964754744696d2def9c629b2b4125712ed987d1049
|
|
| MD5 |
9434612e8b79006c0e604a1b52f5ad74
|
|
| BLAKE2b-256 |
304bc2ad29ccb2d069abc5984823e4fe4c4cfe2a2c9703331d9dca23685f71c7
|
File details
Details for the file telemed-0.1.0-py3-none-any.whl.
File metadata
- Download URL: telemed-0.1.0-py3-none-any.whl
- Upload date:
- Size: 65.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.31.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2f3e137a49876067860f7045c5bac7b3f79740c04a866be4332902a8192ca3c5
|
|
| MD5 |
e1d94e0ace589e9d41649f9c216ff76c
|
|
| BLAKE2b-256 |
24cccf4a14d304e0b119399efe74750d2c693b72c0ea41432a87099b2388674e
|