SurgCLIP: Surgical vision-language model for phase recognition and workflow analysis
SurgCLIP
Surgical dual-encoder video-language model.
Installation
pip install surgclip
Quickstart
Video clip from a single frame path (recommended; neighboring frames are loaded automatically)
import torch
import surgclip
from surgclip import VideoPreprocessor
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess, tokenizer = surgclip.load("SurgCLIP-B", device=device)
labels = [
"Prepares for surgery by inserting trocars into the patient's abdominal cavity",
"Employs grasper and hook during calot triangle dissection, manipulating gallbladder to reveal hepatic triangle, cystic duct and cystic artery",
"Utilizes clipper to secure cystic duct and artery, followed by precise dissection using scissors",
"Utilizes a hook to dissect the connective tissue during the dissection phase, separating gallbladder from the liver",
"Secures the removed gallbladder in the specimen bag during the packaging phase of the procedure",
"Employs suction and irrigation techniques to maintain a clear surgical field during the clean and coagulation phase, simultaneously coagulating bleeding vessels",
"Handles the specimen bag during the retraction",
]
tokens = surgclip.tokenize(labels, tokenizer, device=device)
# Choose ONE of the two modes below (the second assignment overrides the first).
# Offline: the window is centered on the anchor frame (uses past and future frames)
proc = VideoPreprocessor(num_frames=16, sample_rate=1, mode="centered")
video = proc("./cholec80/frames/video01/video01_000843.png").to(device)
# Online: the anchor frame is the last frame in the window (past frames only)
proc = VideoPreprocessor(num_frames=16, sample_rate=1, mode="online")
video = proc("./cholec80/frames/video01/video01_000843.png").to(device)
with torch.no_grad():
    logits, _ = model(video, tokens)
probs = logits.softmax(dim=-1).cpu().numpy()
print("Phase probs:", probs)
pred_idx = logits.argmax(dim=-1).cpu().numpy()  # index of the most likely label per clip
pred = [labels[i] for i in pred_idx]
print("Prediction:", pred)
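The two `mode` options differ in where the anchor frame sits inside the sampled window. The sketch below illustrates that idea with a hypothetical helper, `window_indices`, under the assumption that the preprocessor picks `num_frames` indices spaced `sample_rate` apart and clamps indices that fall before the first frame; it is not surgclip's actual implementation.

```python
def window_indices(anchor, num_frames=16, sample_rate=1, mode="centered"):
    """Hypothetical helper: frame indices a clip window might cover.

    Assumption (not taken from surgclip): indices are spaced by
    `sample_rate`, and indices before frame 0 are clamped to 0.
    """
    if mode == "online":
        # Causal window: the anchor is the last frame, so only past frames are used.
        start = anchor - (num_frames - 1) * sample_rate
    elif mode == "centered":
        # Offline window: past and future frames around the anchor.
        # With an even num_frames the anchor lands just after the midpoint.
        start = anchor - (num_frames // 2) * sample_rate
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [max(0, start + i * sample_rate) for i in range(num_frames)]


print(window_indices(843, num_frames=4, mode="online"))    # [840, 841, 842, 843]
print(window_indices(843, num_frames=4, mode="centered"))  # [841, 842, 843, 844]
```

The online window never looks ahead, which is what makes it usable during a live procedure; a real implementation would also need to clamp at the last frame of the video, which this sketch does not know.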
Video clip from a list of frames
from surgclip import VideoPreprocessor
from PIL import Image
proc = VideoPreprocessor(num_frames=16, sample_type="uniform")
frames = [
    Image.open("./cholec80/frames/video01/video01_000842.png"),
    Image.open("./cholec80/frames/video01/video01_000843.png"),
    # ... more frames
]
video = proc(frames).to(device)  # (1, 16, 3, 224, 224)
with torch.no_grad():
    logits, _ = model(video, tokens)
probs = logits.softmax(dim=-1).cpu().numpy()
print("Phase probs:", probs)
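When the list holds more (or fewer) frames than `num_frames`, something has to decide which frames enter the clip. Assuming `sample_type="uniform"` means evenly spaced selection across the whole list (an assumption; check the library source), the choice can be sketched with a hypothetical `uniform_sample` helper:

```python
def uniform_sample(items, num_frames=16):
    """Hypothetical helper: pick `num_frames` evenly spaced items from `items`.

    Assumption: this mirrors what sample_type="uniform" does; it is not surgclip code.
    Lists shorter than `num_frames` are padded by repeating frames.
    """
    if not items:
        raise ValueError("need at least one frame")
    n = len(items)
    # Take the centre of each of `num_frames` equal-width segments of the list.
    idx = [min(n - 1, int((i + 0.5) * n / num_frames)) for i in range(num_frames)]
    return [items[i] for i in idx]


print(uniform_sample(list(range(32)), num_frames=4))  # [4, 12, 20, 28]
```

Sampling segment centres spreads the clip over the full list instead of clustering at its start.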
Single image
Single-image inference is supported, but video input is strongly recommended for better performance.
from PIL import Image
# Add batch and time dimensions: (3, H, W) -> (1, 1, 3, H, W), i.e. a one-frame clip
img = preprocess(Image.open("./cholec80/frames/video01/video01_000843.png")).unsqueeze(0).unsqueeze(0).to(device)
tokens = surgclip.tokenize(labels, tokenizer, device=device)
with torch.no_grad():
    logits, _ = model(img, tokens)
probs = logits.softmax(dim=-1).cpu().numpy()
print("Phase probs:", probs)
Feature extraction
with torch.no_grad():
    _, pooled_vision = model.encode_vision(video)  # (B, 768)
    _, pooled_text = model.encode_text(tokens)  # (num_labels, 768)
    sim_v2t, sim_t2v = model.get_sim(
        model.vision_proj(pooled_vision),
        model.text_proj(pooled_text),
        temp=model.temp,
    )
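For readers new to dual-encoder scoring, the sketch below shows the CLIP-style computation that `get_sim` most likely performs: L2-normalize both sets of projected embeddings, take pairwise dot products, and divide by the temperature. This is an assumption about `get_sim` (including whether `model.temp` is a raw temperature or a log-scale parameter), written in plain Python with the hypothetical helper names `l2_normalize` and `clip_similarity`; it is not the library's code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_similarity(vision_embs, text_embs, temp):
    """Hypothetical CLIP-style scoring, assumed to mirror model.get_sim.

    Returns (sim_v2t, sim_t2v): sim_v2t[i][j] is the cosine similarity of
    clip i and text j divided by `temp`; sim_t2v is its transpose.
    """
    v = [l2_normalize(e) for e in vision_embs]
    t = [l2_normalize(e) for e in text_embs]
    sim_v2t = [[sum(a * b for a, b in zip(vi, tj)) / temp for tj in t] for vi in v]
    sim_t2v = [list(row) for row in zip(*sim_v2t)]
    return sim_v2t, sim_t2v


# Toy 2-D embeddings: one clip, two candidate texts.
v2t, t2v = clip_similarity([[1.0, 0.0]], [[2.0, 0.0], [0.0, 3.0]], temp=0.5)
print(v2t)  # [[2.0, 0.0]]
print(t2v)  # [[2.0], [0.0]]
```

A smaller temperature stretches the scores apart, so a row-wise softmax over `sim_v2t` becomes more peaked around the best-matching phase description.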
Download files
Source Distribution
surgclip-0.1.0.tar.gz (24.2 kB)
Built Distribution
surgclip-0.1.0-py3-none-any.whl (26.0 kB)
File details
Details for the file surgclip-0.1.0.tar.gz.
File metadata
- Download URL: surgclip-0.1.0.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63b3b7efaf2279717f5e129c2c5884b34344eda260cfc7530a3262be97c5762f |
| MD5 | 3589af8486f34148122a69387c43b46e |
| BLAKE2b-256 | c2fe768bb3ac671dc4262640fe7a627590a701ea060375d9ede150e27d7459aa |
File details
Details for the file surgclip-0.1.0-py3-none-any.whl.
File metadata
- Download URL: surgclip-0.1.0-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ed5f72d141c4bda878610d3b6d7bd008618dcbda840faa6736c096db0f5504e |
| MD5 | f23b16b97f3ca7381fad3203a892dfa0 |
| BLAKE2b-256 | 3cdeb2bb16809de2e9cfc2c982b08f940d0902a6e628246f9fcceca51b4af962 |
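The published digests can be used to check a downloaded file before installing it. A minimal sketch using Python's standard `hashlib`; the `verify_sha256` helper name is illustrative, and the expected value is the SHA256 digest listed for the wheel.

```python
import hashlib


def verify_sha256(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


WHEEL_SHA256 = "3ed5f72d141c4bda878610d3b6d7bd008618dcbda840faa6736c096db0f5504e"
# Usage, after downloading the wheel into the current directory:
# print(verify_sha256("surgclip-0.1.0-py3-none-any.whl", WHEEL_SHA256))
```

pip can enforce the same check at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).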