
SurgCLIP: Surgical vision-language model for phase recognition and workflow analysis

Project description

SurgCLIP

Surgical dual-encoder video-language model.

Installation

pip install surgclip

Quickstart

Video clip from a frame path (recommended; neighboring frames are loaded automatically)

import torch
import surgclip
from surgclip import VideoPreprocessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess, tokenizer = surgclip.load("SurgCLIP-B", device=device)
labels = [
    "Prepares for surgery by inserting trocars into the patient's abdominal cavity",
    "Employs grasper and hook during calot triangle dissection, manipulating gallbladder to reveal hepatic triangle, cystic duct and cystic artery",
    "Utilizes clipper to secure cystic duct and artery, followed by precise dissection using scissors",
    "Utilizes a hook to dissect the connective tissue during the dissection phase, separating gallbladder from the liver",
    "Secures the removed gallbladder in the specimen bag during the packaging phase of the procedure",
    "Employs suction and irrigation techniques to maintain a clear surgical field during the clean and coagulation phase, simultaneously coagulating bleeding vessels",
    "Handles the specimen bag during the retraction",
]
tokens = surgclip.tokenize(labels, tokenizer, device=device)

# Pick one sampling mode (a second assignment would override the first).
# Offline: window centered on the anchor frame
proc = VideoPreprocessor(num_frames=16, sample_rate=1, mode="centered")

# Online: anchor frame is the last in the window; uncomment to use instead
# proc = VideoPreprocessor(num_frames=16, sample_rate=1, mode="online")

video = proc("./cholec80/frames/video01/video01_000843.png").to(device)

with torch.no_grad():
    logits, _ = model(video, tokens)
    probs = logits.softmax(dim=-1).cpu().numpy()

print("Phase probs:", probs)
pred_idx = logits.argmax(dim=-1).cpu().numpy()  # index of the most likely label per clip
pred = [labels[i] for i in pred_idx]
print("Prediction:", pred)
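The two mode settings differ only in where the anchor frame sits inside the sampled window. A minimal sketch of how such a window could be computed from a frame path follows. frame_index_from_path and window_indices are hypothetical helpers, and treating sample_rate as the stride between sampled frames, the centering convention for an even num_frames, and clamping at the start of the video are all assumptions, not VideoPreprocessor's confirmed behavior.

```python
import re


def frame_index_from_path(path: str) -> int:
    """Parse the trailing frame number from names like 'video01_000843.png'.

    Assumes the '<video>_<index>.<ext>' naming used in the examples.
    """
    match = re.search(r"_(\d+)\.\w+$", path)
    if match is None:
        raise ValueError(f"no frame index in {path!r}")
    return int(match.group(1))


def window_indices(anchor: int, num_frames: int = 16, sample_rate: int = 1,
                   mode: str = "centered") -> list[int]:
    """Hypothetical helper: frame indices of a clip around `anchor`.

    "centered": the anchor sits in the middle of the window (for an even
    num_frames it is the first frame of the second half; an assumption).
    "online": the anchor is the last frame, so only past frames are used.
    Negative indices are clamped to 0 (assumption); there is no upper clamp
    because the video length is not known here.
    """
    if mode == "centered":
        start = anchor - (num_frames // 2) * sample_rate
    elif mode == "online":
        start = anchor - (num_frames - 1) * sample_rate
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return [max(0, start + i * sample_rate) for i in range(num_frames)]


anchor = frame_index_from_path("./cholec80/frames/video01/video01_000843.png")
print(window_indices(anchor, mode="centered"))
print(window_indices(anchor, mode="online"))
```

Under these assumptions, anchor 843 gives frames 835 to 850 in centered mode and 828 to 843 in online mode; only the online window avoids future frames, which is what real-time use requires.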

Video clip from a list of frames

import torch
from PIL import Image
from surgclip import VideoPreprocessor

proc = VideoPreprocessor(num_frames=16, sample_type="uniform")
frames = [
    Image.open("./cholec80/frames/video01/video01_000842.png"),
    Image.open("./cholec80/frames/video01/video01_000843.png"),
    ...,  # remaining frames of the clip
]

video = proc(frames).to(device)  # (1, 16, 3, 224, 224)

with torch.no_grad():
    logits, _ = model(video, tokens)
    probs = logits.softmax(dim=-1).cpu().numpy()

print("Phase probs:", probs)
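With sample_type="uniform", the preprocessor has to reduce a frame list of arbitrary length to num_frames frames. One common scheme, sketched below as an assumption about what "uniform" means here, splits the list into equal segments and takes the middle frame of each. uniform_indices is a hypothetical helper.

```python
def uniform_indices(num_available: int, num_frames: int = 16) -> list[int]:
    """Hypothetical helper: pick `num_frames` evenly spaced positions.

    Each pick is the midpoint of one of `num_frames` equal segments of the
    input (an assumed scheme). Inputs shorter than num_frames repeat frames,
    so the output length is always num_frames.
    """
    if num_available <= 0:
        raise ValueError("need at least one frame")
    seg = num_available / num_frames
    return [min(num_available - 1, int(seg * i + seg / 2)) for i in range(num_frames)]


print(uniform_indices(32))
print(uniform_indices(4))
```

For example, 32 input frames yield every other frame (indices 1, 3, ..., 31), while 4 input frames are each repeated four times.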

Single image

Single-image inference is supported, but we highly recommend video input for better performance.

from PIL import Image

# Add batch and time axes so the image is passed as a one-frame clip
img = preprocess(Image.open("./cholec80/frames/video01/video01_000843.png")).unsqueeze(0).unsqueeze(0).to(device)
tokens = surgclip.tokenize(labels, tokenizer, device=device)

with torch.no_grad():
    logits, _ = model(img, tokens)
    probs = logits.softmax(dim=-1).cpu().numpy()

print("Phase probs:", probs)
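The two unsqueeze(0) calls turn the image into a clip of length one, so the model receives the same (batch, time, channels, height, width) layout as in the video examples. The NumPy sketch below only illustrates the shapes; the axis order is inferred from the (1, 16, 3, 224, 224) comment in the frame-list example, and the zero arrays stand in for preprocessed frames.

```python
import numpy as np

# Stand-in for one preprocessed frame: (channels, height, width).
frame = np.zeros((3, 224, 224), dtype=np.float32)

# Single image: add a batch axis and a time axis, giving a one-frame clip,
# mirroring .unsqueeze(0).unsqueeze(0) in the example.
single = frame[np.newaxis, np.newaxis]
print(single.shape)  # (1, 1, 3, 224, 224)

# Video clip: stack 16 frames along a new time axis, then add the batch axis.
clip = np.stack([frame] * 16)[np.newaxis]
print(clip.shape)  # (1, 16, 3, 224, 224)
```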

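Feature extraction

The vision and text encoders can also be called separately to obtain pooled embeddings, which are then projected into the shared space and compared.

with torch.no_grad():
    _, pooled_vision = model.encode_vision(video)   # (B, 768): one embedding per clip
    _, pooled_text = model.encode_text(tokens)      # (num_labels, 768): one embedding per label

    # Project both modalities and compute video-to-text and text-to-video similarities.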
    sim_v2t, sim_t2v = model.get_sim(
        model.vision_proj(pooled_vision),
        model.text_proj(pooled_text),
        temp=model.temp,
    )
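Conceptually, the similarity step compares the projected embedding of every clip with that of every label. The sketch below assumes get_sim follows the usual CLIP-style recipe (L2-normalize both sides, take dot products, divide by the temperature). The function names, the 512-dimensional random embeddings, and the temperature 0.07 (CLIP's initial value) are illustrative assumptions, not SurgCLIP's actual values.

```python
import numpy as np


def clip_style_similarity(vision_emb, text_emb, temp=0.07):
    """Assumed CLIP-style similarity between projected embeddings.

    vision_emb: (num_clips, dim), text_emb: (num_labels, dim).
    Returns cosine similarities scaled by 1/temp in both directions.
    """
    v = vision_emb / np.linalg.norm(vision_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    sim_v2t = v @ t.T / temp   # (num_clips, num_labels)
    sim_t2v = sim_v2t.T        # (num_labels, num_clips)
    return sim_v2t, sim_t2v


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
vision = rng.normal(size=(2, 512))  # two hypothetical clip embeddings
text = rng.normal(size=(7, 512))    # seven hypothetical label embeddings
sim_v2t, sim_t2v = clip_style_similarity(vision, text)
probs = softmax(sim_v2t)            # per-clip distribution over the 7 labels
print(probs.shape, probs.argmax(axis=-1))
```

Dividing by a small temperature sharpens the softmax, so small differences in cosine similarity turn into confident label probabilities.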

Download files

Download the file for your platform.

Source Distribution

surgclip-0.1.0.tar.gz (24.2 kB)

Built Distribution

surgclip-0.1.0-py3-none-any.whl (26.0 kB)

File details

Details for the file surgclip-0.1.0.tar.gz.

File metadata

  • Download URL: surgclip-0.1.0.tar.gz
  • Size: 24.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for surgclip-0.1.0.tar.gz
Algorithm Hash digest
SHA256 63b3b7efaf2279717f5e129c2c5884b34344eda260cfc7530a3262be97c5762f
MD5 3589af8486f34148122a69387c43b46e
BLAKE2b-256 c2fe768bb3ac671dc4262640fe7a627590a701ea060375d9ede150e27d7459aa

File details

Details for the file surgclip-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: surgclip-0.1.0-py3-none-any.whl
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for surgclip-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3ed5f72d141c4bda878610d3b6d7bd008618dcbda840faa6736c096db0f5504e
MD5 f23b16b97f3ca7381fad3203a892dfa0
BLAKE2b-256 3cdeb2bb16809de2e9cfc2c982b08f940d0902a6e628246f9fcceca51b4af962
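To check a manually downloaded file against the SHA256 digests listed above, the standard library is sufficient. A minimal sketch follows; sha256_of and verify are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables.
EXPECTED_SHA256 = {
    "surgclip-0.1.0.tar.gz": "63b3b7efaf2279717f5e129c2c5884b34344eda260cfc7530a3262be97c5762f",
    "surgclip-0.1.0-py3-none-any.whl": "3ed5f72d141c4bda878610d3b6d7bd008618dcbda840faa6736c096db0f5504e",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True only if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, verify("surgclip-0.1.0.tar.gz") returns True only if the local file matches the published digest. pip can enforce the same check automatically in hash-checking mode (--require-hashes).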
