Client utilities for Neuromeka VFM FoundationPose RPC (upload meshes, call server)
neuromeka_vfm
A lightweight client SDK for communicating with Segmentation servers (SAM2 pipeline and SAM3) and Pose Estimation (NVIDIA FoundationPose) servers over RPC/ZeroMQ. It also provides SSH/SFTP utilities to upload mesh files to the host.
- Website: http://www.neuromeka.com
- PyPI package: https://pypi.org/project/neuromeka_vfm/
- Documents: https://docs.neuromeka.com
Installation
pip install neuromeka_vfm
Optional feature extras:
# Segmentation compression helpers (h264/png/jpeg): av + OpenCV
pip install "neuromeka_vfm[segmentation]"
# Point-cloud utilities: trimesh + tqdm
pip install "neuromeka_vfm[pcd]"
# SSH/SFTP mesh upload: paramiko
pip install "neuromeka_vfm[ssh]"
# DINOv3 image-prompt detection helpers
pip install "neuromeka_vfm[dinov3]"
# Local demo dependencies: Pillow + OpenCV + pyrealsense2
pip install "neuromeka_vfm[examples]"
# All optional dependencies
pip install "neuromeka_vfm[all]"
Python API (usage by example)
- Client PC: the machine running your application with this package installed.
- Host PC: the machine running the Segmentation and Pose Estimation Docker servers. If you run Docker locally, use localhost.
Segmentation
Install extra first: pip install "neuromeka_vfm[segmentation]".
from neuromeka_vfm import Segmentation
seg = Segmentation(
    hostname="192.168.10.63",
    port=5432,
    compression_strategy="png",  # none | png | jpeg | h264
)
# Register using an image prompt
seg.add_image_prompt("drug_box", ref_rgb)
seg.register_first_frame(
    frame=first_rgb,
    prompt="drug_box",  # ID string
    use_image_prompt=True,
)
# Register using a text prompt
seg.register_first_frame(
    frame=first_rgb,
    prompt="box .",  # Text prompt (must end with " .")
    use_image_prompt=False,
)
# SAM2 tracking on the registered mask(s)
resp = seg.get_next(next_rgb)
if isinstance(resp, dict) and resp.get("result") == "ERROR":
    print(f"Tracking error: {resp.get('message')}")
    seg.reset()
else:
    masks = resp
# Segmentation settings / model selection (nrmk_realtime_segmentation v0.2+)
caps = seg.get_capabilities()["data"]
current = seg.get_config()["data"]
seg.set_config(
    {
        "grounding_dino": {
            "backbone": "Swin-B",  # Swin-T | Swin-B
            "box_threshold": 0.35,
            "text_threshold": 0.25,
        },
        "dino_detection": {
            "threshold": 0.5,
            "target_multiplier": 25,
            "img_multiplier": 50,
            "background_threshold": -1.0,
            "final_erosion_count": 10,
            "segment_min_size": 20,
        },
        "sam2": {
            "model": "facebook/sam2.1-hiera-large",
            "use_legacy": False,
            "compile": False,
            "offload_state_to_cpu": False,
            "offload_video_to_cpu": False,
        },
    }
)
# Remove an object (v0.2+, only when use_legacy=False)
seg.remove_object("cup_0")
seg.close()
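The text-prompt rule in the example above (the prompt must end with " .") is easy to miss when prompts come from user input. The following is a minimal sketch of a normalizer, not part of neuromeka_vfm: to_text_prompt is a hypothetical helper, and joining several labels with " . " is an assumption based on the common GroundingDINO prompt convention rather than documented SDK behavior.

```python
def to_text_prompt(labels):
    """Build a text prompt that ends with " ." from one or more labels.

    Hypothetical helper (not part of neuromeka_vfm). Joining multiple
    labels with " . " is an assumption, not documented SDK behavior.
    """
    if isinstance(labels, str):
        labels = [labels]
    cleaned = []
    for label in labels:
        # Drop surrounding whitespace and any trailing "." the caller already added.
        text = label.strip().rstrip(".").strip()
        if text:
            cleaned.append(text)
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return " . ".join(cleaned) + " ."


print(to_text_prompt("box"))             # box .
print(to_text_prompt(["box .", "cup"]))  # box . cup .
```

The result can then be passed as prompt=... to register_first_frame with use_image_prompt=False.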
Additional Segmentation APIs and behaviors
- benchmark=True in the constructor enables timing counters (call_time, call_count) for add_image_prompt, register_first_frame, and get_next.
- switch_compression_strategy() lets you change the compression strategy at runtime.
- register_first_frame() returns True/False and raises ValueError if image prompts are missing when use_image_prompt=True.
- register_first_frame() accepts a list of prompt IDs when use_image_prompt=True.
- get_next() returns None if called before registration; it can also return the server error dict when available.
- reset() performs a server-side reset, while finish() clears only local state.
- Exposed state: tracking_object_ids, current_frame_masks, invisible_object_ids.
- Backward-compat alias: NrmkRealtimeSegmentation.
Segmentation v0.2 config summary (defaults/choices)
seg.get_capabilities() can differ depending on server configuration. The following reflects v0.2 defaults.
grounding_dino:
backbone:
choices:
- Swin-B
- Swin-T
default: Swin-T
box_threshold:
default: 0.35
min: 0.0
max: 1.0
text_threshold:
default: 0.25
min: 0.0
max: 1.0
dino_detection:
threshold:
default: 0.5
target_multiplier:
default: 25
img_multiplier:
default: 50
background_threshold:
default: -1.0
final_erosion_count:
default: 10
segment_min_size:
default: 20
sam2:
model:
choices:
- facebook/sam2-hiera-base-plus
- facebook/sam2-hiera-large
- facebook/sam2-hiera-small
- facebook/sam2-hiera-tiny
- facebook/sam2.1-hiera-base-plus
- facebook/sam2.1-hiera-large
- facebook/sam2.1-hiera-small
- facebook/sam2.1-hiera-tiny
default: facebook/sam2.1-hiera-large
use_legacy:
default: false
compile:
default: false
offload_state_to_cpu:
default: false
offload_video_to_cpu:
default: false
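Since set_config takes a plain dict, a small client-side check against the choices and ranges listed above can catch typos before a round trip to the server. This is only a sketch: check_grounding_dino is a hypothetical helper, the constants are copied from the v0.2 summary, and a live server's get_capabilities() may report different values.

```python
# Choices and ranges copied from the v0.2 summary above; a live server may differ.
GROUNDING_DINO_BACKBONES = {"Swin-T", "Swin-B"}
THRESHOLD_KEYS = ("box_threshold", "text_threshold")


def check_grounding_dino(section):
    """Return a list of problems found in a grounding_dino config section.

    Hypothetical helper (not part of neuromeka_vfm). Missing keys are
    ignored, on the assumption that the server keeps its current value.
    """
    problems = []
    backbone = section.get("backbone")
    if backbone is not None and backbone not in GROUNDING_DINO_BACKBONES:
        problems.append(f"backbone: unsupported value {backbone!r}")
    for key in THRESHOLD_KEYS:
        value = section.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{key}: {value} is outside [0.0, 1.0]")
    return problems


print(check_grounding_dino({"backbone": "Swin-B", "box_threshold": 0.35}))  # []
```

An empty list means the section is consistent with the summary; anything else can be logged or raised before calling seg.set_config.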
Segmentation v0.2 notes and changes
- If SAM2 VRAM estimation fails, seg.get_next() may return {"result":"ERROR"}. Handle the error and call reset before re-registering.
- compile=True can slow down first-frame registration and reset.
- CPU offloading is most effective when both offload_state_to_cpu=True and offload_video_to_cpu=True are set (legacy mode does not support offload_video_to_cpu).
- remove_object is supported only when use_legacy=False.
- GroundingDINO added the Swin-B backbone and fixed prompt-token merge issues.
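The first note (handle the error, call reset, then re-register) can be wrapped in one function. The sketch below is an illustration under stated assumptions: track_with_recovery is a hypothetical helper, seg is any object exposing get_next(), reset(), and register_first_frame() as described in this README, and which frame to re-register on is left to the caller.

```python
def track_with_recovery(seg, frame, first_frame, prompt, use_image_prompt=False):
    """Track one frame; on a server error, reset and re-register once.

    Hypothetical helper (not part of neuromeka_vfm). Returns the
    get_next() result on success, or None when the session had to be
    reset and re-registered, so no masks exist for this frame.
    """
    resp = seg.get_next(frame)
    if isinstance(resp, dict) and resp.get("result") == "ERROR":
        # Server-side reset first, then register again as the notes recommend.
        seg.reset()
        registered = seg.register_first_frame(
            frame=first_frame,
            prompt=prompt,
            use_image_prompt=use_image_prompt,
        )
        if not registered:
            raise RuntimeError(f"re-registration failed after: {resp.get('message')}")
        return None
    return resp
```

Retrying only once is a deliberate choice here: if VRAM estimation keeps failing, repeating the same registration is unlikely to help, and lowering the resolution or enabling CPU offloading is probably the better next step.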
SAM3 Segmentation
Sam3Segmentation is a separate client for the SAM3 Docker server.
It supports both a single-frame prediction API and a streaming tracking API.
from neuromeka_vfm import Sam3Segmentation
sam3 = Sam3Segmentation(hostname="192.168.4.109", port=5559)
sam3.check()
caps = sam3.get_capabilities()["data"]
config = sam3.get_config()["data"]
sam3.set_config(
    {
        "resolution": 1008,
        "confidence_threshold": 0.5,
        "compile": False,
    }
)
# text prompt
resp = sam3.predict_text(frame=rgb, prompt="bolt")
# box prompt
resp = sam3.predict_box(
    frame=rgb,
    boxes=[[700, 470, 980, 620]],
    box_format="xyxy_abs",
)
# text + box prompt
resp = sam3.predict(
    frame=rgb,
    prompt="bolt",
    boxes=[[700, 470, 980, 620]],
    box_format="xyxy_abs",
)
if resp.get("result") == "SUCCESS":
    print(sam3.last_obj_ids)
    print(sam3.last_scores)
    print(sam3.last_boxes_xyxy)
    masks = sam3.current_frame_masks  # {obj_id: mask(H,W,1)}
# tracking flow
reg = sam3.register_first_frame(
    frame=rgb0,
    boxes=[[700, 470, 980, 620]],
    phrases=["bolt"],
)
if reg.get("result") == "SUCCESS":
    nxt = sam3.get_next(frame=rgb1)  # alias of track()
    if nxt.get("result") == "SUCCESS":
        print(sam3.last_obj_ids)
        print(sam3.current_frame_masks.keys())
        sam3.remove_object(obj_id=sam3.last_obj_ids[0], strict=False, need_output=True)
sam3.stop_tracking(free_vram=True, drop_tracking_predictor=False)
memory_report = sam3.get_memory_report()
sam3.reset(free_vram=True, reset_tracking=True, drop_tracking_predictor=False)
sam3.close()
Sam3Segmentation methods:
- check()
- get_capabilities()
- get_config()
- set_config(config)
- get_memory_report()
- reset(free_vram=True, reset_tracking=True, drop_tracking_predictor=False)
- reset_tracking(free_vram=True, drop_tracking_predictor=False)
- stop_tracking(free_vram=True, drop_tracking_predictor=False)
- predict(frame, prompt=None, boxes=None, labels=None, box_format="cxcywh_norm", confidence_threshold=None)
- predict_text(frame, prompt, confidence_threshold=None)
- predict_box(frame, boxes, labels=None, box_format="cxcywh_norm", prompt=None, confidence_threshold=None)
- register_first_frame(frame, boxes=None, phrases=None, points_data=None)
- track(frame)
- get_next(frame)
- remove_object(obj_id, strict=False, need_output=False)
- close()
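The prediction methods default to box_format="cxcywh_norm", while the examples above pass pixel boxes with box_format="xyxy_abs". If you prefer to stay on the default, the conversion can be sketched as follows. The format names come from the signatures above; their interpretation (center x, center y, width, height, each divided by the image size) is an assumption, and xyxy_abs_to_cxcywh_norm is a hypothetical helper.

```python
def xyxy_abs_to_cxcywh_norm(box, image_width, image_height):
    """Convert [x1, y1, x2, y2] in pixels to [cx, cy, w, h] scaled to 0..1.

    Hypothetical helper (not part of neuromeka_vfm); the meaning of
    "cxcywh_norm" is assumed, not taken from server documentation.
    """
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"degenerate box: {box}")
    width = x2 - x1
    height = y2 - y1
    return [
        (x1 + width / 2) / image_width,    # normalized center x
        (y1 + height / 2) / image_height,  # normalized center y
        width / image_width,               # normalized width
        height / image_height,             # normalized height
    ]


# A 100x50 px box in the top-left corner of a 200x100 px image.
print(xyxy_abs_to_cxcywh_norm([0, 0, 100, 50], 200, 100))  # [0.25, 0.25, 0.5, 0.5]
```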
Sam3Segmentation state:
- last_obj_ids
- current_frame_masks ({obj_id: mask(H,W,1)})
- last_boxes_xyxy
- last_scores
- tracking_object_ids
- invisible_object_ids
- first_frame_registered
- last_frame_idx
- tracking_active (read-only property)
- tracked_obj_ids (read-only property)
SAM3 tracking request/response summary:
- register_first_frame request (box/phrase): {"operation":"register_first_frame","frame":frame,"boxes":...,"phrases":...}
- register_first_frame request (points): {"operation":"register_first_frame","frame":frame,"points_data":{"obj_id":{"input_point":...,"input_label":...}}}
- register_first_frame success response: {"result":"SUCCESS","data":{"frame_idx":int,"obj_ids":[...],"masks":uint8(N,H,W,1)}}
- track/get_next request: {"operation":"track","frame":frame} or {"operation":"get_next","frame":frame}
- track/get_next success response: {"result":"SUCCESS","data":{"obj_ids":[...],"masks":uint8(N,H,W,1)}}
- remove_object request/response:
  - request: {"operation":"remove_object","obj_id":"...","strict":bool,"need_output":bool}
  - response: {"result":"SUCCESS","data":{"obj_ids":[...]}}
- reset supports tracking reset: {"operation":"reset","free_vram":bool,"reset_tracking":bool,"drop_tracking_predictor":bool}
- get_memory_report response:
  - GPU/model memory tree, runtime tensor usage, allocator stats
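The success responses above carry obj_ids and a stacked masks array of shape (N,H,W,1), while the client exposes current_frame_masks as {obj_id: mask(H,W,1)}. The relation between the two shapes can be sketched as below. masks_by_object is a hypothetical helper for illustration only; the client already does this bookkeeping, and the assumption that masks are ordered like obj_ids is inferred from the response layout.

```python
def masks_by_object(response):
    """Map each obj_id to its mask from a track/get_next success response.

    Hypothetical helper (not part of neuromeka_vfm). Assumes masks[i]
    belongs to obj_ids[i].
    """
    if response.get("result") != "SUCCESS":
        raise ValueError(f"not a success response: {response.get('result')!r}")
    data = response["data"]
    obj_ids, masks = data["obj_ids"], data["masks"]
    if len(obj_ids) != len(masks):
        raise ValueError("obj_ids and masks have different lengths")
    # Iterating an (N, H, W, 1) array (or nested list) yields N masks of shape (H, W, 1).
    return dict(zip(obj_ids, masks))


# Tiny stand-in response: two objects, 1x2 masks written as nested lists.
demo = {
    "result": "SUCCESS",
    "data": {"obj_ids": ["bolt_0", "bolt_1"], "masks": [[[[1], [0]]], [[[0], [1]]]]},
}
print(sorted(masks_by_object(demo)))  # ['bolt_0', 'bolt_1']
```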
Reset behavior guidance:
- stop_tracking() is the recommended API to stop SAM3 streaming tracking.
- reset_tracking(...) is still available as an explicit alias.
- The current merged server keeps the drop_tracking_predictor parameter for compatibility, but it currently ignores the drop request and keeps the predictor resident.
Capability-based local validation:
- On initialization (validate_capabilities_on_init=True by default), the client reads get_capabilities() and caches supported box_formats/config_keys.
- set_config and box-prompt calls validate values against that cache when available.
DINOv3 Image Prompt Detection
Dinov3Client is the low-level client for the nrmk_dinov3 ZeroMQ server.
Dinov3Detection adds image-prompt payload construction and heatmap-to-mask postprocessing.
It can optionally refine DINOv3 heatmap points through the SAM3 server by setting run_sam3=True.
The package does not start Docker or load DINOv3 weights; run the DINOv3 server separately.
from neuromeka_vfm import Dinov3Client, Dinov3Detection, Dinov3DetectionConfig
# Low-level RPC client
dino = Dinov3Client(hostname="127.0.0.1", port=5568, timeout_ms=180000)
print(dino.get_capabilities())
print(dino.get_config())
dino.close()
# High-level image-prompt detection
detector = Dinov3Detection(hostname="127.0.0.1", port=5568, timeout_ms=180000)
resp = detector.detect_image_prompt(
    scene=scene_rgb,  # np.uint8 RGB, shape (H, W, 3)
    prompt_image=prompt_rgb,  # RGB or RGBA image crop, shape (h, w, 3|4)
    points=None,  # optional; defaults to alpha/object center
    config=Dinov3DetectionConfig(
        backbone="vith16plus",
        scene_patch_multiplier=50,
        prompt_patch_multiplier=25,
        threshold=0.7,
        max_detections=1,
    ),
)
if resp.get("result") == "SUCCESS":
    data = resp["data"]
    masks = data["masks"]  # {"1": mask(H,W,1)}
    bboxes = data["bboxes"]  # [{"label":1,"top":...,"left":...,"bottom":...,"right":...}]
    heatmap = data["scene_heatmap"]
detector.close()
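The bboxes entries returned above are dicts with label/top/left/bottom/right keys, whereas the SAM3 box-prompt examples use [x1, y1, x2, y2] lists. A minimal conversion sketch follows. bboxes_to_xyxy is a hypothetical helper, and it assumes that left/right are x pixel coordinates and top/bottom are y pixel coordinates, which this README does not state explicitly.

```python
def bboxes_to_xyxy(bboxes):
    """Split DINOv3-style bbox dicts into (labels, [[x1, y1, x2, y2], ...]).

    Hypothetical helper (not part of neuromeka_vfm). Assumes left/right
    are x coordinates and top/bottom are y coordinates, in pixels.
    """
    labels = [b["label"] for b in bboxes]
    boxes = [[b["left"], b["top"], b["right"], b["bottom"]] for b in bboxes]
    return labels, boxes


labels, boxes = bboxes_to_xyxy(
    [{"label": 1, "top": 470, "left": 700, "bottom": 620, "right": 980}]
)
print(labels, boxes)  # [1] [[700, 470, 980, 620]]
```

Under the same assumption, the resulting boxes could be passed to a box-prompt call with box_format="xyxy_abs".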
DINOv3 + SAM3 refinement
This SDK-side workflow calls the DINOv3 server first, selects high-score native DINO patch centers from the heatmap, sends those points to the SAM3 server as point prompts, and returns the consensus SAM3 mask.
from neuromeka_vfm import Dinov3Detection, Dinov3DetectionConfig
detector = Dinov3Detection(hostname="127.0.0.1", port=5568, timeout_ms=180000)
resp = detector.detect_image_prompt(
    scene=scene_rgb,
    prompt_image=prompt_rgb,
    points=[{"x": 36, "y": 16}],  # reference-image point on the target object
    config=Dinov3DetectionConfig(
        backbone="vith16plus",
        threshold=0.2,
        run_sam3=True,
        sam3_hostname="127.0.0.1",  # defaults to the DINOv3 hostname
        sam3_port=5559,
        sam3_top_n_points=4,
    ),
)
if resp.get("result") == "SUCCESS":
    data = resp["data"]
    masks = data["masks"]  # SAM3-refined masks
    dino_mask = data["dino_labeled_mask"]  # thresholded DINO component mask
    selected_points = data["sam3"]["top_patch_points"]
detector.close()
For a separate workflow object, import Dinov3Sam3Detection and Dinov3Sam3DetectionConfig.
Dinov3DetectionConfig fields:
- backbone: vith16plus by default. Use vit7b16 only on machines with enough VRAM.
- model_dtype: optional server dtype override (bfloat16, float16, float32).
- scene_patch_multiplier: long-side resize multiplier for the scene image.
- prompt_patch_multiplier: long-side resize multiplier for prompt images.
- threshold: score threshold used to convert the scene heatmap to masks.
- final_erosion_count: optional 3x3 erosion iterations after thresholding.
- segment_min_size: minimum component area in pixels.
- max_detections: optional top-N component limit.
- run_sam3: when true, use DINOv3 heatmap-selected point prompts to refine masks with SAM3.
- point_selection_mode: global_top_n or top_n_per_component.
- sam3_top_n_points: number of positive DINO patch centers to pass to SAM3.
- sam3_negative_top_n_points: optional negative point count from medium/low score regions.
- sam3_mask_consensus_mode: area or iou consensus across SAM3 masks.
- sam3_hostname, sam3_port: SAM3 server location for refinement.
Pose Estimation
Optional: generate a simple box STL (client-side utility)
from neuromeka_vfm import MeshGenerator, write_box_stl
# function style
path = write_box_stl(
    filename="box_61x56x99.stl",
    width=0.0617,  # X (m)
    depth=0.0564,  # Y (m)
    height=0.0993,  # Z (m)
    output_dir="./mesh",  # optional, not fixed to /opt/meshes
)
# class style
mesh_gen = MeshGenerator(output_dir="./mesh")
path2 = mesh_gen.write_box_stl("box2.stl", width=0.05, depth=0.05, height=0.05)
Path rule:
- absolute filename: write exactly there
- relative filename: resolve by output_dir, else $NRMK_MESH_DIR, else /opt/meshes
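The path rule above amounts to a short fallback chain. The sketch below mirrors it for illustration; resolve_mesh_path is a hypothetical name, and the SDK's own implementation may differ in details such as how an empty output_dir or an empty environment variable is treated.

```python
import os


def resolve_mesh_path(filename, output_dir=None):
    """Resolve where a mesh file would be written under the path rule above.

    Hypothetical helper (not part of neuromeka_vfm). Empty strings are
    treated like missing values, which is an assumption.
    """
    if os.path.isabs(filename):
        # Absolute filename: write exactly there.
        return filename
    # Relative filename: output_dir, else $NRMK_MESH_DIR, else /opt/meshes.
    base_dir = output_dir or os.environ.get("NRMK_MESH_DIR") or "/opt/meshes"
    return os.path.join(base_dir, filename)


print(resolve_mesh_path("box.stl", output_dir="./mesh"))
```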
Mesh upload: Upload the mesh file (STL) to /opt/meshes/ on the host PC. You can also use SSH directly.
Install extra first: pip install "neuromeka_vfm[ssh]".
from neuromeka_vfm import upload_mesh
upload_mesh(
    host="192.168.10.63",
    user="user",
    password="pass",
    local="mesh/my_mesh.stl",  # local mesh path
    remote="/opt/meshes/my_mesh.stl",  # host mesh path (Docker volume)
)
Initialization
from neuromeka_vfm import PoseEstimation
pose = PoseEstimation(host="192.168.10.72", port=5557)
pose.init(
    mesh_path="/app/modules/foundation_pose/mesh/my_mesh.stl",
    apply_scale=1.0,
    track_refine_iter=3,
    min_n_views=40,
    inplane_step=60,
)
# Or initialize directly with mesh arrays (without mesh_path)
pose.init(
    mesh_vertices=mesh_vertices,  # (V, 3)
    mesh_faces=mesh_faces,  # (F, 3)
    symmetry_tfs=symmetry_tfs,  # optional: (N,4,4) or (4,4)
)
- mesh input for init:
  - mesh_path (STL/OBJ path), or
  - direct mesh payload mesh_vertices + mesh_faces
  - if mesh_path is given, it takes priority
- symmetry_tfs: optional symmetry transforms with shape (N,4,4) or (4,4).
- apply_scale: scalar applied after loading the mesh.
  - STL in meters: 1.0 (no scaling)
  - STL in centimeters: 0.01 (1 cm -> 0.01 m)
  - STL in millimeters: 0.001 (1 mm -> 0.001 m)
- force_apply_color: if True, forces a solid color when the mesh lacks color data.
- apply_color: RGB tuple (0-255) used when force_apply_color=True.
- est_refine_iter: number of refinement iterations during registration (higher = more accurate, slower).
- track_refine_iter: number of refinement iterations per frame during tracking.
- min_n_views: minimum number of sampled camera views (affects rotation candidates).
- inplane_step: in-plane rotation step in degrees (smaller = more candidates).
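Picking the wrong apply_scale is a common source of poses that are off by a factor of 10 or 1000, so it can help to derive the value from the mesh's unit instead of typing it by hand. A minimal sketch, using only the three factors listed above; apply_scale_for_unit and the unit abbreviations are hypothetical names for illustration.

```python
# Scale factors from the list above: mesh unit -> meters.
UNIT_TO_APPLY_SCALE = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def apply_scale_for_unit(unit):
    """Return the apply_scale value for a mesh authored in the given unit.

    Hypothetical helper (not part of neuromeka_vfm).
    """
    key = unit.strip().lower()
    if key not in UNIT_TO_APPLY_SCALE:
        raise ValueError(f"unknown unit {unit!r}; expected one of {sorted(UNIT_TO_APPLY_SCALE)}")
    return UNIT_TO_APPLY_SCALE[key]


print(apply_scale_for_unit("mm"))  # 0.001
```

The returned value would then be passed as apply_scale=... to pose.init.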
Registration and tracking
# Registration (server defaults when iteration is omitted, check_vram=True pre-checks VRAM)
register_resp = pose.register(rgb=rgb0, depth=depth0, mask=mask0, K=cam_K, check_vram=True)
# Init + register in one call using mesh payload
register_mesh_resp = pose.register_with_mesh(
    mesh_path="/app/modules/foundation_pose/mesh/my_mesh.stl",
    # or mesh_vertices=..., mesh_faces=...
    symmetry_tfs=symmetry_tfs,  # optional: (N,4,4) or (4,4)
    rgb=rgb0,
    depth=depth0,
    mask=mask0,
    K=cam_K,
    check_vram=True,
)
# Tracking (optionally limit search area with bbox_xywh)
track_resp = pose.track(rgb=rgb1, depth=depth1, K=cam_K, bbox_xywh=bbox_xywh)
# Recommended reset operation
pose.reset_vram(free_vram=True)
# Backward-compatible alias (deprecated)
pose.reset()
pose.close()
- cam_K: camera intrinsics.
- Large RGB resolution, large min_n_views, or small inplane_step can cause GPU VRAM errors.
- check_vram=True in register performs a pre-check to prevent server shutdown due to OOM.
- iteration in register/track can override the server default if provided.
- register_with_mesh() sends init+register in one operation with mesh payload.
- Use reset_vram() as the default reset API (reset() is kept as a backward-compatible deprecated wrapper).
- reset_object() can optionally receive mesh/model overrides:
  - mesh_path or mesh_vertices + mesh_faces
  - model_pts, model_normals, symmetry_tfs, min_n_views, inplane_step
- Default host/port can come from FPOSE_HOST and FPOSE_PORT environment variables.
- Backward-compat alias: FoundationPoseClient.
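The FPOSE_HOST and FPOSE_PORT variables mentioned above make it possible to keep server addresses out of application code. The sketch below shows one plausible resolution order and is not the SDK's actual logic: pose_server_address is a hypothetical helper, the precedence (explicit argument, then environment, then fallback) is an assumption, and the fallback values localhost and 5557 are illustrative only.

```python
import os


def pose_server_address(host=None, port=None, fallback_host="localhost", fallback_port=5557):
    """Resolve (host, port) for the pose server.

    Hypothetical helper (not part of neuromeka_vfm). Assumed precedence:
    explicit arguments, then FPOSE_HOST / FPOSE_PORT, then the fallbacks.
    """
    resolved_host = host or os.environ.get("FPOSE_HOST") or fallback_host
    if port is not None:
        resolved_port = port
    else:
        # Environment values are strings; an unset or empty value uses the fallback.
        resolved_port = os.environ.get("FPOSE_PORT") or fallback_port
    return resolved_host, int(resolved_port)


print(pose_server_address(host="192.168.10.72", port=5557))  # ('192.168.10.72', 5557)
```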
Release notes
- 0.1.2: Improved success detection for Segmentation responses (result/success/status), fixed image prompt registration/usage, added check_vram to PoseEstimation register.
- 0.1.1: Improved resource cleanup in PoseEstimation/Segmentation, use server defaults when iteration is omitted, added pose demo example.
- 0.1.0: Initial public release. Includes FoundationPose RPC client, real-time segmentation client, SSH-based mesh upload CLI/API.