Workflow of reproducible multimodal inference for urban environment evaluation.

Urban-WORM

Introduction

Urban-WORM (Workflow Of Reproducible Multimodal Inference) is a user-friendly, high-level interface for building geo-referenced urban datasets with model-generated ground-truth labels. It covers the full pipeline: collecting crowdsourced street views, photos, and sounds near building footprints, running batched vision-language model (VLM) inference, and exporting the labeled metadata in an organized layout.

Free software: MIT license
Website/Documentation: https://billbillbilly.github.io/urbanworm/

Features

Data collection

Collect geotagged street views (Mapillary/Google), photos (Flickr), and audio (Freesound/Radio Aporee) near building footprints or other points of interest (POIs)
Calibrate panorama orientation to face a given location; auto-compute field-of-view from building footprints
Filter personal photos with face detection; slice audio recordings into fixed-duration clips
Crash-safe checkpointing — pass checkpoint_path to any collection method; already-fetched locations are skipped on resume, so a failed run never starts from zero
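
The orientation and field-of-view steps listed above can be pictured with a little spherical geometry. The sketch below is not Urban-WORM's implementation: the function names `bearing_deg` and `fov_deg` are hypothetical, coordinates are assumed to be (lon, lat) in degrees, and the field of view is assumed to be the angular span of the footprint vertices as seen from the camera plus a fixed margin.

```python
import math

def bearing_deg(cam_lon, cam_lat, tgt_lon, tgt_lat):
    """Initial great-circle bearing from camera to target, in degrees clockwise from north."""
    phi1, phi2 = math.radians(cam_lat), math.radians(tgt_lat)
    dlon = math.radians(tgt_lon - cam_lon)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def fov_deg(cam_lon, cam_lat, footprint, margin=10.0):
    """Return (heading, fov): heading toward the vertex centroid and the angular span
    covering every footprint vertex, widened by `margin` and capped at 360 degrees.
    Hypothetical helper; assumes the camera sits outside the footprint."""
    center_lon = sum(p[0] for p in footprint) / len(footprint)
    center_lat = sum(p[1] for p in footprint) / len(footprint)
    heading = bearing_deg(cam_lon, cam_lat, center_lon, center_lat)
    # Signed angular offset of each vertex from the heading, wrapped to [-180, 180)
    offsets = [
        ((bearing_deg(cam_lon, cam_lat, lon, lat) - heading + 180.0) % 360.0) - 180.0
        for lon, lat in footprint
    ]
    return heading, min(max(offsets) - min(offsets) + margin, 360.0)
```

A panorama would then be cropped or rotated so that its center points along `heading` with a horizontal extent of `fov` degrees.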

Inference / ground-truth labeling

Define a structured output schema once; all backends share the same one_inference / batch_inference interface
Unsloth (recommended) — GPU-accelerated local VLM with optional GPU batching; 2–4× faster than Ollama; automatically spreads the model across all visible GPUs when more than one is present, with OOM-safe chunk retry so failed batches fall back to item-by-item instead of producing silent stub outputs
Ollama — lightweight local inference, no GPU required
llama.cpp — highly customizable sampling; supports audio input
Cloud APIs — Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google) via InferenceAPI
Crash-safe checkpointing on all batch_inference methods — resume mid-run without reprocessing completed images

Note: models can make mistakes and results still need to be reviewed and used carefully.

Export

GeoTaggedData.export() — one call produces a metadata.csv paired with an organized images/ or audio/ folder, with optional label columns merged in

Installation

Step 1 — Core package

pip install urban-worm

Step 2 — Choose your inference backend

Unsloth is the recommended backend for local inference (GPU-accelerated, fastest).

Unsloth — recommended (GPU required)

GPU-specific torch must be installed before the unsloth extra, otherwise pip falls back to a slow CPU-only build:

# CUDA (most modern NVIDIA GPUs):
pip install torch --index-url https://download.pytorch.org/whl/cu124

# macOS Apple Silicon (MPS):
pip install torch          # MPS is enabled by default on macOS

Then install the extra:

pip install "urban-worm[unsloth]"
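
To confirm that a GPU-enabled build of torch was picked up rather than the CPU-only fallback, a quick check along these lines may help. The helper name `detect_device` is hypothetical and not part of Urban-WORM; it relies only on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and reports "cpu" when torch is not installed at all.

```python
def detect_device():
    """Return "cuda", "mps", or "cpu" depending on what the installed torch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed, so nothing can run on a GPU
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in recent torch builds, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(detect_device())
```

If this prints `cpu` on a machine with a supported GPU, the torch install step above is the first thing to revisit.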

Tested checkpoints: unsloth/Qwen3-VL-3B-Instruct, unsloth/Qwen3-VL-8B-Instruct, unsloth/gemma-3-4b-it, unsloth/Qwen2-VL-2B-Instruct, unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit. Any vision model that unsloth.FastVisionModel can load should work.

Ollama — lightweight local inference (no GPU required)

Install the Ollama application first:

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Windows — download the installer from https://ollama.com/

Then install the Python client:

pip install "urban-worm[ollama]"

llama.cpp — CLI-based local inference

The llama-mtmd-cli binary must be installed separately:

# macOS / Linux
brew install llama.cpp

# Windows
winget install llama.cpp

More options: llama.cpp install guide. GGUF model collections: ggml-org multimodal GGUFs.

The Python binding is installed via the extra:

# CPU build (no compile flags needed):
pip install "urban-worm[llamacpp]"

# CUDA build:
CMAKE_ARGS="-DGGML_CUDA=on" pip install "urban-worm[llamacpp]"

# Metal build (macOS):
CMAKE_ARGS="-DGGML_METAL=on" pip install "urban-worm[llamacpp]"

Cloud APIs (Claude / GPT-4o / Gemini)

pip install "urban-worm[api]"

Audio support (optional)

Only needed if you use get_sound_from_location():

pip install "urban-worm[audio]"
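
The fixed-duration slicing mentioned under Features comes down to computing clip boundaries over a recording's length. The sketch below shows only that arithmetic; `clip_bounds` is a hypothetical name, and dropping a trailing remainder shorter than one clip by default is an assumption, not documented Urban-WORM behavior.

```python
def clip_bounds(total_seconds, clip_seconds, keep_remainder=False):
    """Return (start, end) pairs for consecutive clips of `clip_seconds` each.
    Hypothetical helper: the trailing partial clip is dropped unless keep_remainder is True."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    n_full = int(total_seconds // clip_seconds)
    # Multiply by the index instead of accumulating to avoid floating-point drift
    bounds = [(i * clip_seconds, (i + 1) * clip_seconds) for i in range(n_full)]
    if keep_remainder and n_full * clip_seconds < total_seconds:
        bounds.append((n_full * clip_seconds, total_seconds))
    return bounds
```

For a 25-second recording and 10-second clips this yields two full clips, plus a 5-second tail when `keep_remainder=True`.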

All extras at once

Note: GPU torch must still be pre-installed before running pip install "urban-worm[all]". See the Unsloth section above.

pip install "urban-worm[all]"          # all backends + API providers (no audio)
pip install "urban-worm[all,audio]"    # + audio slicing

Dev install from source

pip install -e git+https://github.com/billbillbilly/urbanworm.git#egg=urban-worm
pip install "urban-worm[dev]"

Usage

Collect street views with crash-safe checkpointing

from urbanworm import GeoTaggedData

gtd = GeoTaggedData()
gtd.getBuildings(bbox=(-83.208, 42.374, -83.206, 42.375), source='osm')

# Step 1 — fetch metadata from Mapillary (resumes from svi.jsonl if interrupted)
gtd.get_svi_from_locations(
    key="YOUR_MAPILLARY_KEY",
    distance=30,
    reoriented=True,
    checkpoint_path="run/svi.jsonl",
)

# Step 2 — download images to disk (resume-safe: existing files are never overwritten)
gtd.download_to_dir(data='svi', to_dir='run/images')
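
For readers curious how resume-on-restart can work with a JSONL checkpoint, here is a minimal sketch of the general pattern. It is not Urban-WORM's code: `run_with_checkpoint`, the `fetch` callable, and the `{"id": ..., "result": ...}` record layout are all hypothetical. Each finished item is appended as one JSON line and flushed, so a crash loses at most the item in flight.

```python
import json
import os

def run_with_checkpoint(items, fetch, checkpoint_path):
    """Process `items` with `fetch`, skipping ids already recorded in the JSONL file."""
    done = {}
    needs_newline = False
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as fh:
            text = fh.read()
        # A crash mid-write can leave a partial last line without a newline
        needs_newline = bool(text) and not text.endswith("\n")
        for line in text.splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore blank or truncated lines
            done[rec["id"]] = rec["result"]
    os.makedirs(os.path.dirname(checkpoint_path) or ".", exist_ok=True)
    with open(checkpoint_path, "a", encoding="utf-8") as fh:
        if needs_newline:
            fh.write("\n")  # keep new records from merging with the partial line
        for item_id in items:
            if item_id in done:
                continue  # already fetched in an earlier run
            result = fetch(item_id)
            fh.write(json.dumps({"id": item_id, "result": result}) + "\n")
            fh.flush()
            done[item_id] = result
    return done
```

Calling the function a second time with the same path only invokes `fetch` for ids that are missing from the file.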

Inference with a local VLM (Unsloth — recommended)

from urbanworm import InferenceUnsloth
from typing import Literal

schema = {
    "occupancy": (Literal["occupied", "unoccupied", "uncertain"], ...),
    "visual_evidence": (str, ...),
}

infer = InferenceUnsloth(
    llm="unsloth/Qwen3-VL-3B-Instruct",
    load_in_4bit=True,
    geo_tagged_data=gtd,
    schema=schema,
    # device and max_memory are optional — defaults shown below:
    # device=None        → auto: "auto" when multiple GPUs are detected,
    #                      "cuda:0" for a single GPU, "cpu" otherwise
    # max_memory=None    → auto: 90 % of each GPU's total VRAM, e.g.
    #                      {0: "10GiB", 1: "10GiB"} for two 12 GB GPUs
)

df = infer.batch_inference(
    system="You are an urban researcher assessing housing conditions.",
    prompt="Is this house occupied or vacant? Describe the visual evidence.",
    batch_size=4,             # batch > 1 trades VRAM for throughput
    max_new_tokens=256,
    checkpoint_path="run/labels.jsonl",   # resume-safe
)
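
The `(type, ...)` tuples in the schema resemble the field definitions accepted by pydantic's `create_model`, where `...` marks a field as required; whether Urban-WORM uses pydantic internally is an assumption here. To illustrate what such a schema constrains, the stdlib-only sketch below checks a raw JSON reply against it. The `validate` function is hypothetical and handles only plain classes and `Literal` types.

```python
import json
from typing import Literal, get_args, get_origin

schema = {
    "occupancy": (Literal["occupied", "unoccupied", "uncertain"], ...),
    "visual_evidence": (str, ...),
}

def validate(raw_json, schema):
    """Parse a JSON string and check each field against its declared type.
    Hypothetical helper: raises ValueError on a missing or mistyped field."""
    data = json.loads(raw_json)
    for name, (expected, default) in schema.items():
        if name not in data:
            if default is ...:
                raise ValueError(f"missing required field: {name}")
            data[name] = default
            continue
        value = data[name]
        if get_origin(expected) is Literal:
            # Literal fields must match one of the listed values exactly
            if value not in get_args(expected):
                raise ValueError(f"{name}={value!r} not in {get_args(expected)}")
        elif not isinstance(value, expected):
            raise ValueError(f"{name} should be {expected.__name__}")
    return data
```

A reply whose `occupancy` value falls outside the three listed options is rejected rather than passed through, which is the main benefit of declaring the schema up front.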

Multi-GPU note — when multiple CUDA GPUs are present, InferenceUnsloth automatically sets device_map="auto" and splits the model layers across all of them. You can override the per-GPU memory budget with max_memory, for example max_memory={0: "10GiB", 1: "10GiB"} to leave 2 GB headroom on each of two 12 GB cards. If a batch triggers an out-of-memory error at runtime, the failed chunk is automatically retried one item at a time after clearing the CUDA cache, so you lose at most one image rather than the entire batch.
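
The chunk-retry behavior described above follows a common pattern that can be sketched without any GPU code. The version below is an illustration, not Urban-WORM's implementation: `run_in_chunks`, `infer_batch`, and `clear_cache` are hypothetical names, and Python's built-in `MemoryError` stands in for the CUDA out-of-memory exception.

```python
def run_in_chunks(items, infer_batch, batch_size, oom_error=MemoryError,
                  clear_cache=lambda: None):
    """Run `infer_batch` over `items` in chunks; on an out-of-memory error,
    retry the failed chunk one item at a time."""
    results = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        try:
            results.extend(infer_batch(chunk))
        except oom_error:
            clear_cache()  # in a real GPU setting this could be torch.cuda.empty_cache
            for item in chunk:
                try:
                    results.extend(infer_batch([item]))
                except oom_error:
                    clear_cache()
                    results.append(None)  # explicit failure marker, not a silent stub
    return results
```

Marking a failed item with `None` keeps the output aligned with the input order, so failures can be found and retried later.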

Inference with a cloud API

from urbanworm import InferenceAPI

infer = InferenceAPI(
    llm="claude-sonnet-4-5",   # or "gpt-4o", "gemini-2.0-flash"
    provider="anthropic",       # or "openai", "google"
    api_key="YOUR_API_KEY",
    geo_tagged_data=gtd,
    schema=schema,
)

df = infer.batch_inference(
    system="You are an urban researcher assessing housing conditions.",
    prompt="Is this house occupied or vacant? Describe the visual evidence.",
    checkpoint_path="run/labels_claude.jsonl",
)

Export to an organized dataset

# Produces dataset/metadata.csv + dataset/images/
csv_path = gtd.export(output_dir="dataset", data="svi", labels=df)
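
To make the output layout concrete, here is a rough stdlib sketch of what an export of this shape involves: copying images into an `images/` folder and writing a `metadata.csv` with label columns merged in by id. The function `export_dataset`, the record keys (`id`, `lon`, `lat`, `path`), and the CSV column names are hypothetical; the actual columns written by `GeoTaggedData.export()` may differ.

```python
import csv
import os
import shutil

def export_dataset(records, labels, output_dir):
    """Copy images to output_dir/images and write metadata.csv with labels merged by id.
    Hypothetical helper; `labels` maps a record id to a dict of label columns."""
    image_dir = os.path.join(output_dir, "images")
    os.makedirs(image_dir, exist_ok=True)
    label_cols = sorted({key for row in labels.values() for key in row})
    fields = ["id", "lon", "lat", "file"] + label_cols
    csv_path = os.path.join(output_dir, "metadata.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        # restval fills label columns for records that have no label yet
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        for rec in records:
            name = os.path.basename(rec["path"])
            shutil.copy2(rec["path"], os.path.join(image_dir, name))
            row = {"id": rec["id"], "lon": rec["lon"], "lat": rec["lat"],
                   "file": f"images/{name}"}
            row.update(labels.get(rec["id"], {}))
            writer.writerow(row)
    return csv_path
```

Storing a path relative to the dataset root in the `file` column keeps the exported folder portable when it is moved or shared.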

More examples: docs/1_basic_inference.ipynb, docs/3_ground_truth_labeling.ipynb.

To do

v0.1.x:

A module for collecting social media data (Flickr and Freesound)
A method for inferencing sound recordings

v0.2.x:

Crash-safe checkpointing for collection and inference
Cloud API inference backend (Claude / GPT-4o / Gemini)
export() — organized dataset export with metadata CSV
Full ground-truth labeling tutorial notebook
A web UI providing interactive operation and data visualization

Legal Notice

This repository and its content are provided for educational and research purposes only. By using the information and code provided, users acknowledge that they are using the APIs and models at their own risk and agree to comply with any applicable laws and regulations.

Acknowledgements

The inference backends are built on:

The GIS data sourcing, image processing, and data collection functionality is built on:

The development of this package is supported and inspired by the city of Detroit.

Release history

0.2.3 (this version): May 18, 2026
0.2.2: May 17, 2026
0.2.1: May 16, 2026
0.2.0: May 11, 2026
0.1.9: Mar 9, 2026
0.1.8: Mar 8, 2026
0.1.7: Mar 8, 2026
0.1.6: Mar 6, 2026
0.1.5: Mar 6, 2026
0.1.4: Feb 28, 2026
0.1.3: Feb 27, 2026
0.1.2: Jan 24, 2026
0.1.1: Jan 18, 2026
0.1.0: Jan 17, 2026
0.0.23: Aug 31, 2025
0.0.22: Aug 17, 2025
0.0.21: Aug 15, 2025
0.0.20: Aug 15, 2025
0.0.19: Aug 9, 2025
0.0.18: Jun 20, 2025
0.0.17: Apr 26, 2025
0.0.16: Apr 26, 2025
0.0.15: Apr 21, 2025
0.0.14: Apr 17, 2025
0.0.13: Apr 14, 2025
0.0.12: Apr 12, 2025
0.0.11: Apr 11, 2025
0.0.10: Apr 9, 2025
0.0.9: Apr 6, 2025
0.0.8: Apr 3, 2025
0.0.7: Mar 31, 2025
0.0.6: Mar 27, 2025
0.0.5: Mar 25, 2025
0.0.4: Mar 23, 2025
0.0.3: Mar 23, 2025
0.0.2: Mar 19, 2025
0.0.1: Mar 19, 2025

Download files

Source Distribution

urban_worm-0.2.3.tar.gz (310.8 kB)

Uploaded May 18, 2026 Source

Built Distribution

urban_worm-0.2.3-py3-none-any.whl (296.0 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file urban_worm-0.2.3.tar.gz.

File metadata

Download URL: urban_worm-0.2.3.tar.gz
Upload date: May 18, 2026
Size: 310.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for urban_worm-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`cddaff0b84813f50fbe20e70f2c353707b0a8d20f12bd8bb277e6bb5a1233093`
MD5	`9ac2d45ff6782247772f65dcd2ec2a0e`
BLAKE2b-256	`572ab3b43f5889dd85a8044f5eac755ca47fbcb369eb1ecc30844df30ac8b134`

File details

Details for the file urban_worm-0.2.3-py3-none-any.whl.

File metadata

Download URL: urban_worm-0.2.3-py3-none-any.whl
Upload date: May 18, 2026
Size: 296.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for urban_worm-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e27505227b81c5eaa6b07e6a81d9a0b24dfc455e4d9c342bc975a652eb788a36`
MD5	`baf7b4553ef05514c6a84ebac4cdd946`
BLAKE2b-256	`88b26e0ae12eeeaa07c229a2f06859cea9701ba1aa3a2adfeada0f8db1c53882`
