Skip to main content

CLI tool to bulk-download street-level imagery from Mapillary

Project description

Mapillary Bulk Downloader

A CLI tool for downloading street-level imagery from Mapillary at city scale. Define a bounding box, discover every available image inside it, and download them all — with GPS coordinates embedded in EXIF, resumable downloads, and a SQLite-backed discovery cache that makes re-runs instant.

Built to collect training data for 3D city reconstruction (COLMAP + Gaussian Splatting), where you need tens of thousands of geo-tagged street photos covering contiguous areas.

What it does

  1. Discover — Splits a bounding box into a grid, queries every cell in parallel (30 workers), and recursively subdivides cells that hit the API limit. Finds every image Mapillary has in the area.
  2. Cache — Stores all discovered image IDs and coordinates in a local SQLite database (images.db). Subsequent runs skip the API entirely unless you ask to re-discover.
  3. Download — Pulls images at 2048px resolution with progress bars. Embeds GPS lat/lon into JPEG EXIF so each file is self-contained. Tracks what's been downloaded with atomic SQLite writes, so you can interrupt and resume at any time.

Quick start

# Install dependencies
uv sync

# Set your Mapillary API token
echo 'MAPILLARY_CLIENT_TOKEN=MLY|...' > .env

# Interactive mode — pick a city, preview the area on a map, then download
uv run python3 cli.py

# Or go headless
uv run python3 cli.py --city "San Francisco"
uv run python3 cli.py --bbox "-122.52,37.70,-122.35,37.83" --limit 500

Usage

uv run python3 cli.py [OPTIONS]

Options:
  --city NAME           Download from a predefined city
  --bbox W,S,E,N        Custom bounding box (overrides --city)
  --limit N             Cap the number of images to download
  --output-dir PATH     Output directory (default: data/<city>)
  --preview             Open an interactive map in the browser before downloading
  --state STATE         Discovery state when resuming: maintain | merge | rediscover
  --no-save-discovery   Don't persist discovered IDs to the database
  --list-cities         Show predefined cities and exit

No arguments launches interactive mode: arrow-key city selection, optional map preview via Folium, discovery summary, and a confirmation prompt before downloading.

Discovery states

When an images.db already exists for a city:

State Behavior
maintain Load from DB, skip API calls (default)
merge Re-discover and add any new images to the existing DB
rediscover Wipe the DB and run a full fresh discovery

Architecture

cli.py          — CLI entry point: argparse, interactive prompts, map preview
downloader.py   — MapillaryClient (API) + ImageDownloader (grid split, parallel discovery, download loop)
database.py     — DiscoveryDB: SQLite cache with singleton pattern, tracks discovered/downloaded state
config.py       — Dataclasses (MapillaryConfig, BoundingBox), env loading, predefined city bounding boxes
scripts/        — Standalone utilities (GPS coordinate enrichment)

Key design decisions

  • Adaptive grid splitting: The API caps results at 2,000 per query. Dense urban areas easily exceed that. The downloader starts with coarse grid cells and recursively subdivides any cell that saturates the limit, down to a minimum cell size. This guarantees complete coverage without manual tuning.
  • SQLite over JSON: Early versions used download_metadata.json. Switched to SQLite for atomic writes (no corruption on Ctrl+C), fast set-membership lookups on 100k+ image IDs, and clean separation of discovery vs. download state.
  • GPS in EXIF: Coordinates are embedded directly into each JPEG at download time. This means images work standalone — no sidecar files, no separate metadata lookup. Precision is normalized to 7 decimal places (~1 cm) so DB and EXIF values match exactly.
  • Disk reconciliation: On resume, the downloader checks what's actually on disk (not just what the DB says) and reconciles the two. Images on disk missing GPS get coordinates embedded; images in the DB but missing from disk get re-queued.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mapillary_dl-0.1.0.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mapillary_dl-0.1.0-py3-none-any.whl (16.5 kB view details)

Uploaded Python 3

File details

Details for the file mapillary_dl-0.1.0.tar.gz.

File metadata

  • Download URL: mapillary_dl-0.1.0.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mapillary_dl-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fb3199e42dbbc1961ed6afba2f001a152322d7ca71e980865e854fd14f0f07f4
MD5 ea128836513c993fe9b923b4e795cdc9
BLAKE2b-256 7b4d5922e77d3135bac318a17c58b72b787c5ba07e63c4eca6b70c4bdb4e568f

See more details on using hashes here.

File details

Details for the file mapillary_dl-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mapillary_dl-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for mapillary_dl-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b924dde69f51587420490c1f36317a99a4315c1a01dcfd60cf830320d364d814
MD5 d45a647a1d0404a814fcaa62736b753d
BLAKE2b-256 badb837102aa7524664f3774203eaa1fbe3f926a8a78ca7c80c35e7e41a774a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page