Extract business data from Google Maps at scale using reverse-engineered internal APIs

Google Maps Business Extractor

Extract every business in any geographic area from Google Maps -- no browser needed.

gmaps-extractor reverse-engineers Google Maps' internal API to collect business data at scale using raw HTTP requests. Point it at a city and a category, and it systematically covers the entire area using grid-based search with automatic deduplication.
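To make the grid idea concrete, here is a minimal sketch of tiling a bounding box into search cells. It is not the library's own geo/ code: the `Cell` type, the `grid_cells` function, and the `cell_km` parameter are hypothetical names, and the kilometre-to-degree conversion is a simple flat-earth approximation that is only reasonable for city-sized areas.

```python
import math
from typing import Iterator, NamedTuple


class Cell(NamedTuple):
    """One rectangular search cell (hypothetical type, for illustration)."""
    south: float
    west: float
    north: float
    east: float

    @property
    def center(self) -> tuple:
        # Midpoint of the cell, usable as the lat/lng of a search request.
        return ((self.south + self.north) / 2, (self.west + self.east) / 2)


def grid_cells(south: float, west: float, north: float, east: float,
               cell_km: float = 2.0) -> Iterator[Cell]:
    """Yield cells of at most roughly cell_km per side that tile the box."""
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    mid_lat = math.radians((south + north) / 2)
    km_per_deg_lng = km_per_deg_lat * math.cos(mid_lat)

    # Round up so cells are never larger than requested.
    rows = max(1, math.ceil((north - south) * km_per_deg_lat / cell_km))
    cols = max(1, math.ceil((east - west) * km_per_deg_lng / cell_km))
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols

    for r in range(rows):
        for c in range(cols):
            yield Cell(
                south + r * lat_step,
                west + c * lng_step,
                south + (r + 1) * lat_step,
                west + (c + 1) * lng_step,
            )
```

Each cell's `center` could then be searched independently. Because neighbouring searches tend to return overlapping results, a grid approach like this depends on deduplication afterwards.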

Capable of 100K+ records per week with parallel processing and proxy support.

Features

Full area coverage -- Divides any area into a grid of searchable cells. No results missed.
No browser required -- Pure HTTP requests using httpx. No Selenium, no Puppeteer.
Async support -- async_collect_v2() and stream_collect_v2() for non-blocking I/O.
Streaming -- Async generator yields businesses as they are found.
Event system -- Lifecycle callbacks for monitoring collection progress.
Parallel processing -- Configurable worker pool (up to 50 concurrent requests).
Resumable collection -- V2 collector saves checkpoints and auto-resumes.
Enrichment -- Fetch place details (hours, phone, website) and reviews concurrently.
Adaptive rate limiting -- Exponential backoff with jitter. Auto-adjusts to Google's limits.
Smart deduplication -- Deduplicates by both place_id and hex_id.
Auto cookie management -- Builds Google sessions automatically, refreshes on failure.
Structured logging -- Uses Python's logging module. Silent by default, configurable.
Lightweight core -- Only requires httpx. FastAPI server is optional.
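The deduplication feature listed above can be pictured with a short sketch. This is an illustration under assumptions, not the library's implementation: `dedupe_businesses` is a hypothetical helper, and it assumes each record is a dict that may carry a `place_id` key, a `hex_id` key, or both.

```python
from typing import Iterable, List


def dedupe_businesses(businesses: Iterable[dict]) -> List[dict]:
    """Keep the first record seen for each place_id or hex_id."""
    seen_place_ids = set()
    seen_hex_ids = set()
    unique = []
    for biz in businesses:
        place_id = biz.get("place_id")
        hex_id = biz.get("hex_id")
        # A record is a duplicate if either identifier was already seen.
        if place_id and place_id in seen_place_ids:
            continue
        if hex_id and hex_id in seen_hex_ids:
            continue
        if place_id:
            seen_place_ids.add(place_id)
        if hex_id:
            seen_hex_ids.add(hex_id)
        # Records with no identifier at all are kept, since they cannot be compared.
        unique.append(biz)
    return unique
```

Checking both identifiers catches the case where the same business appears in two overlapping cells with only one of the two IDs populated.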

Quick Start

from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@proxy-host:port") as extractor:
    result = extractor.collect_v2("New York, USA", "lawyers", enrich=True)
    print(f"Found {len(result)} businesses")
    for biz in result:
        print(f"  {biz['name']} - {biz.get('phone', 'N/A')}")

Installation

# Core library (recommended)
pip install gmaps-extractor

# With FastAPI server support (for CLI or legacy workflows)
# Quotes keep shells such as zsh from interpreting the brackets
pip install "gmaps-extractor[server]"

# Development
pip install "gmaps-extractor[dev]"

From Source

git clone https://github.com/promisingcoder/GoogleMapsCollector.git
cd GoogleMapsCollector
pip install -e ".[dev]"

Requirements

Python 3.9+
A residential/sticky proxy (required -- Google blocks datacenter IPs)

Usage

Sync Collection (Default)

No server process needed. Requests go directly to Google Maps via httpx.

from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    # Basic collection
    result = extractor.collect("London, UK", "dentists")

    # V2 collector with enrichment and reviews
    result = extractor.collect_v2(
        "Paris, France",
        "restaurants",
        enrich=True,
        reviews=True,
        reviews_limit=50,
        workers=30,
    )

    # Access results
    print(result.metadata)      # {"area": "Paris, France", "category": "restaurants", ...}
    print(result.statistics)    # {"total_collected": 1234, ...}
    for biz in result:
        print(biz["name"], biz.get("rating"))

Async Collection

import asyncio
from gmaps_extractor import GMapsExtractor

async def main():
    async with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        # Collect all results at once (async)
        result = await extractor.async_collect_v2(
            "Manhattan, NY",
            "lawyers",
            enrich=True,
            reviews=True,
        )
        print(f"Found {len(result)} businesses")

asyncio.run(main())

Streaming Collection

Process businesses as they are found, without waiting for the full collection to finish.

import asyncio
from gmaps_extractor import GMapsExtractor

async def main():
    async with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        async for biz in extractor.stream_collect_v2("NYC", "coffee shops"):
            print(f"Found: {biz['name']} at {biz.get('address', 'N/A')}")

asyncio.run(main())
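One common use of streaming is to persist records as they arrive, so that a crash late in a long run does not lose everything. The sketch below writes an async stream of dicts to a JSON Lines file. `write_jsonl` and `fake_stream` are hypothetical names, and `fake_stream` is only a stand-in for the library's stream so that the example runs without a proxy.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_stream() -> AsyncIterator[dict]:
    """Stand-in for a real business stream; yields five sample records."""
    for i in range(5):
        await asyncio.sleep(0)  # give control back to the event loop
        yield {"name": f"Business {i}", "place_id": f"id-{i}"}


async def write_jsonl(stream: AsyncIterator[dict], path: str,
                      flush_every: int = 2) -> int:
    """Write each streamed record as one JSON line; return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        async for biz in stream:
            fh.write(json.dumps(biz, ensure_ascii=False) + "\n")
            count += 1
            if count % flush_every == 0:
                fh.flush()  # push buffered lines to the OS periodically
    return count
```

In real use, the first argument would presumably be the generator returned by `extractor.stream_collect_v2(...)` inside the `async with` block shown above.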

Subdivision Mode

Break large areas into named sub-areas (boroughs, districts, neighborhoods) for better coverage.

from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect_v2(
        "London, UK",
        "dentists",
        subdivide=True,
        enrich=True,
    )

Event System

Monitor collection progress with lifecycle callbacks.

from gmaps_extractor import GMapsExtractor, EventType, EventEmitter

emitter = EventEmitter()

def on_cell_complete(event):
    print(f"Cell done: +{event.data.get('businesses_found', 0)} businesses")

def on_complete(event):
    total = event.data.get("total_businesses", 0)
    print(f"Collection complete: {total} businesses")

emitter.on(EventType.CELL_COMPLETE, on_cell_complete)
emitter.on(EventType.COLLECTION_COMPLETE, on_complete)

with GMapsExtractor(proxy="http://user:pass@host:port", events=emitter) as extractor:
    result = extractor.collect_v2("NYC", "lawyers")

Or use the convenience shortcuts:

with GMapsExtractor(
    proxy="http://user:pass@host:port",
    on_business_found=lambda e: print(f"Found: {e.data}"),
    on_collection_complete=lambda e: print(f"Done: {e.data}"),
) as extractor:
    result = extractor.collect_v2("NYC", "lawyers")
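Callbacks often need to keep state between events, for example a running total. A small class with a bound method as the handler is one way to do that. The sketch below is self-contained: `ProgressTally` is a hypothetical helper, and the events are simulated with `SimpleNamespace` objects that only mimic the `.data` dict used in the examples above.

```python
from types import SimpleNamespace


class ProgressTally:
    """Accumulates counts across cell-complete events."""

    def __init__(self) -> None:
        self.cells_done = 0
        self.businesses = 0

    def on_cell_complete(self, event) -> None:
        # Same access pattern as the handlers above: event.data is a dict.
        self.cells_done += 1
        self.businesses += event.data.get("businesses_found", 0)

    def summary(self) -> str:
        return f"{self.cells_done} cells, {self.businesses} businesses"


# Simulate three cell-complete events instead of running a real collection.
tally = ProgressTally()
for found in (3, 0, 7):
    tally.on_cell_complete(SimpleNamespace(data={"businesses_found": found}))

print(tally.summary())  # prints: 3 cells, 10 businesses
```

With the library, the bound method would be registered like any other handler, for example `emitter.on(EventType.CELL_COMPLETE, tally.on_cell_complete)`.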

Logging

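The library logs through Python's logging module and installs a NullHandler on its logger, so the logger produces no output on its own. Progress output is enabled by verbose=True, which is the constructor default. Pass verbose=False to turn it off, or configure logging manually.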

import logging

from gmaps_extractor import GMapsExtractor

# Option 1: Use verbose=True (default)
with GMapsExtractor(proxy="...", verbose=True) as extractor:
    result = extractor.collect("NYC", "lawyers")  # Progress printed to stdout

# Option 2: Configure logging manually
logging.getLogger("gmaps_extractor").setLevel(logging.DEBUG)
logging.getLogger("gmaps_extractor").addHandler(logging.StreamHandler())

with GMapsExtractor(proxy="...", verbose=False) as extractor:
    result = extractor.collect("NYC", "lawyers")  # DEBUG-level output

Low-Level Client

Use GMapsClient or AsyncGMapsClient directly for custom workflows.

from gmaps_extractor.client import GMapsClient
from gmaps_extractor.settings import GMapsSettings

settings = GMapsSettings(proxy_url="http://user:pass@host:port")
client = GMapsClient(settings)

# Search
businesses = client.search("lawyers", lat=40.7128, lng=-74.0060)

# Place details
details = client.place_details(hex_id="0x89c259a...:0x25d41...", name="Acme Law")

# Reviews
reviews = client.reviews(hex_id="0x89c259a...:0x25d41...", limit=20)

Configuration

Constructor Parameters

Parameter	Type	Default	Description
`proxy`	`str`	`None`	Proxy URL. Falls back to `GMAPS_PROXY_*` env vars.
`cookies`	`dict`	`None`	Explicit cookie override. Auto-managed if `None`.
`workers`	`int`	`20`	Parallel search workers.
`use_server`	`bool`	`False`	Use legacy FastAPI server (requires `[server]` extra).
`verbose`	`bool`	`True`	Enable progress output via logging.
`events`	`EventEmitter`	auto	Event emitter for lifecycle hooks.
`progress`	`bool/ProgressReporter`	auto	Progress reporter (attached when `verbose=True`).
`on_business_found`	`callable`	`None`	Shortcut callback for `BUSINESS_FOUND` events.
`on_collection_complete`	`callable`	`None`	Shortcut callback for `COLLECTION_COMPLETE` events.
`server_port`	`int`	`8000`	Port for legacy server mode.

Environment Variables

export GMAPS_PROXY_HOST="proxy-host:port"
export GMAPS_PROXY_USER="username"
export GMAPS_PROXY_PASS="password"
export GMAPS_COOKIES='{"NID":"...","SOCS":"..."}'

Config Resolution Order

1. Constructor arguments (highest priority)
2. Environment variables
3. config.py / _config_defaults.py defaults (lowest priority)
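The resolution order above can be sketched for the proxy setting as follows. This is an illustration only: `resolve_proxy` is a hypothetical function, and the way the three `GMAPS_PROXY_*` variables are combined into a URL (an `http://` scheme, percent-encoded credentials) is an assumption rather than the library's documented behavior.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def resolve_proxy(explicit: Optional[str] = None,
                  env: Optional[Mapping[str, str]] = None,
                  default: Optional[str] = None) -> Optional[str]:
    """Return the proxy URL: explicit argument, then env vars, then default."""
    # 1. A constructor-style argument wins outright.
    if explicit:
        return explicit

    # 2. Fall back to environment variables (injectable for testing).
    env = os.environ if env is None else env
    host = env.get("GMAPS_PROXY_HOST")
    if host:
        user = env.get("GMAPS_PROXY_USER")
        password = env.get("GMAPS_PROXY_PASS")
        auth = ""
        if user and password:
            # Percent-encode credentials so characters like "@" stay valid in a URL.
            auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
        return f"http://{auth}{host}"

    # 3. Lowest priority: the packaged default.
    return default
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.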

Exception Handling

from gmaps_extractor import GMapsExtractor
from gmaps_extractor.exceptions import (
    GMapsExtractorError,
    BoundaryError,
    ConfigurationError,
    RateLimitError,
    AuthenticationError,
    ServerError,
)

try:
    with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        result = extractor.collect_v2("New York, USA", "lawyers")
except BoundaryError:
    print("Could not resolve area boundaries via Nominatim")
except RateLimitError:
    print("Rate limit exceeded after all retries")
except AuthenticationError:
    print("Proxy or cookie authentication failed")
except GMapsExtractorError as e:
    print(f"Extraction failed: {e}")
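If RateLimitError still surfaces after the library's internal retries, an outer retry with longer pauses is one possible response. The sketch below shows generic exponential backoff with "full jitter". It is not the library's own rate limiter: `backoff_delays` and `retry_with_backoff` are hypothetical helpers, and the exception type to retry on is passed in by the caller.

```python
import random
import time
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0,
                   cap: float = 60.0) -> Iterator[float]:
    """Yield one delay per retry, uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def retry_with_backoff(fn: Callable[[], T],
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       retries: int = 4, base: float = 1.0, cap: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping with jittered exponential backoff between failures."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except retry_on:
            sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return fn()
```

A call might look like `retry_with_backoff(lambda: extractor.collect_v2("New York, USA", "lawyers"), retry_on=(RateLimitError,), base=30, cap=600)`, where the base and cap values are arbitrary examples. The `sleep` parameter exists so tests can substitute a no-op.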

CLI

After installing, these commands are available:

# V2 collector (recommended)
gmaps-collect-v2 "Manhattan, New York" "lawyers" --enrich --reviews -l 100

# V1 collector
gmaps-collect "New York, USA" "lawyers" --subdivide

# Add reviews to existing collection
gmaps-enrich-reviews output/lawyers_in_manhattan.json -l 50

# Start FastAPI server (only needed for CLI usage)
gmaps-server

Note: CLI commands require the FastAPI server to be running, so start gmaps-server first (it needs the [server] extra). The library API does not need the server.

Output Format

JSON and CSV files are generated in the output/ directory.

{
  "metadata": {
    "area": "New York, USA",
    "category": "lawyers",
    "boundary": {"name": "New York", "north": 40.91, "south": 40.49, "east": -73.70, "west": -74.25},
    "search_mode": "grid",
    "enrichment": {"details_fetched": true, "reviews_fetched": true, "reviews_limit": 20}
  },
  "statistics": {
    "total_collected": 1234,
    "duplicates_removed": 89,
    "search_time_seconds": 120.5,
    "total_time_seconds": 340.2
  },
  "businesses": [
    {
      "name": "Smith & Associates",
      "address": "123 Broadway, New York, NY 10006",
      "place_id": "ChIJ...",
      "rating": 4.5,
      "review_count": 123,
      "latitude": 40.7128,
      "longitude": -74.0060,
      "phone": "+1 212-555-0123",
      "website": "https://example.com",
      "category": "Lawyer",
      "hours": {"monday": "9:00 AM - 5:00 PM"},
      "reviews_data": [{"author": "John", "rating": 5, "text": "Excellent!", "date": "2 months ago"}]
    }
  ]
}
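Once a JSON file in this shape exists, post-processing needs only the standard library. As a small sketch, the hypothetical `top_rated` function below filters the "businesses" array by rating and review count. The thresholds are arbitrary, and the sample document is made up for illustration.

```python
import json
from typing import List


def top_rated(document: dict, min_rating: float = 4.0,
              min_reviews: int = 10) -> List[dict]:
    """Return businesses meeting both thresholds, best first."""
    picks = [
        biz for biz in document.get("businesses", [])
        # "or 0" treats missing or null values as zero.
        if (biz.get("rating") or 0) >= min_rating
        and (biz.get("review_count") or 0) >= min_reviews
    ]
    return sorted(picks, key=lambda b: (b["rating"], b["review_count"]), reverse=True)


# Made-up sample in the documented shape; a real run would use json.load on a file.
sample = json.loads("""
{
  "businesses": [
    {"name": "A", "rating": 4.5, "review_count": 123},
    {"name": "B", "rating": 4.9, "review_count": 3},
    {"name": "C", "rating": null, "review_count": 50},
    {"name": "D", "rating": 4.5, "review_count": 200}
  ]
}
""")

for biz in top_rated(sample):
    print(biz["name"], biz["rating"], biz["review_count"])
```

Sorting on the (rating, review_count) pair breaks ties between equally rated businesses in favor of the one with more reviews.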

Architecture

gmaps_extractor/
├── extractor.py          # GMapsExtractor (high-level API) + CollectionResult
├── client.py             # GMapsClient (sync HTTP, default path)
├── async_client.py       # AsyncGMapsClient (async HTTP)
├── settings.py           # GMapsSettings dataclass
├── events.py             # EventEmitter + EventType
├── progress.py           # ProgressReporter
├── exceptions.py         # Exception hierarchy
├── parsers/              # Response parsers (business, place, reviews)
├── geo/                  # Grid generation, Nominatim boundary resolution
├── extraction/           # Collection orchestrators (sync, async, streaming)
├── decoder/              # Protobuf parameter decoder
└── server.py             # Optional FastAPI server

Contributing

See CLAUDE.md for architecture details, common tasks, and development commands.

git clone https://github.com/promisingcoder/GoogleMapsCollector.git
cd GoogleMapsCollector
pip install -e ".[dev]"
pytest

License

MIT License -- See LICENSE for details.


Release history

2.0.0 (this version): Feb 9, 2026
1.0.0: Feb 6, 2026
