Extract business data from Google Maps at scale using reverse-engineered internal APIs
Google Maps Business Extractor
Extract every business in any geographic area from Google Maps -- no browser needed.
gmaps-extractor reverse-engineers Google Maps' internal API to collect business data at scale using raw HTTP requests. Point it at a city and a category, and it systematically covers the entire area using grid-based search with automatic deduplication.
Capable of 100K+ records per week with parallel processing and proxy support.
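As a rough illustration of the grid-based search and deduplication described above, the sketch below tiles a bounding box into cells and merges per-cell results by identifier. It is not the library's implementation: the names grid_cells and merge_unique, the fixed cell size in degrees, and the sample records are assumptions made for illustration. Only the place_id and hex_id field names come from this README.

```python
import math
from typing import Dict, Iterable, Iterator, List, Tuple

Cell = Tuple[float, float, float, float]  # (south, west, north, east)


def grid_cells(south: float, west: float, north: float, east: float,
               step: float) -> Iterator[Cell]:
    """Yield cells of at most `step` degrees that tile the bounding box."""
    rows = max(1, math.ceil((north - south) / step))
    cols = max(1, math.ceil((east - west) / step))
    for r in range(rows):
        for c in range(cols):
            cell_south = south + r * step
            cell_west = west + c * step
            # Clamp the last row/column so cells never leave the box.
            yield (cell_south, cell_west,
                   min(cell_south + step, north),
                   min(cell_west + step, east))


def merge_unique(batches: Iterable[List[Dict]]) -> List[Dict]:
    """Merge per-cell results, skipping records whose place_id or hex_id was already seen."""
    seen_place_ids, seen_hex_ids = set(), set()
    unique: List[Dict] = []
    for batch in batches:
        for biz in batch:
            pid, hid = biz.get("place_id"), biz.get("hex_id")
            if (pid and pid in seen_place_ids) or (hid and hid in seen_hex_ids):
                continue  # same business returned by a neighbouring cell
            if pid:
                seen_place_ids.add(pid)
            if hid:
                seen_hex_ids.add(hid)
            unique.append(biz)
    return unique
```

Neighbouring cells tend to return overlapping results near their shared edges, which is why a merge step of this kind is paired with the grid.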
Features
- Full area coverage -- Divides any area into a grid of searchable cells. No results missed.
- No browser required -- Pure HTTP requests using httpx. No Selenium, no Puppeteer.
- Async support -- async_collect_v2() and stream_collect_v2() for non-blocking I/O.
- Streaming -- Async generator yields businesses as they are found.
- Event system -- Lifecycle callbacks for monitoring collection progress.
- Parallel processing -- Configurable worker pool (up to 50 concurrent requests).
- Resumable collection -- V2 collector saves checkpoints and auto-resumes.
- Enrichment -- Fetch place details (hours, phone, website) and reviews concurrently.
- Adaptive rate limiting -- Exponential backoff with jitter. Auto-adjusts to Google's limits.
- Smart deduplication -- Deduplicates by both place_id and hex_id.
- Auto cookie management -- Builds Google sessions automatically, refreshes on failure.
- Structured logging -- Uses Python's logging module. Silent by default, configurable.
- Lightweight core -- Only requires httpx. FastAPI server is optional.
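The rate limiting bullet above mentions exponential backoff with jitter. A minimal sketch of that general technique follows, assuming the "full jitter" variant. The helper names backoff_delay and retry, the base and cap values, and the attempt count are illustrative assumptions, not the library's actual settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def retry(func: Callable[[], T], max_attempts: int = 5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds, sleeping a jittered, growing delay between tries."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Assumption: a real client would catch only rate-limit errors here.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

The random component spreads retries from parallel workers over time so they do not all hit the server again at the same instant.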
Quick Start
from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@proxy-host:port") as extractor:
    result = extractor.collect_v2("New York, USA", "lawyers", enrich=True)
    print(f"Found {len(result)} businesses")
    for biz in result:
        print(f"  {biz['name']} - {biz.get('phone', 'N/A')}")
Installation
# Core library (recommended)
pip install gmaps-extractor
# With FastAPI server support (for CLI or legacy workflows)
pip install "gmaps-extractor[server]"
# Development
pip install "gmaps-extractor[dev]"
From Source
git clone https://github.com/promisingcoder/GoogleMapsCollector.git
cd GoogleMapsCollector
pip install -e ".[dev]"
Requirements
- Python 3.9+
- A residential/sticky proxy (required -- Google blocks datacenter IPs)
Usage
Sync Collection (Default)
No server process needed. Requests go directly to Google Maps via httpx.
from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    # Basic collection
    result = extractor.collect("London, UK", "dentists")

    # V2 collector with enrichment and reviews
    result = extractor.collect_v2(
        "Paris, France",
        "restaurants",
        enrich=True,
        reviews=True,
        reviews_limit=50,
        workers=30,
    )

    # Access results
    print(result.metadata)    # {"area": "Paris, France", "category": "restaurants", ...}
    print(result.statistics)  # {"total_collected": 1234, ...}
    for biz in result:
        print(biz["name"], biz.get("rating"))
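Because each business is read as a dictionary in the snippet above, ordinary Python is enough for post-processing. The sketch below ranks businesses by rating. top_rated is a hypothetical helper, not part of the library, and it takes any iterable of dicts so it can be tried without a proxy. The rating and review_count keys follow the Output Format section.

```python
from typing import Dict, Iterable, List


def top_rated(businesses: Iterable[Dict], min_rating: float = 4.0,
              min_reviews: int = 10) -> List[Dict]:
    """Keep well-reviewed businesses and sort them best first."""
    kept = [
        biz for biz in businesses
        # Missing or null values are treated as 0 so sparse records are dropped.
        if (biz.get("rating") or 0) >= min_rating
        and (biz.get("review_count") or 0) >= min_reviews
    ]
    return sorted(
        kept,
        key=lambda b: (b.get("rating") or 0, b.get("review_count") or 0),
        reverse=True,
    )


# Stand-in data; with the library this would be the iterable collection result.
sample = [
    {"name": "A", "rating": 4.5, "review_count": 123},
    {"name": "B", "rating": 4.9, "review_count": 3},
    {"name": "C", "rating": 4.8, "review_count": 40},
    {"name": "D"},
]
best = top_rated(sample)
```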
Async Collection
import asyncio
from gmaps_extractor import GMapsExtractor

async def main():
    async with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        # Collect all results at once (async)
        result = await extractor.async_collect_v2(
            "Manhattan, NY",
            "lawyers",
            enrich=True,
            reviews=True,
        )
        print(f"Found {len(result)} businesses")

asyncio.run(main())
Streaming Collection
Process businesses as they are found, without waiting for the full collection to finish.
import asyncio
from gmaps_extractor import GMapsExtractor

async def main():
    async with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        async for biz in extractor.stream_collect_v2("NYC", "coffee shops"):
            print(f"Found: {biz['name']} at {biz.get('address', 'N/A')}")

asyncio.run(main())
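One use for streaming is persisting records as they arrive, so a crash does not lose what was already found. The sketch below writes one JSON object per line from any async iterator of dicts. save_stream and fake_stream are hypothetical names, and fake_stream only stands in for stream_collect_v2() so the example runs without a proxy.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, TextIO


async def save_stream(stream: AsyncIterator[Dict], out: TextIO) -> int:
    """Write each streamed business as one JSON line; return the number written."""
    count = 0
    async for biz in stream:
        out.write(json.dumps(biz, ensure_ascii=False) + "\n")
        out.flush()  # make each record durable before waiting for the next one
        count += 1
    return count


async def fake_stream() -> AsyncIterator[Dict]:
    """Stand-in for the library's async generator (illustration only)."""
    for name in ("Cafe A", "Cafe B"):
        await asyncio.sleep(0)  # yield control, as real network I/O would
        yield {"name": name}
```

With the real library, the stream argument would presumably be extractor.stream_collect_v2("NYC", "coffee shops") and out an open text file.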
Subdivision Mode
Break large areas into named sub-areas (boroughs, districts, neighborhoods) for better coverage.
from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect_v2(
        "London, UK",
        "dentists",
        subdivide=True,
        enrich=True,
    )
Event System
Monitor collection progress with lifecycle callbacks.
from gmaps_extractor import GMapsExtractor, EventType, EventEmitter

emitter = EventEmitter()

def on_cell_complete(event):
    print(f"Cell done: +{event.data.get('businesses_found', 0)} businesses")

def on_complete(event):
    total = event.data.get("total_businesses", 0)
    print(f"Collection complete: {total} businesses")

emitter.on(EventType.CELL_COMPLETE, on_cell_complete)
emitter.on(EventType.COLLECTION_COMPLETE, on_complete)

with GMapsExtractor(proxy="http://user:pass@host:port", events=emitter) as extractor:
    result = extractor.collect_v2("NYC", "lawyers")
Or use the convenience shortcuts:
with GMapsExtractor(
    proxy="http://user:pass@host:port",
    on_business_found=lambda e: print(f"Found: {e.data}"),
    on_collection_complete=lambda e: print(f"Done: {e.data}"),
) as extractor:
    result = extractor.collect_v2("NYC", "lawyers")
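Callbacks that need to keep state can be written as bound methods rather than lambdas. The sketch below tallies per-cell results. CollectionStats is a hypothetical helper, and SimpleNamespace stands in for the library's event object; the only assumption taken from the examples above is that an event exposes a data dictionary with a businesses_found key.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class CollectionStats:
    """Running totals updated from cell-complete events."""
    cells_done: int = 0
    businesses: int = 0

    def on_cell_complete(self, event) -> None:
        self.cells_done += 1
        self.businesses += event.data.get("businesses_found", 0)


# Simulated events so the sketch runs without the library or a proxy.
stats = CollectionStats()
for found in (3, 0, 5):
    stats.on_cell_complete(SimpleNamespace(data={"businesses_found": found}))
```

A handler like this could be registered the same way as the functions above, for example emitter.on(EventType.CELL_COMPLETE, stats.on_cell_complete).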
Logging
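The library uses Python's logging module and attaches a NullHandler by default, so the logger itself emits nothing until you configure it. With verbose=True (the default) a progress reporter is attached and progress is printed; with verbose=False nothing is shown unless you configure logging manually.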
import logging

from gmaps_extractor import GMapsExtractor

# Option 1: Use verbose=True (default)
with GMapsExtractor(proxy="...", verbose=True) as extractor:
    result = extractor.collect("NYC", "lawyers")  # Progress printed to stdout

# Option 2: Configure logging manually
logger = logging.getLogger("gmaps_extractor")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
with GMapsExtractor(proxy="...", verbose=False) as extractor:
    result = extractor.collect("NYC", "lawyers")  # DEBUG-level output
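For longer runs a timestamped format is often more useful than the bare StreamHandler shown above. The sketch below uses only the standard logging module. configure_gmaps_logging is a hypothetical helper name, and the logger name "gmaps_extractor" is taken from the example above.

```python
import logging
import sys
from typing import TextIO


def configure_gmaps_logging(level: int = logging.INFO,
                            stream: TextIO = sys.stderr) -> logging.Logger:
    """Attach a formatted stream handler to the library's top-level logger."""
    logger = logging.getLogger("gmaps_extractor")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    # Note: calling this twice adds a second handler and duplicates every line.
    logger.addHandler(handler)
    return logger
```

Records from child loggers under the gmaps_extractor namespace propagate to this handler, so one call covers the whole package.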
Low-Level Client
Use GMapsClient or AsyncGMapsClient directly for custom workflows.
from gmaps_extractor.client import GMapsClient
from gmaps_extractor.settings import GMapsSettings
settings = GMapsSettings(proxy_url="http://user:pass@host:port")
client = GMapsClient(settings)
# Search
businesses = client.search("lawyers", lat=40.7128, lng=-74.0060)
# Place details
details = client.place_details(hex_id="0x89c259a...:0x25d41...", name="Acme Law")
# Reviews
reviews = client.reviews(hex_id="0x89c259a...:0x25d41...", limit=20)
Configuration
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| proxy | str | None | Proxy URL. Falls back to GMAPS_PROXY_* env vars. |
| cookies | dict | None | Explicit cookie override. Auto-managed if None. |
| workers | int | 20 | Parallel search workers. |
| use_server | bool | False | Use legacy FastAPI server (requires [server] extra). |
| verbose | bool | True | Enable progress output via logging. |
| events | EventEmitter | auto | Event emitter for lifecycle hooks. |
| progress | bool/ProgressReporter | auto | Progress reporter (attached when verbose=True). |
| on_business_found | callable | None | Shortcut callback for BUSINESS_FOUND events. |
| on_collection_complete | callable | None | Shortcut callback for COLLECTION_COMPLETE events. |
| server_port | int | 8000 | Port for legacy server mode. |
Environment Variables
export GMAPS_PROXY_HOST="proxy-host:port"
export GMAPS_PROXY_USER="username"
export GMAPS_PROXY_PASS="password"
export GMAPS_COOKIES='{"NID":"...","SOCS":"..."}'
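To make the relationship between these variables and the proxy argument concrete, the sketch below assembles a proxy URL from the three GMAPS_PROXY_* values. This is an assumption about how they combine, not the library's code: proxy_url_from_env is a hypothetical helper, and the http:// scheme is inferred from the proxy URLs used in the examples.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def proxy_url_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Build http://user:pass@host:port from GMAPS_PROXY_* values, or None if unset."""
    host = env.get("GMAPS_PROXY_HOST")
    if not host:
        return None
    user = env.get("GMAPS_PROXY_USER", "")
    password = env.get("GMAPS_PROXY_PASS", "")
    if not user:
        return f"http://{host}"
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    creds = quote(user, safe="")
    if password:
        creds += ":" + quote(password, safe="")
    return f"http://{creds}@{host}"
```

Passing a plain dictionary instead of os.environ makes the function easy to try without touching the real environment.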
Config Resolution Order
- Constructor arguments (highest priority)
- Environment variables
- config.py / _config_defaults.py defaults (lowest priority)
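The resolution order above can be sketched for a single setting. The example below resolves cookies: an explicit argument wins, then the GMAPS_COOKIES environment variable (parsed as JSON, matching the export shown earlier), then a default. resolve_cookies is a hypothetical function written for illustration and does not mirror the library's internals.

```python
import json
import os
from typing import Mapping, Optional


def resolve_cookies(explicit: Optional[dict] = None,
                    env: Mapping[str, str] = os.environ,
                    default: Optional[dict] = None) -> Optional[dict]:
    """Return the first available value: constructor argument, environment, default."""
    if explicit is not None:
        return explicit                      # 1. constructor argument
    raw = env.get("GMAPS_COOKIES")
    if raw:
        return json.loads(raw)               # 2. environment variable (JSON string)
    return default                           # 3. packaged default
```

Checking explicit against None rather than truthiness lets a caller deliberately pass an empty dictionary to override the environment.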
Exception Handling
from gmaps_extractor import GMapsExtractor
from gmaps_extractor.exceptions import (
    GMapsExtractorError,
    BoundaryError,
    ConfigurationError,
    RateLimitError,
    AuthenticationError,
    ServerError,
)

try:
    with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        result = extractor.collect_v2("New York, USA", "lawyers")
except BoundaryError:
    print("Could not resolve area boundaries via Nominatim")
except RateLimitError:
    print("Rate limit exceeded after all retries")
except AuthenticationError:
    print("Proxy or cookie authentication failed")
except GMapsExtractorError as e:
    print(f"Extraction failed: {e}")
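The order of the except clauses above matters if, as the example suggests, the specific exceptions derive from GMapsExtractorError. The sketch below demonstrates this with local stand-in classes; that inheritance relationship is an assumption, and classify is a hypothetical helper.

```python
from typing import Callable


# Local stand-ins that mimic the assumed hierarchy; not imported from the library.
class GMapsExtractorError(Exception):
    pass


class BoundaryError(GMapsExtractorError):
    pass


class RateLimitError(GMapsExtractorError):
    pass


def classify(task: Callable[[], object]) -> str:
    """Run task and map its outcome to a short status string."""
    try:
        task()
    except BoundaryError:
        return "bad-area"       # specific handlers must come first
    except RateLimitError:
        return "retry-later"
    except GMapsExtractorError:
        return "failed"         # catch-all for any other library error
    return "ok"


def raiser(exc: Exception) -> Callable[[], object]:
    """Return a zero-argument function that raises exc when called."""
    def run() -> object:
        raise exc
    return run
```

If the base class were listed first, it would swallow every subclass and the more specific branches would never run.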
CLI
After installing, these commands are available:
# V2 collector (recommended)
gmaps-collect-v2 "Manhattan, New York" "lawyers" --enrich --reviews -l 100
# V1 collector
gmaps-collect "New York, USA" "lawyers" --subdivide
# Add reviews to existing collection
gmaps-enrich-reviews output/lawyers_in_manhattan.json -l 50
# Start FastAPI server (only needed for CLI usage)
gmaps-server
Note: CLI commands require the FastAPI server to be running (gmaps-server). The library API does not.
Output Format
JSON and CSV files are generated in the output/ directory.
{
  "metadata": {
    "area": "New York, USA",
    "category": "lawyers",
    "boundary": {"name": "New York", "north": 40.91, "south": 40.49, "east": -73.70, "west": -74.25},
    "search_mode": "grid",
    "enrichment": {"details_fetched": true, "reviews_fetched": true, "reviews_limit": 20}
  },
  "statistics": {
    "total_collected": 1234,
    "duplicates_removed": 89,
    "search_time_seconds": 120.5,
    "total_time_seconds": 340.2
  },
  "businesses": [
    {
      "name": "Smith & Associates",
      "address": "123 Broadway, New York, NY 10006",
      "place_id": "ChIJ...",
      "rating": 4.5,
      "review_count": 123,
      "latitude": 40.7128,
      "longitude": -74.0060,
      "phone": "+1 212-555-0123",
      "website": "https://example.com",
      "category": "Lawyer",
      "hours": {"monday": "9:00 AM - 5:00 PM"},
      "reviews_data": [{"author": "John", "rating": 5, "text": "Excellent!", "date": "2 months ago"}]
    }
  ]
}
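The library already writes a CSV file, but a custom column set can be produced from the JSON with the standard library. The sketch below flattens the businesses array into CSV rows. businesses_to_csv and the FIELDS list are illustrative choices; the key names are taken from the sample document above, and nested fields such as hours and reviews_data are deliberately left out.

```python
import csv
from typing import Dict, List, TextIO

# Flat, scalar fields from the sample output; nested ones are skipped.
FIELDS = ["name", "address", "phone", "website",
          "rating", "review_count", "latitude", "longitude"]


def businesses_to_csv(document: Dict, out: TextIO,
                      fields: List[str] = FIELDS) -> int:
    """Write document["businesses"] as CSV with the given columns; return the row count."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    rows = document.get("businesses", [])
    for biz in rows:
        # Missing keys become empty cells instead of raising.
        writer.writerow({key: biz.get(key, "") for key in fields})
    return len(rows)
```

In practice document would come from json.load() on a file in the output/ directory, and out from open(path, "w", newline="", encoding="utf-8").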
Architecture
gmaps_extractor/
├── extractor.py # GMapsExtractor (high-level API) + CollectionResult
├── client.py # GMapsClient (sync HTTP, default path)
├── async_client.py # AsyncGMapsClient (async HTTP)
├── settings.py # GMapsSettings dataclass
├── events.py # EventEmitter + EventType
├── progress.py # ProgressReporter
├── exceptions.py # Exception hierarchy
├── parsers/ # Response parsers (business, place, reviews)
├── geo/ # Grid generation, Nominatim boundary resolution
├── extraction/ # Collection orchestrators (sync, async, streaming)
├── decoder/ # Protobuf parameter decoder
└── server.py # Optional FastAPI server
Contributing
See CLAUDE.md for architecture details, common tasks, and development commands.
git clone https://github.com/promisingcoder/GoogleMapsCollector.git
cd GoogleMapsCollector
pip install -e ".[dev]"
pytest
License
MIT License -- See LICENSE for details.