Extract business data from Google Maps at scale

Google Maps Business Extractor

Extract every business in any geographic area from Google Maps — no browser needed.

This tool reverse-engineers Google Maps' internal API (protobuf-encoded search endpoints) to collect business data at scale using raw HTTP requests. Point it at a city and a category, and it systematically covers the entire area using a grid-based search with automatic geographic subdivision via OpenStreetMap Nominatim.

Capable of 100K+ records per week with parallel processing and proxy support.

Features

Full area coverage — Automatically divides any city, region, or country into a grid of searchable cells. No results missed.
Subdivision mode — Breaks large areas into named sub-areas (boroughs, districts, neighborhoods) for even better coverage.
No browser required — Pure HTTP requests against Google's internal endpoints. No Selenium, no Puppeteer, no headless Chrome.
Parallel processing — Configurable worker pool (up to 50 concurrent requests) for fast extraction.
Resumable collection — V2 collector saves checkpoints. If it crashes, run again and it picks up where it left off.
Parallel enrichment — Fetch place details (hours, phone, website) and reviews concurrently, not one-by-one.
Adaptive rate limiting — Exponential backoff with jitter. Automatically slows down on errors and speeds up on success.
Dual output — JSON and CSV generated simultaneously. JSONL streaming for large datasets.
Smart deduplication — Deduplicates by both place_id and hex_id across overlapping grid cells.
Auto cookie management — Builds Google sessions automatically by visiting google.com -> consent.google.com -> maps.google.com to obtain required cookies.
Boundary filtering — Removes results that fall outside the target area with configurable buffer distance.
Reviews with pagination — Fetches up to hundreds of reviews per business using Google's listugcposts endpoint.
Pip-installable — Install from PyPI or source. Use as a Python library or from the command line.
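The adaptive rate limiting described above (exponential backoff with jitter, slowing down on errors and speeding up on success) can be pictured with the sketch below. It is illustrative only and not the package's own code: the class name `AdaptiveDelay`, its parameters, and the default values are assumptions.

```python
import random


class AdaptiveDelay:
    """Grow the delay after errors, shrink it after successes (hypothetical helper)."""

    def __init__(self, base: float = 0.5, factor: float = 2.0, max_delay: float = 60.0):
        self.base = base
        self.factor = factor
        self.max_delay = max_delay
        self.current = base

    def on_error(self) -> None:
        # Exponential backoff: multiply the delay, capped at max_delay.
        self.current = min(self.current * self.factor, self.max_delay)

    def on_success(self) -> None:
        # Recover gradually: divide the delay, but never drop below base.
        self.current = max(self.current / self.factor, self.base)

    def next_sleep(self) -> float:
        # Jitter: pick a random wait in [0, current] so that parallel
        # workers do not retry in lockstep.
        return random.uniform(0.0, self.current)
```

A worker would call `time.sleep(delay.next_sleep())` before each request, then `on_error()` or `on_success()` depending on the outcome.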

Installation

From PyPI

pip install gmaps-extractor

From Source

git clone https://github.com/promisingcoder/google_maps_business_extractor.git
cd google_maps_business_extractor

pip install -e .

Requirements

Python 3.9+
A residential/sticky proxy (required — Google blocks datacenter IPs)

Quick Start

Python Library (Recommended)

The GMapsExtractor class is the main entry point for library usage. It automatically starts the internal API server in the background — no separate server process needed.

from gmaps_extractor import GMapsExtractor

with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect("New York, USA", "lawyers", enrich=True)
    print(f"Found {len(result)} businesses")
    for biz in result:
        print(biz["name"], biz["address"])

See the Python Library API section below for full details.

Command Line

# Start the API server (required for CLI usage)
gmaps-server
# Or: python run_server.py

# Basic collection
gmaps-collect "New York, USA" "lawyers"

# Enhanced collector (V2) with reviews
gmaps-collect-v2 "Paris, France" "restaurants" --enrich --reviews -l 50

See the CLI Reference section below for all available flags.

Python Library API

GMapsExtractor

The GMapsExtractor class manages server lifecycle and configuration. Use it as a context manager for clean startup and shutdown.

from gmaps_extractor import GMapsExtractor

# Proxy via constructor argument
with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
    result = extractor.collect("New York, USA", "lawyers", enrich=True)

# Proxy via environment variables (GMAPS_PROXY_HOST, GMAPS_PROXY_USER, GMAPS_PROXY_PASS)
with GMapsExtractor() as extractor:
    result = extractor.collect("London, UK", "dentists", subdivide=True)

Constructor Parameters

Parameter	Type	Default	Description
`proxy`	`str`	`None`	Proxy URL (e.g., `"http://user:pass@host:port"`). Falls back to `GMAPS_PROXY_*` env vars.
`cookies`	`dict`	`None`	Explicit cookie override. If `None`, cookies are handled automatically.
`workers`	`int`	`20`	Default number of parallel search workers.
`server_port`	`int`	`8000`	Port for the internal API server.
`auto_start_server`	`bool`	`True`	Whether to auto-start the API server in the background.
`verbose`	`bool`	`True`	Whether to print progress output.

collect() — V1 Collector

result = extractor.collect(
    "New York, USA",          # area (required)
    "lawyers",                # category (required)
    enrich=True,              # fetch place details (hours, phone, website)
    reviews=True,             # fetch reviews
    reviews_limit=20,         # max reviews per business
    workers=30,               # parallel search workers
    subdivide=True,           # use subdivision mode
    buffer_km=5.0,            # boundary filter buffer in km
    output_file="out.json",   # save JSON to file (None = auto-generate)
    output_csv="out.csv",     # save CSV to file (False = disable CSV)
    verbose=False,            # suppress progress output
)

collect_v2() — Enhanced Collector (Recommended for Large Jobs)

result = extractor.collect_v2(
    "Paris, France",          # area (required)
    "restaurants",            # category (required)
    enrich=True,              # fetch place details
    reviews=True,             # fetch reviews
    reviews_limit=50,         # max reviews per business
    workers=30,               # parallel search workers
    enrichment_workers=10,    # parallel enrichment workers
    checkpoint_interval=100,  # save checkpoint every N businesses
    resume=True,              # resume from checkpoint if available
    subdivide=True,           # use subdivision mode
    buffer_km=5.0,            # boundary filter buffer in km
    output_file="out.json",   # save JSON to file
    output_csv="out.csv",     # save CSV to file
)
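To make the `checkpoint_interval` and `resume` parameters more concrete, here is a minimal sketch of how a resumable collector could persist its state. The function names, the file layout, and the `completed_cells` key are assumptions for illustration; the package's real checkpoint format is not documented here.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, businesses: list, completed_cells: list) -> None:
    """Write the checkpoint atomically so a crash cannot leave a half-written file."""
    state = {"businesses": businesses, "completed_cells": completed_cells}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path: str, resume: bool = True) -> dict:
    """Return the saved state, or an empty state when starting fresh."""
    if resume and os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    return {"businesses": [], "completed_cells": []}
```

With this shape, a collector would call `save_checkpoint` every N businesses and skip any cell already listed in `completed_cells` after a restart.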

CollectionResult

Both collect() and collect_v2() return a CollectionResult object that supports iteration, indexing and slicing, and len().

result = extractor.collect("New York, USA", "lawyers")

# Length
print(f"Found {len(result)} businesses")

# Iteration
for biz in result:
    print(biz["name"], biz["rating"])

# Indexing
first = result[0]
last_five = result[-5:]

# Access structured data
print(result.metadata)     # {"area": "New York, USA", "category": "lawyers", ...}
print(result.statistics)   # {"total_collected": 1234, "duplicates_removed": 89, ...}
print(result.businesses)   # [{"name": "...", "address": "...", ...}, ...]

# Full dict (matches the JSON output structure)
data = result.to_dict()    # {"metadata": {...}, "statistics": {...}, "businesses": [...]}
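Because `to_dict()` returns plain dictionaries and lists, the result can be post-processed without the library. The sketch below filters a hand-written sample shaped like that structure. The helper `top_rated` and its thresholds are hypothetical, and it assumes `rating` or `review_count` may be missing or `None` for some businesses.

```python
def top_rated(data: dict, min_rating: float = 4.5, min_reviews: int = 10) -> list[str]:
    """Return names of businesses meeting both thresholds, best rated first."""
    matches = [
        biz for biz in data.get("businesses", [])
        if (biz.get("rating") or 0) >= min_rating
        and (biz.get("review_count") or 0) >= min_reviews
    ]
    matches.sort(key=lambda biz: biz["rating"], reverse=True)
    return [biz["name"] for biz in matches]


# Hand-written sample in the same shape as result.to_dict().
sample = {
    "metadata": {"area": "New York, USA", "category": "lawyers"},
    "statistics": {"total_collected": 3},
    "businesses": [
        {"name": "A", "rating": 4.5, "review_count": 123},
        {"name": "B", "rating": 4.9, "review_count": 40},
        {"name": "C", "rating": None, "review_count": 0},
    ],
}
print(top_rated(sample))  # ['B', 'A']
```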

Exception Handling

All library exceptions inherit from GMapsExtractorError, so you can catch them broadly or handle specific cases.

from gmaps_extractor import GMapsExtractor
from gmaps_extractor.exceptions import (
    GMapsExtractorError,   # base exception for all errors
    ServerError,           # API server failed to start or is unreachable
    BoundaryError,         # area boundaries could not be resolved via Nominatim
    ConfigurationError,    # invalid or incomplete configuration
    RateLimitError,        # rate-limiting exceeded retry capacity
    AuthenticationError,   # proxy or cookie authentication failed
)

try:
    with GMapsExtractor(proxy="http://user:pass@host:port") as extractor:
        result = extractor.collect("New York, USA", "lawyers")
except ServerError:
    print("Could not start the API server")
except BoundaryError:
    print("Could not resolve area boundaries")
except GMapsExtractorError as e:
    print(f"Extraction failed: {e}")

Low-Level Functions

The lower-level collect_businesses() and collect_businesses_v2() functions are still available for advanced use. These require the API server to be running separately (via gmaps-server or python run_server.py).

from gmaps_extractor import collect_businesses, collect_businesses_v2

# Requires server running on localhost:8000
businesses = collect_businesses("New York, USA", "lawyers", enrich=True)

Important Notes

Proxy is required for production use. Pass via the proxy constructor argument or set GMAPS_PROXY_HOST, GMAPS_PROXY_USER, and GMAPS_PROXY_PASS environment variables.
Cookies are handled automatically. The system auto-fetches cookies from Google. You only need to provide them explicitly if the automatic flow fails.
One instance at a time. Only one GMapsExtractor instance should be active at a time, since configuration is applied to shared module-level globals.
Use the context manager. The with statement ensures the background server shuts down cleanly. Without it, call extractor.shutdown() manually when done.

Console Scripts

After installing with pip install gmaps-extractor, the following commands are available globally:

Command	Equivalent Script	Description
`gmaps-collect`	`python collect.py`	V1 collector
`gmaps-collect-v2`	`python collect_v2.py`	V2 enhanced collector (recommended)
`gmaps-enrich-reviews`	`python enrich_reviews_only.py`	Add reviews to existing collection
`gmaps-server`	`python run_server.py`	Start the API server

All flags are identical to their script equivalents:

# These are equivalent
gmaps-collect-v2 "Manhattan, New York" "lawyers" --enrich --reviews -l 100
python collect_v2.py "Manhattan, New York" "lawyers" --enrich --reviews -l 100

CLI Reference

collect.py / gmaps-collect

Flag	Default	Description
`area`	required	Area to search (e.g., `"New York, USA"`)
`category`	required	Business category (e.g., `"lawyers"`)
`--enrich`	off	Fetch detailed place info (hours, phone, website, photos)
`--reviews`	off	Fetch reviews for each business
`--reviews-limit N`	5	Max reviews per business
`-p, --parallel N`	20	Number of parallel search workers (max 50)
`--subdivide`	off	Use named sub-areas for better coverage
`-b, --buffer N`	5.0	Boundary filter buffer in km
`-o, --output PATH`	auto	JSON output file path
`--csv PATH`	auto	CSV output file path
`--no-csv`	off	Disable CSV output
`-q, --quiet`	off	Suppress progress output

collect_v2.py / gmaps-collect-v2 (Enhanced)

All flags from collect.py plus:

Flag	Default	Description
`-w, --workers N`	20	Parallel workers for cell queries
`--enrich-workers N`	5	Parallel workers for enrichment
`-c, --checkpoint N`	100	Save checkpoint every N businesses
`--resume`	on	Resume from checkpoint if available
`--no-resume`	off	Start fresh, ignore existing checkpoint

CLI Quick Examples

# Start the server (required for CLI usage only — library API auto-starts it)
gmaps-server

# Basic collection
gmaps-collect "New York, USA" "lawyers"

# With place details and reviews
gmaps-collect "Paris, France" "restaurants" --enrich --reviews --reviews-limit 20

# Subdivision mode for large areas
gmaps-collect "London, UK" "dentists" --subdivide

# V2 with parallel enrichment and resumability
gmaps-collect-v2 "Manhattan, New York" "lawyers" --enrich --reviews -l 100

# Resume an interrupted V2 collection
gmaps-collect-v2 "Manhattan, New York" "lawyers" --resume

# Add reviews to an existing collection
gmaps-enrich-reviews output/lawyers_in_manhattan.json -l 50

# Full control
gmaps-collect-v2 "Los Angeles, CA" "restaurants" \
  --enrich --reviews -l 50 \
  --workers 30 --enrich-workers 10 \
  --checkpoint 100 --subdivide

Configuration

Option 1: Constructor Arguments (Library Only)

with GMapsExtractor(
    proxy="http://user:pass@host:port",
    workers=30,
    server_port=9000,
    verbose=False,
) as extractor:
    result = extractor.collect("New York, USA", "lawyers")

Option 2: Environment Variables (Recommended for CLI)

export GMAPS_PROXY_HOST="your-proxy-host:port"
export GMAPS_PROXY_USER="your-username"
export GMAPS_PROXY_PASS="your-password"

# Optional: provide Google cookies as JSON
export GMAPS_COOKIES='{"NID":"...","SOCS":"...","AEC":"..."}'

Option 3: Config File

Edit gmaps_extractor/config.py (copied from config.example.py):

_DIRECT_PROXY_HOST = "your-proxy-host:port"
_DIRECT_PROXY_USER = "username"
_DIRECT_PROXY_PASS = "password_country-us_session-XXX_lifetime-30m_streaming-1"

Note: When using the library API (GMapsExtractor), constructor arguments take highest priority, followed by environment variables, then config.py defaults. When using the CLI, environment variables and config.py are the configuration sources.
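The priority order in the note above can be sketched as a small resolver. This is an illustration under stated assumptions, not the package's implementation: the function `resolve_proxy` is hypothetical, and the way the three environment variables are combined into a URL is a guess.

```python
import os
from typing import Mapping, Optional


def resolve_proxy(
    constructor_proxy: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_default: Optional[str] = None,
) -> Optional[str]:
    """Pick the proxy URL by priority: constructor, environment, config default."""
    env = os.environ if env is None else env
    if constructor_proxy:
        return constructor_proxy
    host = env.get("GMAPS_PROXY_HOST")
    if host:
        user = env.get("GMAPS_PROXY_USER")
        password = env.get("GMAPS_PROXY_PASS")
        # Assumed URL layout; credentials are included only when both are set.
        if user and password:
            return f"http://{user}:{password}@{host}"
        return f"http://{host}"
    return config_default
```

Passing `env` explicitly keeps the function easy to test; credentials containing reserved URL characters would need percent-encoding, which this sketch omits.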

Proxy Requirements

Sticky session proxy with 30+ minute lifetime recommended
Residential proxies work best (Google blocks datacenter IPs)
The _lifetime-30m parameter in the proxy password configures session stickiness (provider-specific)

Cookie Management

The system handles cookies automatically:

NID, AEC, __Secure-BUCKET — Auto-fetched by visiting Google pages in sequence
SOCS — Consent cookie provided in defaults, rarely needs updating
Cookies are cached for 1 hour and refreshed automatically
You can also provide cookies manually via the GMAPS_COOKIES environment variable or the cookies constructor argument
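The manual override and the one-hour cache described above could look roughly like this. Both `parse_cookie_env` and `CookieCache` are hypothetical names, and the `fetch` callable stands in for whatever the package does to obtain cookies from Google.

```python
import json
import time
from typing import Callable, Optional


def parse_cookie_env(raw: Optional[str]) -> Optional[dict]:
    """Parse a GMAPS_COOKIES-style JSON object; return None if unset or invalid."""
    if not raw:
        return None
    try:
        cookies = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return cookies if isinstance(cookies, dict) else None


class CookieCache:
    """Reuse fetched cookies until they are older than ttl_seconds."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._cookies: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._cookies is None or now - self._fetched_at >= self._ttl:
            # Refresh on first use or once the cached cookies have expired.
            self._cookies = self._fetch()
            self._fetched_at = now
        return self._cookies
```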

Output Format

Both JSON and CSV files are generated by default in the output/ directory.

JSON Structure

{
  "metadata": {
    "area": "New York, USA",
    "category": "lawyers",
    "boundary": { "name": "New York", "north": 40.91, "south": 40.49, "east": -73.70, "west": -74.25 },
    "search_mode": "grid",
    "enrichment": { "details_fetched": true, "reviews_fetched": true, "reviews_limit": 20 }
  },
  "statistics": {
    "total_collected": 1234,
    "duplicates_removed": 89,
    "filtered_outside_boundary": 56,
    "search_time_seconds": 120.5,
    "total_time_seconds": 340.2
  },
  "businesses": [
    {
      "name": "Smith & Associates Law Firm",
      "address": "123 Broadway, New York, NY 10006",
      "place_id": "ChIJ...",
      "hex_id": "0x89c259a8669c0f0d:0x25d4109319b4f5a0",
      "ftid": "/g/11b5wlq0vc",
      "rating": 4.5,
      "review_count": 123,
      "latitude": 40.7128,
      "longitude": -74.0060,
      "phone": "+1 212-555-0123",
      "website": "https://example.com",
      "category": "Lawyer",
      "categories": ["Lawyer", "Legal Services"],
      "found_in": "Manhattan, New York",
      "hours": {
        "monday": "9:00 AM - 5:00 PM",
        "tuesday": "9:00 AM - 5:00 PM"
      },
      "reviews_data": [
        {
          "review_id": "...",
          "author": "John Smith",
          "author_photo": "https://...",
          "rating": 5,
          "text": "Excellent service!",
          "date": "2 months ago"
        }
      ]
    }
  ]
}

CSV Columns

name, address, place_id, hex_id, ftid, rating, review_count, latitude, longitude, phone, website, category, categories, hours, found_in, reviews_data
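The column list above includes nested fields (`categories`, `hours`, `reviews_data`) that have no natural single-cell form. One way to flatten them, shown here as a sketch, is to store them as JSON strings. That encoding is an assumption; the package may serialize these cells differently. `business_to_row` and `to_csv` are hypothetical helpers.

```python
import csv
import io
import json

CSV_COLUMNS = [
    "name", "address", "place_id", "hex_id", "ftid", "rating", "review_count",
    "latitude", "longitude", "phone", "website", "category", "categories",
    "hours", "found_in", "reviews_data",
]


def business_to_row(biz: dict) -> dict:
    """Flatten one business; nested values become JSON strings (assumed encoding)."""
    row = {}
    for column in CSV_COLUMNS:
        value = biz.get(column, "")
        if isinstance(value, (list, dict)):
            value = json.dumps(value, ensure_ascii=False)
        row[column] = "" if value is None else value
    return row


def to_csv(businesses: list) -> str:
    """Render businesses as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for biz in businesses:
        writer.writerow(business_to_row(biz))
    return buffer.getvalue()
```

Reading such a file back only needs `json.loads` on the nested columns.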

API Endpoints

The FastAPI server exposes these endpoints on http://localhost:8000:

Endpoint	Method	Description
`/api/health`	GET	Health check
`/api/decode`	POST	Decode a curl command into structured parameters
`/api/execute`	POST	Execute a search query, return businesses
`/api/place-details`	POST	Fetch place details (hours, phone, photos)
`/api/reviews`	POST	Fetch paginated reviews for a place

Note: When using the library API, the server is started automatically in the background. You only need to start it manually for CLI usage or direct API access.
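When the server is started manually, it can be useful to confirm it is reachable before launching a CLI job. The sketch below polls the `/api/health` endpoint listed above using only the standard library. The function name is hypothetical, and treating any HTTP 200 as healthy is an assumption, since the response body is not documented here.

```python
import urllib.request


def server_is_healthy(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET /api/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/health"
    # The server is local, so bypass any system-wide proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False
```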

How It Works

1. Input: area name + category
       |
2. Nominatim API --> get geographic boundaries
       |
3. Generate grid cells covering the entire area
   (or subdivide into named sub-areas, then grid each one)
       |
4. Parallel search: query each cell via Google's internal search endpoint
   - Paginate through all results per cell (400 per page)
   - Adaptive rate limiting with exponential backoff
       |
5. Deduplicate by place_id + hex_id across overlapping cells
       |
6. Filter: remove results outside the target boundary
       |
7. [Optional] Parallel enrichment:
   - Place details (hours, phone, website, photos)
   - Reviews with pagination (via listugcposts endpoint)
       |
8. Export to JSON + CSV (with JSONL streaming in V2)
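Steps 3, 5, and 6 of the pipeline above can be sketched in a few functions. This is a simplified illustration, not the package's code: all function names, the default cell size, and the flat-earth approximation (about 111 km per degree of latitude) are assumptions, and the sketch ignores bounding boxes that cross the antimeridian.

```python
import math
from typing import Iterable, Iterator


def grid_cells(north: float, south: float, east: float, west: float,
               cell_km: float = 2.0) -> Iterator[tuple[float, float]]:
    """Yield (lat, lng) centres of cells that tile the bounding box."""
    lat_step = cell_km / 111.0  # about 111 km per degree of latitude
    mid_lat = math.radians((north + south) / 2)
    # Degrees of longitude shrink with the cosine of the latitude.
    lng_step = cell_km / (111.0 * math.cos(mid_lat))
    rows = max(1, math.ceil((north - south) / lat_step))
    cols = max(1, math.ceil((east - west) / lng_step))
    for row in range(rows):
        for col in range(cols):
            yield (south + (row + 0.5) * lat_step, west + (col + 0.5) * lng_step)


def dedupe(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each place_id or hex_id."""
    seen_place_ids: set[str] = set()
    seen_hex_ids: set[str] = set()
    unique: list[dict] = []
    for record in records:
        place_id, hex_id = record.get("place_id"), record.get("hex_id")
        # A record is a duplicate if either identifier was already seen.
        if (place_id and place_id in seen_place_ids) or (hex_id and hex_id in seen_hex_ids):
            continue
        if place_id:
            seen_place_ids.add(place_id)
        if hex_id:
            seen_hex_ids.add(hex_id)
        unique.append(record)
    return unique


def inside_with_buffer(lat: float, lng: float, bounds: dict, buffer_km: float = 5.0) -> bool:
    """True if the point lies within the box expanded by buffer_km on every side."""
    lat_pad = buffer_km / 111.0
    mid_lat = math.radians((bounds["north"] + bounds["south"]) / 2)
    lng_pad = buffer_km / (111.0 * math.cos(mid_lat))
    return (bounds["south"] - lat_pad <= lat <= bounds["north"] + lat_pad
            and bounds["west"] - lng_pad <= lng <= bounds["east"] + lng_pad)
```

A bounding-box test is a coarse stand-in for a real boundary polygon; it keeps the example short while still showing how the buffer widens the accepted area.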

Google Maps PB Parameter Format

The tool constructs requests using Google's internal pb (protobuf) URL parameter format:

Pattern	Type	Example Use
`!1s`	string	Search query
`!2d` / `!3d`	double	Longitude / Latitude
`!7i`	integer	Results per page
`!8i`	integer	Pagination offset
`!74i`	integer	Max search radius (meters)
`!NmK`	message	Field N is a nested message; the K fields that follow belong to it
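As a rough illustration of the format in the table, the sketch below splits a pb string into tokens and rebuilds the nesting. It is not the package's `decoder/pb.py`: the function names are hypothetical, only the `s`, `d`, `i`, and `m` types from the table are converted, and it assumes values contain no literal `!`. The sample string is made up for the example and is not a real request.

```python
import re

_TOKEN = re.compile(r"^(\d+)([a-zA-Z])(.*)$", re.DOTALL)


def tokenize_pb(pb: str) -> list[tuple[int, str, str]]:
    """Split '!1sfoo!2d-74.0' into (field, type, raw_value) tuples."""
    tokens = []
    for part in pb.split("!"):
        if not part:
            continue
        match = _TOKEN.match(part)
        if match is None:
            raise ValueError(f"malformed pb token: {part!r}")
        tokens.append((int(match.group(1)), match.group(2), match.group(3)))
    return tokens


def _convert(type_code: str, raw: str):
    if type_code == "d":
        return float(raw)
    if type_code == "i":
        return int(raw)
    return raw  # strings and any other type stay as text


def _build(tokens: list, start: int, end: int) -> list:
    items = []
    pos = start
    while pos < end:
        field, type_code, raw = tokens[pos]
        pos += 1
        if type_code == "m":
            # A message token owns the next int(raw) tokens, nested ones included.
            child_end = pos + int(raw)
            if child_end > end:
                raise ValueError("message count exceeds available tokens")
            items.append((field, _build(tokens, pos, child_end)))
            pos = child_end
        else:
            items.append((field, _convert(type_code, raw)))
    return items


def parse_pb(pb: str) -> list:
    """Parse a pb string into a nested list of (field, value) pairs."""
    tokens = tokenize_pb(pb)
    return _build(tokens, 0, len(tokens))
```

For instance, `parse_pb("!1slawyers!4m2!2d-74.006!3d40.7128!7i20!8i0")` yields the query string, one nested message holding the two coordinates, and the two integers.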

Architecture

gmaps_extractor/
├── __init__.py              # Package entry, exports GMapsExtractor + collect functions
├── extractor.py             # GMapsExtractor class and CollectionResult wrapper
├── config_manager.py        # ExtractorConfig dataclass, bridges to config.py
├── exceptions.py            # Custom exception hierarchy (GMapsExtractorError, etc.)
├── _config_defaults.py      # Safe fallback config for pip-only installs (no config.py)
├── cli.py                   # CLI argument parsing (V1)
├── cli_v2.py                # CLI argument parsing (V2)
├── cli_enrich.py            # CLI for reviews-only enrichment
├── config.py                # Proxy, cookies, rate limits, search parameters (gitignored)
├── config.example.py        # Template config with placeholders
├── server.py                # FastAPI server (all Google communication goes through here)
├── decoder/
│   ├── pb.py                # Decodes Google's !field_type_value protobuf format
│   ├── curl.py              # Parses curl commands into structured data
│   └── request.py           # Combined request decoder
├── parsers/
│   ├── business.py          # Extracts businesses from search response arrays
│   ├── place.py             # Extracts place details (hours, phone, etc.)
│   └── reviews.py           # Extracts reviews from place responses
├── geo/
│   ├── grid.py              # Grid cell generation and boundary math
│   └── nominatim.py         # OpenStreetMap Nominatim API for boundaries + sub-areas
└── extraction/
    ├── search.py            # Builds and executes search queries
    ├── enrichment.py        # Fetches details + reviews per business
    ├── collector.py         # V1 orchestrator (parallel grid search)
    └── collector_v2.py      # V2 orchestrator (resumable, adaptive, parallel enrichment)

collect.py                   # CLI entry point (V1) — still works standalone
collect_v2.py                # CLI entry point (V2) — still works standalone
enrich_reviews_only.py       # Standalone tool to add reviews to existing collections
run_server.py                # Starts the FastAPI server — still works standalone
pyproject.toml               # Package metadata, dependencies, console script entry points

License

MIT License - See LICENSE for details.

Acknowledgments

Built with:

FastAPI — API server
httpx — HTTP client
OpenStreetMap Nominatim — Geocoding and boundary detection

Release history

2.0.0 (Feb 9, 2026)

1.0.0 (Feb 6, 2026), this version
