Python client for scraping Google Flights using the ScrapingBee web scraping API

google-flights-scraper-api

A Python client for the Google Flights scraper API powered by ScrapingBee. It turns a public Google Flights page into clean data you can load into pandas, a database, or a price-monitoring job, without running a headless browser or proxy pool yourself.

Google does not ship a public Google Flights API, and the page builds its fares with JavaScript behind anti-bot protection. This package sends the work to ScrapingBee, which renders the page, rotates residential proxies, and hands back rendered HTML or structured JSON.

Built on the ScrapingBee web scraping API

If you searched for any of these, you are in the right place:

  • google flights api
  • google flights scraper
  • google flights scraper api

Why a Google Flights scraper API instead of plain requests

A direct requests.get() against Google Flights returns an empty shell. The fares, durations, and stop counts are injected by JavaScript after load, and Google quickly blocks datacenter IPs with consent walls and challenges.

A managed Google Flights API layer removes that whole class of problems:

  • Executes the page JavaScript in a real headless browser
  • Rotates residential proxies so requests are not blocked
  • Skips the Google consent interstitial
  • Returns structured JSON when you supply extraction rules

You write the query and read the data. The infrastructure is someone else's problem.

Installation

pip install google-flights-scraper-api

Requires Python 3.8+ and requests.

Quick start

from google_flights_scraper_api import GoogleFlightsScraper

scraper = GoogleFlightsScraper(api_key="YOUR_API_KEY")

html = scraper.search(query="Flights to London from New York")
print(html[:500])

You need an API key before running this. ScrapingBee offers 1,000 free credits, no card required, at scrapingbee.com.

How it works

Every call hits the ScrapingBee HTML API:

https://app.scrapingbee.com/api/v1/

The client builds the request with documented parameters: the Google Flights url, render_js=true, premium_proxy=true, and the Google CONSENT cookie so the consent page is skipped. You never assemble the query string yourself.
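To make the parameter list concrete, here is a minimal sketch of the kind of request URL described above, built with the standard library only. It is not the package's actual code: `build_request_url`, the `GOOGLE_FLIGHTS_URL` base path, and the cookie value are assumptions for illustration, and passing the cookie through ScrapingBee's `cookies` parameter is my reading of how the consent cookie would be sent.

```python
from urllib.parse import urlencode

SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"
# Assumed base URL; the README only says the query is appended as ?q=.
GOOGLE_FLIGHTS_URL = "https://www.google.com/travel/flights"


def build_request_url(api_key, query, consent_cookie="CONSENT=PLACEHOLDER"):
    """Assemble a ScrapingBee request URL for a Google Flights search.

    consent_cookie is a placeholder; the real cookie value the client
    sends is not shown in this README.
    """
    # The natural-language query becomes the ?q= part of the target URL.
    target = f"{GOOGLE_FLIGHTS_URL}?{urlencode({'q': query})}"
    params = {
        "api_key": api_key,
        "url": target,             # percent-encoded by urlencode below
        "render_js": "true",
        "premium_proxy": "true",
        "cookies": consent_cookie,
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"


url = build_request_url("YOUR_API_KEY", "Flights to London from New York")
print(url)
```

The target URL is encoded twice on purpose: once for its own `q` value and once more as the value of ScrapingBee's `url` parameter.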

Structured data with AI extraction

Rather than parse Google's rotating markup, pass ai_extract_rules and get JSON back. The schema you define becomes the response shape.

from google_flights_scraper_api import GoogleFlightsScraper

scraper = GoogleFlightsScraper(api_key="YOUR_API_KEY")

data = scraper.search(
    query="Flights to Tokyo from San Francisco",
    ai_extract_rules={
        "flights": {
            "description": "every flight result on the page",
            "type": "list",
            "output": {
                "airline": "name of the airline",
                "price": "ticket price in dollars",
                "departure_time": "departure time",
                "arrival_time": "arrival time",
                "duration": "total trip duration",
                "stops": "number of stops",
            },
        },
    },
)

# AI extraction may omit a field it cannot find, so read keys defensively.
for flight in data.get("flights", []):
    print(flight.get("airline"), flight.get("price"), flight.get("stops"))

The description, type, and output keys follow ScrapingBee's documented extraction schema. type accepts string, list, number, boolean, and item.
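Since a typo in `type` is easy to make, a small local check before spending credits can help. The sketch below is a hypothetical helper, not part of this package: `validate_rules` and `ALLOWED_TYPES` are names invented here, and treating a missing `type` as `string` is an assumption.

```python
ALLOWED_TYPES = {"string", "list", "number", "boolean", "item"}


def validate_rules(rules, path=""):
    """Raise ValueError if any rule uses a type outside ALLOWED_TYPES."""
    for name, rule in rules.items():
        where = f"{path}.{name}" if path else name
        if isinstance(rule, str):
            # Shorthand form: the value is just a description.
            continue
        if not isinstance(rule, dict):
            raise ValueError(f"{where}: rule must be a string or a dict")
        rule_type = rule.get("type", "string")  # assumed default
        if rule_type not in ALLOWED_TYPES:
            raise ValueError(f"{where}: unknown type {rule_type!r}")
        # Recurse into nested fields declared under "output".
        output = rule.get("output")
        if isinstance(output, dict):
            validate_rules(output, where)
    return True


rules = {
    "flights": {
        "description": "every flight result on the page",
        "type": "list",
        "output": {
            "airline": "name of the airline",
            "stops": {"description": "number of stops", "type": "number"},
        },
    },
}
validate_rules(rules)
```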

Waiting for fares to load

Google Flights sometimes streams results in after first paint. Use a js_scenario to wait or scroll before the page is captured. A scenario can run for at most 40 seconds in total.

html = scraper.search(
    query="Flights to Rome from Boston",
    js_scenario={
        "instructions": [
            {"wait": 3000},
            {"scroll_y": 1000},
            {"wait": 1000},
        ],
    },
)
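Because of the 40-second cap, it can be worth adding up the fixed waits in a scenario before sending it. This is a rough sketch with a hypothetical helper (`total_wait_ms` is not part of the package); it counts only `wait` steps, so time spent scrolling or clicking is not included.

```python
MAX_SCENARIO_MS = 40_000  # the 40-second limit mentioned above


def total_wait_ms(scenario):
    """Sum the fixed {"wait": ms} steps of a js_scenario dict."""
    return sum(
        step["wait"]
        for step in scenario.get("instructions", [])
        if isinstance(step.get("wait"), int)
    )


scenario = {
    "instructions": [
        {"wait": 3000},
        {"scroll_y": 1000},
        {"wait": 1000},
    ],
}

waited = total_wait_ms(scenario)
if waited > MAX_SCENARIO_MS:
    raise ValueError(f"scenario waits {waited} ms, over the {MAX_SCENARIO_MS} ms cap")
print(waited)  # 4000
```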

Configuration options

| Argument | API parameter | Description |
| --- | --- | --- |
| query | url (?q=) | Natural-language flight search appended to the Google Flights URL |
| url | url | A full Google Flights URL, used instead of query |
| render_js | render_js | Execute page JavaScript (default True) |
| premium_proxy | premium_proxy | Residential proxies (default True) |
| stealth_proxy | stealth_proxy | Stealth tier for the hardest blocks |
| country_code | country_code | ISO country code, needs premium_proxy=True |
| ai_extract_rules | ai_extract_rules | Natural-language extraction, returns JSON, adds 5 credits |
| extract_rules | extract_rules | CSS or XPath extraction rules |
| js_scenario | js_scenario | Script waits, scrolls, and clicks before capture |
| wait | wait | Fixed wait in milliseconds |
| screenshot_full_page | screenshot_full_page | Return a full-page screenshot as bytes |
| json_response | json_response | Wrap the response in a JSON envelope |
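Two rows in the table interact with other options: country_code needs premium_proxy=True, and url is used instead of query. A minimal sketch of a pre-flight check for those two cases follows. `check_options` is a hypothetical helper written for this example, not something the package is known to provide.

```python
def check_options(**options):
    """Return warnings for the option combinations the table calls out."""
    warnings = []
    # premium_proxy defaults to True according to the table.
    if options.get("country_code") and not options.get("premium_proxy", True):
        warnings.append("country_code needs premium_proxy=True")
    if options.get("url") and options.get("query"):
        warnings.append("url is used instead of query; query is ignored")
    return warnings


print(check_options(country_code="gb", premium_proxy=False))
```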

What you get back

  • Default: the rendered HTML of the Google Flights page as a string.
  • With ai_extract_rules or extract_rules: parsed JSON matching the schema you defined.
  • With screenshot_full_page=True: raw PNG bytes.
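Since the return type depends on the arguments, calling code may want to branch on it. The following is a sketch under the assumption that the three cases map to `str`, parsed JSON (`dict` or `list`), and `bytes`; `describe_result` is a hypothetical helper, and the PNG check relies only on the standard 8-byte PNG file signature.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def describe_result(result):
    """Classify a search() result by its Python type."""
    if isinstance(result, bytes):
        return "png" if result.startswith(PNG_SIGNATURE) else "bytes"
    if isinstance(result, (dict, list)):
        return "json"
    if isinstance(result, str):
        return "html"
    raise TypeError(f"unexpected result type: {type(result).__name__}")


print(describe_result("<html></html>"))
print(describe_result({"flights": []}))
print(describe_result(PNG_SIGNATURE + b"..."))
```

A screenshot classified as "png" can then be written to disk with `open(path, "wb")`.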

Production use cases

This Google Flights scraper fits cleanly into:

  • Fare-tracking jobs that alert when a route drops below a threshold
  • Competitive pricing dashboards for travel agencies and OTAs
  • Route and demand research across markets
  • Data pipelines feeding a warehouse or a notebook for analysis
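As an illustration of the fare-tracking case, here is a minimal sketch that filters extracted flights by a price threshold. The helper names are hypothetical, the sample rows are made up for the example, and the price format ("$432", "$1,234") is an assumption, since AI extraction returns whatever text it finds on the page.

```python
import re


def parse_price(text):
    """Pull the first number out of a price string such as "$1,234"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    return float(match.group().replace(",", "")) if match else None


def flights_below(flights, threshold):
    """Return the flights whose parsed price is under the threshold."""
    hits = []
    for flight in flights:
        price = parse_price(flight.get("price", ""))
        if price is not None and price < threshold:
            hits.append(flight)
    return hits


# Made-up rows shaped like the "flights" list from the extraction example.
sample = [
    {"airline": "Example Air", "price": "$1,234"},
    {"airline": "Sample Jet", "price": "$432"},
    {"airline": "No Price Co", "price": "n/a"},
]

for flight in flights_below(sample, threshold=500):
    print("ALERT:", flight["airline"], flight["price"])
```

Rows with no parseable price are skipped rather than treated as zero, so a missing value never triggers a false alert.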

Pricing

ScrapingBee bills only successful requests. A request that fails with HTTP 500 is not charged. Scraping a Google URL through the HTML API is billed at a flat rate, and toggling render_js does not change it:

  • Classic or Premium proxy: 20 credits per request
  • Stealth proxy: 75 credits per request
  • ai_extract_rules: adds 5 credits

Current rate card: scrapingbee.com/pricing.
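The credit arithmetic is simple enough to sketch. The helper names below are hypothetical, and the figures are the ones quoted in this section, so check the rate card before relying on them.

```python
def credits_per_request(stealth_proxy=False, ai_extract=False):
    """Credits for one successful Google request, per the list above."""
    base = 75 if stealth_proxy else 20  # Classic/Premium: 20, Stealth: 75
    return base + (5 if ai_extract else 0)


def requests_within(budget, **options):
    """How many successful requests fit in a credit budget."""
    return budget // credits_per_request(**options)


# With the 1,000 free credits mentioned in the quick start:
print(requests_within(1000))                    # 50
print(requests_within(1000, ai_extract=True))   # 40
print(requests_within(1000, stealth_proxy=True, ai_extract=True))  # 12
```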

FAQ

Is there an official Google Flights API? No. Google does not offer a public Google Flights API for fares, so a scraper API that renders the public page is the practical route. This package wraps that approach.

Why not parse the HTML myself? You can, but Google Flights uses obfuscated, rotating class names. Defining ai_extract_rules is more durable than maintaining selectors that break every few weeks.

Can I target a specific country or currency view? Yes. Set country_code together with premium_proxy=True. The country code has no effect without a premium proxy.

Does it handle the Google consent page? Yes. The client sends the Google CONSENT cookie by default. Disable it with skip_consent=False.

Documentation

License

MIT

Disclaimer

This is an unofficial Python client built on top of the ScrapingBee web scraping API. It is not affiliated with ScrapingBee or Google. Scrape only public pages, and comply with Google's terms of service and applicable data-protection law.
