Scrape and analyze flight data from Google Flights.

kcelebi License: MIT

Flight Analysis

This project provides tools and models for collecting, analyzing, and forecasting flight and price data. Many features are still in their initial stages or under development. The current features (as of 8/29/22) are:

  • Scraping tools for Google Flights
  • Base analytical tools/methods for price forecasting/summary
  • Models to demonstrate ML techniques on forecasting
  • API for access to previously collected data

Overview

Flight price calculation can use either newly scraped data (scraped at run time) or cached data, which is reported together with a price-change confidence determined by a trained model. Many features of this application are still in development. You can find updates and use some of the functionalities online here.

Usage

The web scraping tool currently supports only round-trip flights for a given origin, destination, and date range. It can be used in a script or a Jupyter notebook.

Note that the following packages are required as dependencies:

  • tqdm
  • selenium (make sure to update your chromedriver!)
  • json (part of the Python standard library, so no separate install is needed)

You can install the third-party packages by running pip install -r requirements.txt.
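Before running the scraper, it can help to confirm that the dependencies are visible to the interpreter you are using. The following is a minimal sketch, not part of this package: find_missing and REQUIRED_MODULES are hypothetical names, and the check only confirms that each module can be located, not that its version or your chromedriver is up to date.

```python
import importlib.util

# Import names of the third-party dependencies listed above.
REQUIRED_MODULES = ["tqdm", "selenium"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```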

The main scraping function, which underpins most other functionality, is scrape_data. The cache parameter controls whether the output is saved in a caching system. Further documentation on caching will be available soon.

# Parameter documentation
# scrape_data(origin : str, destination : str, date_leave : str | list[str], date_return : str | list[str], cache : bool = False) -> dict
# Dates should be strings in the format YYYY-mm-dd

result = scrape_data('JFK', 'IST', '2022-05-20', '2022-06-10')

# Can also input list of date strings for date_leave and date_return

leave_dates = ['2022-05-20', '2022-05-21', '2022-05-22']
return_dates = ['2022-06-10', '2022-06-11', '2022-06-12']
range_result = scrape_data('JFK', 'IST', leave_dates, return_dates)
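For longer ranges, the date lists in the example above can be generated instead of typed by hand. This is a minimal sketch using only the standard library; date_strings is a hypothetical helper and is not provided by this package.

```python
from datetime import date, timedelta


def date_strings(start, days):
    """Return `days` consecutive dates starting at `start`, formatted YYYY-mm-dd."""
    first = date.fromisoformat(start)  # raises ValueError on a malformed date
    return [(first + timedelta(days=i)).isoformat() for i in range(days)]


# Rebuild the two lists used in the example above.
leave_dates = date_strings("2022-05-20", 3)
return_dates = date_strings("2022-06-10", 3)
print(leave_dates)
print(return_dates)
```

The resulting lists can be passed to scrape_data as date_leave and date_return, in the same way as the handwritten lists.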

Updates & New Features

Real Usage

Here are some great flights I was able to find and actually booked when planning my travel/vacations:

  • NYC ➡️ AMS (May 9), AMS ➡️ IST (May 12), IST ➡️ NYC (May 23) | Trip Total: $611 as of March 7, 2022
