Scrape flight data from Google Flights and analyze it.
Flight Analysis
This project provides tools and models for collecting, analyzing, and forecasting flight data and prices. Many features are still in early stages or under development. The current features (as of 8/29/22) are:
- Scraping tools for Google Flights
- Base analytical tools/methods for price forecasting/summary
- Models to demonstrate ML techniques on forecasting
- API for access to previously collected data
Overview
Flight price calculation can use either newly scraped data (scraped at run time) or cached data, which reports a price-change confidence determined by a trained model. Many features of this application are still in development. You can find updates and use some of the functionalities online here.
Usage
The web scraping tool currently supports only round-trip flights for a given origin, destination, and date range. It can be used in a script or a Jupyter notebook.
The following packages are required:
- tqdm
- selenium (make sure to update your chromedriver!)
- json (part of the Python standard library, so no installation is needed)
You can install the dependencies by running `pip install -r requirements.txt`.
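Before running the scraper, it can help to confirm that the third-party dependencies are importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper written for this illustration and is not part of this project.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# tqdm and selenium are the third-party packages listed above.
required = ["tqdm", "selenium"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the packages can be found; it does not verify that your chromedriver version matches your installed browser.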
The main scraping function, which forms the backbone of most other functionality, is `scrape_data`. The `cache` parameter controls whether the output is saved in a caching system. Further documentation on caching will be available soon.
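Since the project's own caching system is not yet documented, the sketch below shows one way a caller could cache results on their own side. It is an assumption-laden illustration, not a description of what `cache=True` does internally: `cached_call` and `fake_scrape` are hypothetical names, and `fake_scrape` stands in for `scrape_data` so the example runs without Selenium.

```python
import json
import tempfile
from pathlib import Path


def cached_call(fetch, cache_path, *args):
    """Return fetch(*args), reusing a result stored in the JSON file at cache_path."""
    cache_path = Path(cache_path)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    key = "|".join(args)  # e.g. "JFK|IST|2022-05-20|2022-06-10"
    if key not in cache:
        cache[key] = fetch(*args)  # only call the scraper on a cache miss
        cache_path.write_text(json.dumps(cache))
    return cache[key]


# Hypothetical stand-in for scrape_data; records each call so reuse is visible.
calls = []


def fake_scrape(origin, destination, date_leave, date_return):
    calls.append((origin, destination, date_leave, date_return))
    return {"origin": origin, "destination": destination,
            "date_leave": date_leave, "date_return": date_return}


cache_file = Path(tempfile.mkdtemp()) / "flights_cache.json"
first = cached_call(fake_scrape, cache_file, "JFK", "IST", "2022-05-20", "2022-06-10")
second = cached_call(fake_scrape, cache_file, "JFK", "IST", "2022-05-20", "2022-06-10")
print(len(calls))  # the second call is served from the cache file
```

Keying the cache on the joined arguments keeps the sketch short, but it assumes every argument is a string. Flight prices also change over time, so a real cache would likely need a timestamp per entry.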
# Parameter documentation
# scrape_data(origin : str, destination : str, date_leave : str, date_return : str, cache : bool = False) -> dict
# Prefer dates in YYYY-MM-DD format
result = scrape_data('JFK', 'IST', '2022-05-20', '2022-06-10')
# date_leave and date_return can also be lists of date strings
leave_dates = ['2022-05-20', '2022-05-21', '2022-05-22']
return_dates = ['2022-06-10', '2022-06-11', '2022-06-12']
range_result = scrape_data('JFK', 'IST', leave_dates, return_dates)
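When a date range is longer than a few days, writing the lists by hand gets tedious. The following sketch builds the same kind of lists with the standard library; `date_strings` is a hypothetical helper for illustration and is not part of this project.

```python
from datetime import date, timedelta


def date_strings(start, days):
    """Return `days` consecutive dates starting at `start`, as YYYY-MM-DD strings."""
    first = date.fromisoformat(start)
    return [(first + timedelta(days=i)).isoformat() for i in range(days)]


leave_dates = date_strings("2022-05-20", 3)
return_dates = date_strings("2022-06-10", 3)
print(leave_dates)
print(return_dates)
```

The resulting lists have the same shape as `leave_dates` and `return_dates` in the example above, so they could be passed to `scrape_data` in the same way.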
Updates & New Features
Real Usage
Here are some great flights I was able to find and actually booked when planning my travel/vacations:
- NYC ➡️ AMS (May 9), AMS ➡️ IST (May 12), IST ➡️ NYC (May 23) | Trip Total: $611 as of March 7, 2022