
Scraping flight data from Google Flights and analyzing it.


Author: kcelebi | License: MIT | Live on PyPI

Flight Analysis

This project provides tools and models for users to analyze, forecast, and collect data on flights and prices. Many features are still in early stages of development. The current features (as of 4/5/2023) are:

  • Scraping tools for Google Flights
  • Basic analytical tools and methods for price forecasting and summaries

The features in development are:

  • Models demonstrating ML techniques for forecasting
  • API for access to previously collected data


Overview

Flight price calculation can use either newly scraped data (scraped when the tool runs) or cached data, which reports a price-change confidence determined by a trained model. Many features of this application are still in development.
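The choice between the two data sources can be sketched as a simple rule. This is a hypothetical illustration, not how the package implements it: `CachedQuote`, `get_price`, the 0.8 threshold and the `scrape_fn` callable are invented names, and the sketch assumes the model's price-change confidence is a probability between 0 and 1 that the cached price is out of date.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedQuote:
    """Hypothetical cache entry: a stored price plus a model's confidence
    (0.0 to 1.0) that the price has changed since it was cached."""
    price: float
    price_change_confidence: float


def get_price(
    cached: Optional[CachedQuote],
    scrape_fn: Callable[[], float],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the package
) -> float:
    """Return the cached price unless it is missing or the model is at least
    `threshold` confident that it has changed, in which case scrape again."""
    if cached is None or cached.price_change_confidence >= threshold:
        return scrape_fn()  # cache missing or likely stale: scrape a fresh price
    return cached.price  # cache likely still accurate: skip the slow scrape
```

With made-up numbers, `get_price(CachedQuote(611.0, 0.1), scrape_fn=lambda: 640.0)` returns the cached 611.0, while a confidence of 0.95 would trigger the scrape and return 640.0. Passing the scraper in as a callable keeps the rule testable without a browser.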

Usage

The web scraping tool scrapes round-trip flights for a given origin, destination, and date range, and (as shown below) one-way flights for a single departure date. It can easily be used in a script or a Jupyter notebook.

The following packages are required dependencies:

  • tqdm
  • selenium (make sure to update your ChromeDriver!)
  • pandas
  • numpy

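Before running the scraper, you may want to confirm that these dependencies can be imported. The following is a minimal standard-library sketch; `missing_dependencies` is a hypothetical helper, not part of the package, and it does not check ChromeDriver.

```python
import importlib.util

# Import names of the required dependencies listed above.
REQUIRED_MODULES = ["tqdm", "selenium", "pandas", "numpy"]


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All required dependencies are installed.")
```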
You can install this tool either by installing the Python package google-flight-analysis:

pip install google-flight-analysis

or by forking/cloning this repository. If you clone the repository, install the dependencies and make sure ChromeDriver matches your Google Chrome version:

pip install -r requirements.txt
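A common rule of thumb is that ChromeDriver and Chrome should share the same major version number. The sketch below compares two version strings you supply (for example, copied from `chromedriver --version` and from Chrome's About page); `major_version` and `chromedriver_matches_chrome` are hypothetical helpers that do not query either program.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 112.0.5615.49'."""
    match = re.search(r"(\d+)\.\d+", version_text)  # first 'N.N' pattern in the text
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def chromedriver_matches_chrome(driver_version: str, chrome_version: str) -> bool:
    """True when ChromeDriver and Chrome share the same major version."""
    return major_version(driver_version) == major_version(chrome_version)
```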

The main scraping function, which forms the backbone of most other functionality, is Scrape(). It also serves as a data object, preserving both the flight information and the metadata from your query. For Python package users, import it as follows:

from google_flight_analysis.scrape import *

For GitHub repository cloners, import as follows from the root of the repository:

from src.google_flight_analysis.scrape import *
#---OR---#
import sys
sys.path.append('src/google_flight_analysis')
from scrape import *
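If the same script must run both from an installed package and from a clone, one option is to try the import paths in order. This is a hypothetical sketch (`import_first` is not part of the package) and it covers only the first two import styles above, not the `sys.path` variant.

```python
import importlib
from types import ModuleType
from typing import Sequence

# Module paths from the instructions above: installed package first, then a
# repository clone imported from the repository root.
DEFAULT_CANDIDATES = (
    "google_flight_analysis.scrape",
    "src.google_flight_analysis.scrape",
)


def import_first(candidates: Sequence[str] = DEFAULT_CANDIDATES) -> ModuleType:
    """Return the first module in `candidates` that imports successfully."""
    last_error = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as err:  # includes ModuleNotFoundError
            last_error = err
    raise ImportError(f"none of {list(candidates)} could be imported") from last_error


# Usage (requires the package or a clone of the repository):
# scrape = import_first()
# Scrape = scrape.Scrape
```

An ImportError raised inside the module itself (for example, a missing selenium) also moves on to the next candidate; chaining `last_error` keeps that underlying cause visible in the traceback.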

Here is some quick starter code to accomplish the basic tasks. Find more in the documentation.

# Dates should be in YYYY-mm-dd format
result = Scrape('JFK', 'IST', '2023-07-20', '2023-08-10') # obtain our scrape object
dataframe = result.data # outputs a Pandas DF with flight prices/info
origin = result.origin # 'JFK'
dest = result.dest # 'IST'
date_leave = result.date_leave # '2023-07-20'
date_return = result.date_return # '2023-08-10'
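Because each query launches a browser, it can help to validate the arguments first. This hypothetical sketch (`validate_query` and `parse_date` are not part of the package) assumes uppercase three-letter IATA airport codes, as in the examples, and the YYYY-mm-dd date format recommended above.

```python
import re
from datetime import date, datetime
from typing import Optional

AIRPORT_CODE = re.compile(r"[A-Z]{3}")          # assumption: uppercase IATA codes
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-mm-dd


def parse_date(text: str) -> date:
    """Parse a YYYY-mm-dd string, rejecting other layouts and impossible dates."""
    if not DATE_FORMAT.fullmatch(text):
        raise ValueError(f"expected a YYYY-mm-dd date, got {text!r}")
    return datetime.strptime(text, "%Y-%m-%d").date()  # '2023-02-30' raises ValueError


def validate_query(origin: str, dest: str, date_leave: str,
                   date_return: Optional[str] = None) -> bool:
    """Check the arguments of a Scrape() call before launching a browser."""
    for code in (origin, dest):
        if not AIRPORT_CODE.fullmatch(code):
            raise ValueError(f"expected a 3-letter uppercase airport code, got {code!r}")
    leave = parse_date(date_leave)
    if date_return is not None and parse_date(date_return) < leave:
        raise ValueError("return date is earlier than departure date")
    return True
```

For example, `validate_query('JFK', 'IST', '2023-07-20', '2023-08-10')` returns True, while swapping the two dates raises ValueError.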

You can now also scrape one-way trips:

result = Scrape('JFK', 'IST', '2023-08-20')
result.data.head() # see the first rows of the data
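Since one-way queries are supported, a multi-city itinerary (like the one under Real Usage below) could in principle be assembled from separate one-way scrapes. This is a hypothetical sketch: `scrape_legs` is not part of the package, and it takes the scraping function as a parameter so you can pass `Scrape` in real use or a stand-in for testing. The dates in the usage comment are placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

Leg = Tuple[str, str, str]  # (origin, destination, departure date as YYYY-mm-dd)


def scrape_legs(legs: List[Leg], scrape_fn: Callable[..., Any]) -> Dict[Leg, Any]:
    """Run one one-way query per leg after checking that consecutive legs connect."""
    for (_, prev_dest, _), (next_origin, _, _) in zip(legs, legs[1:]):
        if prev_dest != next_origin:
            raise ValueError(f"leg from {next_origin} does not continue from {prev_dest}")
    return {leg: scrape_fn(*leg) for leg in legs}


# Real use (requires the package and a working ChromeDriver):
# results = scrape_legs(
#     [("JFK", "AMS", "2023-05-09"), ("AMS", "IST", "2023-05-12"), ("IST", "JFK", "2023-05-23")],
#     scrape_fn=Scrape,
# )
# results[("JFK", "AMS", "2023-05-09")].data.head()
```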

Updates & New Features

This package is undergoing a complete revamp, including its new release on PyPI. Documentation is updated frequently; feel free to get in touch with any questions.

Real Usage

Here are some great flights I was able to find and actually booked when planning my travel/vacations:

  • NYC ➡️ AMS (May 9), AMS ➡️ IST (May 12), IST ➡️ NYC (May 23) | Trip Total: $611 as of March 7, 2022


Download files


Source Distribution

google-flight-analysis-1.1.0.tar.gz (16.1 kB)

Built Distribution

google_flight_analysis-1.1.0-py3-none-any.whl (10.3 kB)

File details

Details for the file google-flight-analysis-1.1.0.tar.gz.


Hashes for google-flight-analysis-1.1.0.tar.gz
Algorithm Hash digest
SHA256 d0da142f2f292948e63978a4a45885c0add6d7ad03e50399e323cc5855fce07f
MD5 46057400d434b373f578ab974c9eb878
BLAKE2b-256 dda71454d28d6fa7021602d0274575d512deae424fa88a108a9c361e03181ad7


File details

Details for the file google_flight_analysis-1.1.0-py3-none-any.whl.


Hashes for google_flight_analysis-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f707cf27e3751307cd2928c3200385d6caf71a7be87eb9c570059daff0127a16
MD5 14f1b10296b013c255151405cec5246a
BLAKE2b-256 26dbb438fe845a09b3083f6a45bb8f91e74c0915467b21d7a9d8165f9fe9e49d

